hsn367
Frequent Visitor

Machine learning pipelines in Microsoft Fabric

I have a background in building machine learning pipelines in AzureML using the AzureML SDK, which really helps us orchestrate end-to-end data science workflows. In our organization's workflows, we have ML pipelines written with the AzureML SDK, and CI/CD pipelines that publish these ML pipelines to AzureML studio.

 

Now, moving into Fabric with this background, I have a couple of questions that I could not find answers to in the documentation.

 

1) How do we orchestrate data science workflows in Fabric? For instance, we have multiple scripts for our end-to-end solution; we can easily build pipelines over them using the AzureML SDK in AzureML studio, but what is the alternative in Fabric? How are we supposed to build ML pipelines?

 

2) Data drift monitoring is an important component of an end-to-end data science solution. We can monitor drift in a model's data in AzureML, but what is the alternative available in Fabric?

 

2 ACCEPTED SOLUTIONS
v-nikhilan-msft
Community Support


@hsn367 
An additional reply from the internal team regarding the questions above:

Fabric has pipelines in the form of Data Factory, and you can run notebooks with ML activities/code as part of those. Overall, we are working on strengthening our MLOps story. We have model endpoints in private preview and are working on providing a better SDK. If you are looking to run MLOps in production today, we recommend using AzureML together with Fabric. AzureML has access to data in OneLake, and we are working on improving that integration. Over time, Fabric will become more complete on MLOps too, for data-centric and analytics workloads. Today we focus on scenarios where you serve data to Power BI, and we are gradually evolving into other scenarios, such as real-time model endpoints.

We don't yet have drift monitoring in Fabric; it is on the roadmap.


hsn367
Frequent Visitor

@v-nikhilan-msft Thank you so much for all the support.


5 REPLIES

hsn367
Frequent Visitor

Hi @v-nikhilan-msft 

 

Thank you so much for the detailed response. Here is what I got from it:

1) The Fabric alternative to AzureML pipelines is Fabric Data Factory pipelines, where we can orchestrate the multiple Python scripts of our data science solutions.

 

2) Data drift monitoring is not available in Fabric yet. You mentioned the Monitoring hub, but that does not fulfil the needs of drift monitoring.

 

I have a couple more questions regarding migrating to Fabric from an AzureML background.

 

1) In AzureML, we relied heavily on the AzureML SDK's data asset management to version our data science solution's data, and the data produced by the different components of the pipeline. It was very easy to use the latest version of the data, or any previous version, just by specifying that version name. Migrating to Fabric, how do you think we can get similar behavior there?

 

2) In our current workflows, we have three AML workspaces, i.e. separate ones for the development, test, and prod environments. We develop the ML pipelines in the dev workspace and then deploy them to the test and prod workspaces via CI/CD pipelines. Is it possible to achieve the same behavior in Fabric?

Hi @hsn367 

Data Versioning in Fabric:
Microsoft Fabric does not have the dataset concept found in Azure Data Factory.
While Fabric's approach to data versioning differs from the AzureML SDK's data asset management, you can achieve similar behavior by leveraging OneLake and the data integration pipelines within Fabric.
You can use notebooks, and also Azure DevOps, to achieve version control in Fabric.
https://radacad.com/version-control-in-power-bi-and-fabric
https://www.linkedin.com/pulse/unraveling-past-empowering-future-versioning-timetravel-data/
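As a concrete illustration of the snapshot idea behind those links (this is a hand-rolled pattern, not a Fabric or OneLake API — the folder layout, the `manifest.json` name, and both helper functions are assumptions), each pipeline step can write its output as a new numbered version under the asset's folder and record it in a manifest, which restores the AzureML-style "latest or pinned version" behavior on top of plain lakehouse file storage:

```python
import json
from pathlib import Path

def save_version(root, name, payload):
    """Write `payload` as a new numbered version of the asset `name`."""
    asset = Path(root) / name
    asset.mkdir(parents=True, exist_ok=True)
    manifest_path = asset / "manifest.json"
    manifest = (json.loads(manifest_path.read_text())
                if manifest_path.exists() else {"versions": []})
    version = len(manifest["versions"]) + 1
    vdir = asset / f"v{version}"
    vdir.mkdir()
    (vdir / "data.bin").write_bytes(payload)
    manifest["versions"].append(version)
    manifest_path.write_text(json.dumps(manifest))
    return version

def load_version(root, name, version=None):
    """Read a specific version, or the latest when `version` is None."""
    asset = Path(root) / name
    manifest = json.loads((asset / "manifest.json").read_text())
    version = version or manifest["versions"][-1]
    return (asset / f"v{version}" / "data.bin").read_bytes()
```

If you store the snapshots as Delta tables instead of raw files, Delta's built-in time travel (reading a table as of an earlier version) gives you the same pinning behavior without a hand-written manifest.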


CI/CD Pipeline Deployment Across Workspaces in Fabric:
Fabric’s lifecycle management tools, including Git integration and deployment pipelines, support a standardized system for collaboration and continuous delivery of updated content into production. Deployment pipelines in Fabric allow you to clone content from one stage to another, typically from development to test, and from test to production, maintaining the connections between copied items. You can have similar 3 workspaces in Fabric and achieve the same.
Introduction to the CI/CD process as part of the ALM cycle in Microsoft Fabric - Microsoft Fabric | ...
The Microsoft Fabric deployment pipelines process - Microsoft Fabric | Microsoft Learn
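For the CI/CD side, stage-to-stage promotion can also be scripted from your existing pipelines. The sketch below only builds the request for the deployment-pipelines "deploy all" REST operation; the endpoint shown is the Power BI `deployAll` URL, which is an assumption to verify against the current Fabric documentation, and token acquisition (e.g. via a service principal) is out of scope here:

```python
import json

# Assumed endpoint: the Power BI deployment pipelines "deployAll" operation.
# Verify the exact URL and payload shape against the current Fabric / Power BI
# REST API documentation before relying on it.
API = "https://api.powerbi.com/v1.0/myorg/pipelines/{pipeline_id}/deployAll"

def build_deploy_request(pipeline_id, source_stage, token):
    """Build (url, headers, body) for promoting all items from one
    deployment-pipeline stage to the next (source_stage 0 = dev, 1 = test)."""
    body = json.dumps({
        "sourceStageOrder": source_stage,
        "options": {
            "allowCreateArtifact": True,    # create items missing in the target stage
            "allowOverwriteArtifact": True  # overwrite items already there
        },
    }).encode()
    headers = {
        "Authorization": f"Bearer {token}",  # e.g. a service-principal token
        "Content-Type": "application/json",
    }
    return API.format(pipeline_id=pipeline_id), headers, body
```

An Azure DevOps or GitHub Actions job can then POST this request after each merge, mirroring the dev → test → prod promotion you have today with the three AML workspaces.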

Hope this helps. Please let me know if you have any further questions.

v-nikhilan-msft
Community Support

Hi @hsn367 
Thanks for using Fabric Community.
Transitioning from AzureML to Microsoft Fabric involves adapting to the tools and services that Fabric offers for machine learning and data science workflows. 

Orchestrating Data Science Workflows in Fabric:
1) In Fabric, you can use Fabric notebooks for data science scenarios, which allow you to ingest data into a Fabric lakehouse using Apache Spark, load existing data from delta tables, and clean and transform data using Apache Spark and Python-based tools.
2) You can create experiments and runs to train different machine learning models within these notebooks.
3) For orchestrating workflows, you can construct data analytics workflows with Fabric Data Factory data pipelines, which provide a low-code solution for data integration and ETL projects.
4) The Data Factory in Fabric allows you to build automated workflows that combine different artifacts in your workspace, such as files, notebooks, and dataflows, to create an end-to-end data analytics workflow.
Please refer to these links:
Data science tutorial - get started - Microsoft Fabric | Microsoft Learn
Construct a data analytics workflow with a Fabric Data Factory data pipeline | Microsoft Fabric Blog...
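To make the notebook side of point 1 concrete, here is a minimal, purely illustrative sketch of the pattern: each script's logic becomes a function, and a driver runs them in order. In Fabric, the driver would live in a notebook that a Data Factory pipeline schedules (a notebook can also invoke other notebooks via the notebook utilities). All names and the toy "model" below are hypothetical:

```python
# Illustrative only: each "step" stands in for one of the scripts that an
# AzureML pipeline would run as a separate component.

def ingest():
    # In Fabric this would read from a lakehouse table via Spark;
    # here a plain list of dicts stands in for the raw data.
    return [{"feature": x, "label": x % 2} for x in range(10)]

def transform(rows):
    # Example feature-engineering step: add a squared feature.
    return [{**r, "feature_sq": r["feature"] ** 2} for r in rows]

def train(rows):
    # Stand-in for model training: return a trivial "model" (a threshold
    # at the mean of the feature values).
    threshold = sum(r["feature"] for r in rows) / len(rows)
    return {"threshold": threshold}

def run_pipeline():
    # The driver: explicit, ordered orchestration of the steps.
    rows = transform(ingest())
    return train(rows)
```

The driver notebook then appears as a single Notebook activity in a Data Factory pipeline, where you can add scheduling, retries, and downstream activities.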

Data Drift Monitoring in Fabric:
1) Fabric doesn't have a built-in data drift monitoring tool like AzureML. However, you can leverage various options for drift detection.
2) Monitoring in Fabric is centralized through the Monitoring hub, which enables users to monitor Fabric activities, including data pipelines, dataflows, lakehouses, notebooks, and semantic models.
3) While specific features for data drift monitoring like those in AzureML may not be directly mentioned, the Monitoring hub provides a comprehensive view of all activities and could be used to track changes and performance over time. 
Use the Monitoring hub - Microsoft Fabric | Microsoft Learn
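Until drift monitoring lands natively, one workable interim pattern is a scheduled Fabric notebook that computes a drift statistic itself and writes the result to a lakehouse table for a report or alert to pick up. Below is an illustrative, dependency-free sketch of one common statistic, the Population Stability Index (PSI); the function and thresholds are general ML practice, not a Fabric feature:

```python
import math
from collections import Counter

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample (`expected`,
    e.g. training data) and a recent sample (`actual`, e.g. scoring data).
    Rough rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def distribution(values):
        # Bucket each value on the baseline's range, clamping outliers
        # into the first/last bucket.
        idx = Counter(max(0, min(int((v - lo) / width), bins - 1)) for v in values)
        total = len(values)
        # A small floor avoids log(0) for empty buckets.
        return [max(idx.get(i, 0) / total, 1e-6) for i in range(bins)]

    e, a = distribution(expected), distribution(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

On a schedule (e.g. a Data Factory pipeline trigger), the notebook would load both samples from lakehouse tables, compute `psi` per feature, and append the scores to a monitoring table that a Power BI report or alert watches.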

Hope this helps. Please let me know if you have any further questions.
