Microsoft Fabric is an end-to-end analytics and data platform designed for enterprises that require a unified solution. It encompasses data movement, processing, ingestion, transformation, real-time event routing, and report building. It offers a comprehensive suite of services including Data Engineering, Data Factory, Data Science, Real-Time Analytics, Data Warehouse, and Databases.
As a Power BI user, you now have access to a full suite of enhanced capabilities through Power BI's integration into this comprehensive SaaS platform, giving you even more tools to explore and use. In this blog article, we will focus on Fabric Data Pipelines.
Fabric data pipelines are designed to simplify and automate the process of moving, transforming, and loading data into your desired destination. They provide a streamlined approach to handling data workflows, allowing you to focus on what you do best—analyzing data and driving insights.
In this blog post, we'll explore how you can start using Fabric Data Pipelines as a Power BI user who wants to take full advantage of Microsoft Fabric. And if you have never used Power BI before but still want to start with pipelines, you are in the right place!
Here are some use cases that may be familiar to you and that this blog series will try to address:
- You have heard of OneLake and want to understand how you can use it to build your BI reports
- You want to move and transform data from your company’s legacy systems to a modern analytics platform where Power BI can access them easily
- You have heard about some activities in data pipelines like semantic model refresh or notifications in Teams/Outlook that would be useful in your project
If any of these seem interesting to you, let’s get hands-on!
Let’s start with the basics:
➡️What kind of license do I need to start experimenting with Fabric Data Pipelines?
There are different options available as of December 2024:
- You have access to a Power BI Premium capacity and your organization has allowed users to create Fabric items: you can easily start creating Fabric items like data pipelines.
- You have access to a Fabric capacity
- You can start a Fabric trial as described here
Any of these options will work, so let's go directly to powerbi.com!
➡️Which are the important Fabric terms I should know before jumping into Fabric Pipelines?
Fabric is a unified analytics platform, and given the different experiences it offers, from Data Engineering to Real-Time Analytics, it's hard to master every component.
In this section, we want to highlight some important items you should be aware of before starting your pipelines development in the context of this blog series:
- OneLake: A single, unified, logical data lake for your whole organization. A data lake stores and processes large volumes of data from various sources. Like OneDrive, OneLake comes automatically with every Microsoft Fabric tenant and is designed to be the single place for all your analytics data.
- Lakehouse: Microsoft Fabric Lakehouse is a data architecture platform for storing, managing, and analyzing structured and unstructured data in a single location. It's a flexible and scalable solution that allows organizations to handle large volumes of data using various tools and frameworks to process and analyze that data. Delta Lake is the unified table format: all Fabric experiences generate and consume Delta Lake tables, driving interoperability and a unified product experience. Delta Lake tables produced by one compute engine, such as Synapse Data Warehouse or Synapse Spark, can be consumed by any other engine, such as Power BI. When you ingest data into Fabric, Fabric stores it as Delta tables by default.
- Warehouse: A lake-centric data warehouse built on an enterprise-grade distributed processing engine that enables industry-leading performance at scale while eliminating the need for configuration and management.
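Part of what makes this interoperability possible is that a Delta table is simply an open folder layout: Parquet data files plus a `_delta_log` folder of JSON commit files that any engine can read. The snippet below is a minimal sketch that simulates this layout with placeholder file contents (a real table stores Parquet-encoded rows and much richer log metadata):

```python
# Sketch of the open on-disk layout of a Delta table: Parquet data files
# plus a _delta_log folder of JSON commit files. Any engine that understands
# this layout (Spark, the warehouse engine, Power BI Direct Lake) can read
# the same table. File contents here are simplified placeholders.
import json
import pathlib
import tempfile

root = pathlib.Path(tempfile.mkdtemp()) / "nyc_taxi_green"
(root / "_delta_log").mkdir(parents=True)

# A data file (a real table would hold Parquet-encoded rows here).
(root / "part-00000.snappy.parquet").write_bytes(b"")

# Commit 0: the transaction log records which data files make up the table.
commit = {"add": {"path": "part-00000.snappy.parquet", "dataChange": True}}
(root / "_delta_log" / "00000000000000000000.json").write_text(json.dumps(commit))

# A reader reconstructs the current table state by replaying the log.
log = json.loads((root / "_delta_log" / "00000000000000000000.json").read_text())
print(log["add"]["path"])
```

You never manage these files yourself in Fabric; the point is only that the format is open, which is why one engine can pick up what another wrote.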
In the following steps, we will use a lakehouse as the destination of our data pipeline. We will create the lakehouse as part of the pipeline-building process, so there is no need to create it now.

As a Power BI user, you are already familiar with the Power BI interface. Now with Fabric, if you go to powerbi.com, you can easily switch between experiences from the bottom left and navigate to Data Factory:
➡️How can I create my first pipeline?
For the following steps, you can either create items in ‘My workspace’ or create a new workspace to host your new Fabric items. If you create a new workspace, make sure to check the advanced settings while creating it and choose the right capacity.
Workspaces continue to be the collaborative environment where you manage reports, semantic models, and so on, but they can now also include Fabric items like Dataflows Gen2, data pipelines, and more.
We will start by creating a pipeline item called SamplePipeline:
You can see two options: either start with a blank canvas or start with guidance. Since we are new to data pipelines as Power BI users, we will start with guidance, and more specifically the Copy Data Assistant.
As mentioned before, we can use data pipelines to ingest data at scale and schedule data workflows. Copy Data Assistant helps us in this process through a step-by-step experience of selecting our source and destination. In later blog articles, we will explore more advanced options!
Let’s see now how Copy Data Assistant guides us through the process of creating our first pipeline:
- We start by selecting our data source which in our case is the NYC Taxi – Green sample data:
- We see a preview of the data and click Next:
- We land the data in a new lakehouse that we create on the spot:
- We leave the default options on the next page – a new table will be created with the following column mappings:
- We click Save and Run:
- We are redirected to the pipeline canvas, where the copy activity has been automatically inserted by the Copy Data Assistant to complete the data movement. The pipeline run completes successfully after a few minutes:
As you can see, the output pane includes information about the pipeline run which you can also export. You can also click on the Activity name and get more information about the pipeline run:
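Under the hood, a Fabric data pipeline is persisted as a JSON definition, much like Azure Data Factory pipelines. The fragment below is only an illustrative sketch of what a copy pipeline's definition roughly looks like; the names and properties are simplified placeholders, not the exact schema:

```json
{
  "name": "SamplePipeline",
  "activities": [
    {
      "name": "Copy NYC Taxi - Green",
      "type": "Copy",
      "typeProperties": {
        "source": { "type": "SampleDataSource" },
        "sink": { "type": "LakehouseTableSink", "tableName": "nyc_taxi_green" }
      }
    }
  ]
}
```

You don't need to edit this JSON yourself – the Copy Data Assistant generated everything for us – but knowing it exists helps when you later compare, version, or troubleshoot pipelines.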
You can now go to the BronzeLakehouse and verify that the table was created – just go to the workspace on the left and choose the lakehouse:
Congratulations, you just created your first pipeline that lands data in a lakehouse.
From this point, you can transform the data with Spark notebooks, add more data to your lakehouse, and much more. And the most important aspect for a Power BI user starting with Fabric: you can now use Direct Lake!
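As a teaser for the semantic model refresh activity mentioned at the start: behind the scenes, such a refresh goes through the Power BI REST API. The sketch below only builds the request without sending it; the workspace ID, dataset ID, and token are placeholders, and a real call needs a valid Microsoft Entra ID access token:

```python
# Sketch: the Power BI REST API endpoint that a semantic model refresh
# ultimately targets. The IDs and token below are placeholders; in a
# pipeline, the semantic model refresh activity handles all of this for you.
import urllib.request

API_BASE = "https://api.powerbi.com/v1.0/myorg"

def build_refresh_request(workspace_id: str, dataset_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers a dataset refresh."""
    url = f"{API_BASE}/groups/{workspace_id}/datasets/{dataset_id}/refreshes"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = build_refresh_request(
    "00000000-0000-0000-0000-000000000000",  # placeholder workspace ID
    "11111111-1111-1111-1111-111111111111",  # placeholder dataset ID
    "<access-token>",                        # placeholder token
)
print(req.full_url)
```

In practice you will rarely call this API by hand – the pipeline activity takes care of authentication and retries – but it is useful to know what the activity does on your behalf.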
Stay tuned for the next parts where we explore more advanced features of Fabric Data Pipelines.