Hey guys, I'm new to Fabric and trying to wrap my head around which services I should use and in what order.
I'm trying to reverse engineer our current ETL process for our existing data warehouse on a SQL Server database.
Our ETL is incremental: every day we receive revenue files for the previous day, and we also pull data from legacy databases.
I put together a quick diagram to illustrate how I'm thinking of designing it, after reading the Fabric documentation.
Here are my questions based on the diagram below:
1. What service should I use to push my files from Data Lake Storage Gen2 (DLS) to Lakehouse?
2. Does Dataflow Gen2 do the same thing as a Notebook? If not, can you please provide an example?
3. How does Dataflow Gen2 know that there is a new file in the Lakehouse that it needs to process?
4. Is the Data Pipeline's purpose in this diagram just to push the data to the Warehouse?
5. The second image represents the services in Fabric. Is it possible to tell me in which order I should start using them? Could you please number each service in the right order of use?
Thank you
Solved! Go to Solution.
Hi @JohnAG
Thanks for using Fabric Community.
At this time, we are reaching out to the internal team to get some help on this.
We will update you once we hear back from them.
Thanks
Hi @JohnAG
1. What service should I use to push my files from Data Lake Storage Gen2 (DLS) to Lakehouse?
It depends on how the storage is configured. There are several options: you could simply create a shortcut from the Fabric Lakehouse to the files in ADLS, or you could use ADF in Azure to copy the files from ADLS into the Lakehouse (Files or Tables). There are other options you could explore as well, but you don't necessarily need to move the existing files; they can work as-is.
2. Does Dataflow Gen2 do the same thing as a Notebook? If not, can you please provide an example?
Notebooks run on Spark compute, allowing large-scale data transformations using code-first methods. Dataflow Gen2 is our low-code option: it allows similar transformations but may not scale as well, nor offer as much custom control or extensibility as a code-first solution. Dataflows Gen2 is essentially the next generation of Dataflows, our Power Query Online functionality, so it is not the same as a Notebook. It is our no-code/low-code way to connect, transform, and now land data into a destination. Notebooks are all code, but they also provide some data-wrangling capabilities similar to Power Query that will generate code for you.
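To make the contrast concrete, here is a minimal sketch of what a code-first notebook transformation looks like next to a Dataflow filter-and-derive step. In a real Fabric notebook this would be PySpark against Lakehouse tables; this plain-Python stand-in (the column names and the 10% fee are hypothetical) just shows the shape of the logic:

```python
import csv
import io

def transform_revenue(raw_csv: str) -> list[dict]:
    """Code-first equivalent of a simple Dataflow Gen2 step:
    drop zero-amount rows and add a derived net_amount column."""
    rows = csv.DictReader(io.StringIO(raw_csv))
    out = []
    for row in rows:
        amount = float(row["amount"])
        if amount == 0:
            continue  # skip empty revenue lines
        row["net_amount"] = round(amount * 0.9, 2)  # hypothetical 10% fee
        out.append(row)
    return out

sample = "order_id,amount\n1,100.0\n2,0\n3,250.0\n"
print(transform_revenue(sample))
```

In Dataflow Gen2 the same result would come from a Power Query filter step plus a custom column, with no code written at all; the trade-off is that the notebook version can grow into arbitrary logic.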
3. How does Dataflow Gen2 know that there is a new file in the Lakehouse that it needs to process?
There is currently no trigger functionality like we have in ADF/Synapse pipelines; Storage Event Triggers are on the roadmap. Once they are available, a Dataflow Gen2 could be referenced in a Data Pipeline and scheduled as needed. For now, you would need to code around that, or simply use ADF to push the files to the Fabric Lakehouse or load them into Lakehouse tables.
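One way to "code around" the missing trigger is to poll on a schedule and track which files have already been processed. This is a local-filesystem sketch of that pattern (in Fabric you would list Lakehouse files with the notebook utilities instead; the state-file layout here is an assumption, not a Fabric feature):

```python
import json
from pathlib import Path

def find_unprocessed(landing_dir: Path, state_file: Path) -> list[Path]:
    """Return files in landing_dir not yet recorded in the state file.
    Run on a schedule as a stand-in for a storage-event trigger."""
    processed = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    return [p for p in sorted(landing_dir.glob("*.csv")) if p.name not in processed]

def mark_processed(files: list[Path], state_file: Path) -> None:
    """Record processed file names so the next poll skips them."""
    processed = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    processed.update(p.name for p in files)
    state_file.write_text(json.dumps(sorted(processed)))
```

The first poll returns everything in the folder; after `mark_processed`, subsequent polls return only newly arrived files.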
4. Is the Data Pipeline's purpose in this diagram just to push the data to the Warehouse?
Fabric Data Pipelines are much more than a copy-data tool! They offer extensive control-flow and orchestration capabilities, letting you control the execution of Dataflows Gen2, Notebooks, and many other activities.
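As a rough mental model of that orchestration, a pipeline runs named activities in order and stops the chain when one fails. This toy sketch (activity names are hypothetical stand-ins for Copy, Dataflow Gen2, and stored-procedure activities) is not Fabric code, just the control-flow idea:

```python
from typing import Callable

def run_pipeline(activities: list[tuple[str, Callable[[], None]]]) -> list[str]:
    """Run named activities in order; stop at the first failure,
    like an 'on success' dependency chain in a Data Pipeline."""
    completed = []
    for name, action in activities:
        try:
            action()
        except Exception as exc:
            print(f"activity {name!r} failed: {exc}; stopping pipeline")
            break
        completed.append(name)
    return completed

log = []
steps = [
    ("copy_files", lambda: log.append("copied")),        # e.g. Copy activity
    ("run_dataflow", lambda: log.append("transformed")), # e.g. Dataflow Gen2 activity
    ("load_warehouse", lambda: log.append("loaded")),    # e.g. stored-procedure activity
]
print(run_pipeline(steps))
```

Real pipelines add branching on failure, retries, and parameters on top of this basic ordering.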
5. The second image represents the services in Fabric. Is it possible to tell me in which order I should start using them? Could you please number each service in the right order of use?
There is no particular order in which these would necessarily be used. It depends on the task you are looking to perform and which items within the different experiences the user or organization is most comfortable using. You wouldn't necessarily need to use all of them. We do have some decision guides you can take a look at:
Link1
Link2
There is also a nice online tool you could use to draw such diagrams, with an example of a Fabric Lakehouse end-to-end design on its Examples page: Azure Diagrams
Hope this helps to answer all your queries. Please let me know if you have any further questions.
Hi, sorry to piggyback off this, but the Examples page diagram you linked mentions a tutorial that works through an example end to end. Is that by any chance available?
Thanks
Thank you so much for your helpful explanation.
Please see my answers and questions below
Hi @JohnAG
Could you please give more examples or documentation of a solution that is currently doing this with Fabric? We are looking for incremental ingestion into the Warehouse.
This is an example using Dataflows Gen2 to define an incremental approach
Link1
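The core of that incremental approach is a watermark: remember the latest date already loaded, take only rows newer than it, and persist the new watermark. In Dataflows Gen2 this is expressed as a Power Query filter against a stored value; here is a plain-Python sketch of the same idea (column name `revenue_date` and the in-memory rows are hypothetical):

```python
from datetime import date

def incremental_rows(source_rows: list[dict], watermark: date) -> tuple[list[dict], date]:
    """Watermark-based incremental ingestion: keep only rows newer than
    the last loaded date, and return the new watermark to persist."""
    fresh = [r for r in source_rows if r["revenue_date"] > watermark]
    new_watermark = max((r["revenue_date"] for r in fresh), default=watermark)
    return fresh, new_watermark

rows = [
    {"revenue_date": date(2024, 1, 1), "amount": 10},
    {"revenue_date": date(2024, 1, 2), "amount": 20},
    {"revenue_date": date(2024, 1, 3), "amount": 30},
]
fresh, wm = incremental_rows(rows, date(2024, 1, 1))
print(len(fresh), wm)
```

Running the same function again with the returned watermark yields no rows, which is exactly the behavior you want for a daily incremental load.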
This is clear now; I will do more research and start investing more time in Python.
Is there a good Fabric-related learning resource for getting started right away with Python in the Lakehouse? I know the basics of Python.
Python in the Lakehouse is essentially PySpark. You can refer to this link for more information: Link2
You can also refer to these videos: Videos
Hope this helps. Please let me know if you have any other questions.
Hi @JohnAG
Glad that your query got resolved. Please continue using Fabric Community for any help regarding your queries.