March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Hi,
Every time I create the first Dataflow Gen2 inside a workspace, I also see a SQL endpoint and a dataset named 'DataflowsStagingLakehouse' being created. If I delete them, my dataflows start to fail.
I could not find any mention of these in the documentation, so I was wondering what their purpose is.
Jurgen
Hello @jurgenp ,
July 11th update: We recently posted a new blog detailing the Dataflow Gen2 architecture, which covers how these artifacts are used by Dataflows Gen2.
Dataflows Gen2, like Dataflows Gen1, leverage staging storage to store the data produced by refreshing queries that are load enabled (the default option). This storage location for Dataflows Gen2 is the "DataflowStagingLakehouse" artifact you mentioned.
Dataflows Gen2, like Dataflows Gen1, also leverage an enhanced compute engine to improve performance of data reads and transformations. This is the DataflowStagingWarehouse. It is leveraged in a few scenarios:
These artifacts should not be removed. They are essential to the operation of Dataflows Gen2.
A few comments:
I want to reiterate the importance of not deleting these items. They are required for dataflows to operate and should be viewed as system artifacts. We are planning to make changes in the future to both improve the customer experience and prevent their deletion.
I am planning to add documentation with more details about how Dataflows Gen2 work, their use of staging storage and compute, and best practices to get the most out of the Dataflows Gen2 architecture.
Thank you,
Ben
Hi Ben,
Thanks for your thorough response. I have two questions.
Question 1:
In the first comment you say that regardless of the number of dataflows created there will only be one warehouse or lakehouse. I seem to be getting multiple. Am I interpreting this incorrectly?
Question 2 (sorry, there are actually multiple questions in this one):
One of my dataflows is just doing some pretty basic transformations on Excel files and probably does not need to stage the data in a lakehouse or warehouse. From this I get the following questions:
Thanks in advance for the response. Cheers.
Question 1: There will always be one "DataflowStagingLakehouse" and one "DataflowStagingWarehouse". There is at most one of each type per workspace, and each Dataflow Gen2 will stage data in them (if required). The lineage view sometimes duplicates "boxes" even when they refer to the same object (IMHO a bug that has been there for ages).
Question 2:
I personally understood much more about this strange architecture when I first connected to the SQL endpoint of a lakehouse or warehouse via SSMS. You then realize that the endpoint is unique per workspace and that every lakehouse, warehouse, KQL database, and staging database is installed there (even if invisible in the UI).
Leaving aside the question of deleting these objects (which makes sense that this would be a bad idea), would you also recommend that we not store data there (e.g. as an output from a gen2 dataflow)?
Yes. While it's not prevented today, these artifacts should not be used outside the dataflow experience.
They seem to be internal artifacts that are used as part of the process of creating Gen2 dataflows and populating your lakehouse.
The general assumption I have heard is that when it goes into production these will be hidden.
The most important thing is not to change or delete anything in there, as it does nasty things. In short, pretend that they do not exist.
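If you script any workspace cleanup, one way to "pretend they do not exist" is to filter the staging items out by name before touching anything. The following is only a rough sketch under assumptions: the helper names (`is_staging_artifact`, `split_items`) are made up for illustration, the list of item names is assumed to come from wherever you already enumerate workspace items, and the prefixes are simply the two spellings that appear in this thread, not an official naming contract.

```python
# Name prefixes taken from this thread (both spellings appear here).
# Assumption: staging items can be recognized by these prefixes alone.
STAGING_PREFIXES = ("dataflowsstaging", "dataflowstaging")


def is_staging_artifact(name: str) -> bool:
    """Return True if the item name looks like a dataflow staging artifact."""
    return name.strip().casefold().startswith(STAGING_PREFIXES)


def split_items(names):
    """Split item names into (safe_to_touch, protected) lists, preserving order."""
    safe, protected = [], []
    for name in names:
        (protected if is_staging_artifact(name) else safe).append(name)
    return safe, protected


# Hypothetical workspace contents, for illustration only.
items = ["SalesLakehouse", "DataflowsStagingLakehouse", "DataflowsStagingWarehouse"]
safe, protected = split_items(items)
```

Here `safe` contains only `SalesLakehouse`, and the two staging items end up in `protected`, so a cleanup loop that iterates over `safe` never sees them.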