March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
I haven't worked with DF Gen2 at all, hence the following questions.
If I don't publish my DF table to a data destination and staging is not enabled, my DF table (authored with PQ) is not visible to Power BI's dataflow connector, which depends on the DF's internal storage, much like Gen1.
However, if I want to publish my DF to a data destination (Lakehouse), should I enable staging or not?
Without staging enabled, will the data still be published to a destination (e.g. Lakehouse)? If it is published, will it still be updated in the destination when the DF refreshes with staging disabled?
With staging enabled, is there any chance that the data published to the Lakehouse gets duplicated?
I am looking for best-practice advice: when publishing to a destination, how should I configure staging so that it does not result in duplicated data in the destination?
Also, for Update = Append, should I enable or disable staging (does it matter at all)?
Also, upon publishing the table to the LH, I see that a weirdly named table was published to the LH as well. Why does this happen, and what should I do with it? It is a replica of the same dim_propchange without the headers. dim_propchange is a table authored in DF Gen2, sourced from SP and published to the LH with Update = Replace and staging disabled.
Hi, @smpa01
When publishing your Dataflow (DF) table to a Data destination like Lakehouse, it is generally recommended to disable staging to improve performance. When staging is enabled, ingestion will take more time. By default, staging is disabled when loading data into the Lakehouse or other non-warehouse destinations. This means that the data is directly written to the data destination without using staging.
Data Factory Spotlight: Dataflow Gen2 | Microsoft Fabric Blog | Microsoft Fabric
Best Regards,
Community Support Team _Charlotte
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
There is some information in this comment which is new to me:
"However, the data might not be as clean or organized as it would be with staging enabled."
"However, there might be a risk of inconsistencies or incomplete data updates since staging helps in managing incremental changes and ensuring data integrity."
"Enabling staging helps in managing and organizing the data before it is published to the destination. This process reduces the risk of data duplication and ensures that the data is clean and consistent."
"In summary, enabling staging is generally recommended to ensure data integrity, avoid duplication, and manage incremental updates effectively."
Where did you find this information?
This information is surprising to me and I would like to get more information about this. Can you please explain more about why there is a risk of inconsistencies or incomplete data when the enable staging option is disabled?
Disable staging is, after all, the default setting in Dataflows Gen2 when loading to Lakehouse: https://learn.microsoft.com/en-us/fabric/data-factory/dataflow-gen2-data-destinations-and-managed-se...
I thought the primary purpose of enable/disable staging was related to performance optimization of the Dataflow Gen2.
Thanks for this, @frithjof_v. Yes, the default is Disable staging.
Dataflow Gen2 is not even comparable to a notebook in terms of performance. I am not surprised by that, and I don't expect PQ to be faster than a notebook's distributed processing.
However, there are situations where I have no option other than to rely on DF Gen2 (e.g. SharePoint). In the future, if I can get access to the Graph API, I can drop this option.
@v-zhangtin-msft @miguel can you please validate the comments from @frithjof_v
The two things I care about most if I have to rely on DF Gen2, in order of priority:
a. What do I need to do to ensure the data is not duplicated in the destination (e.g. Lakehouse) while still supporting incremental refresh (I am willing to overlook performance)? Staging or no staging?
b. If a is satisfied, what performance tuning options are available?
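For point a, one common way to get an incremental load without duplicates is a watermark: read the latest date already in the destination, keep only source rows newer than it, and write those with Update = Append. The plain-Python sketch below only illustrates that idea under stated assumptions; it is not how Dataflow Gen2 works internally. The (id, modified_date) row shape and the function names are hypothetical, and in an actual dataflow the equivalent filter would be built in Power Query against a date column of the source.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical row shape: (id, modified_date). Not a real Fabric schema.
Row = Tuple[int, date]


def max_watermark(destination: List[Row]) -> Optional[date]:
    """Latest modified_date already loaded, or None if the destination is empty."""
    return max((d for _, d in destination), default=None)


def incremental_append(destination: List[Row], source: List[Row]) -> List[Row]:
    """Append only the source rows strictly newer than the destination's watermark.

    Assumes an insert-only source: a row that changes after it was loaded
    would be appended again as a second version, not updated in place.
    """
    watermark = max_watermark(destination)
    new_rows = [row for row in source if watermark is None or row[1] > watermark]
    return destination + new_rows


dest: List[Row] = []
src = [(1, date(2024, 1, 1)), (2, date(2024, 1, 2))]
dest = incremental_append(dest, src)  # first run loads both rows
dest = incremental_append(dest, src)  # re-run with the same source appends nothing
dest = incremental_append(dest, src + [(3, date(2024, 1, 3))])  # only row 3 is new
print(len(dest))  # 3
```

Two caveats of this design: the strict `>` comparison skips a late-arriving row whose date equals the current watermark, and because Append never updates existing rows, sources with in-place edits need a different approach.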
I haven't studied the topic of enabling/disabling staging in much depth.
I hope this helps: https://learn.microsoft.com/en-us/fabric/data-factory/dataflow-gen2-data-destinations-and-managed-se...
I don't think staging/no staging should have any impact on duplication of data in the destination.
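As a minimal sketch of that point, assuming only the documented meaning of the two update methods (Replace overwrites the destination table on each refresh, Append adds the refreshed rows to whatever is already there), the toy model below shows where duplicates come from. Staging is deliberately not a parameter: the sketch reflects the view in this reply that staging affects how the dataflow is processed, not which rows end up in the destination. The function names and row shape are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical row shape: (key, value).
Row = Tuple[int, str]
Loader = Callable[[List[Row], List[Row]], List[Row]]


def replace_load(destination: List[Row], source: List[Row]) -> List[Row]:
    """Replace: the destination ends up holding exactly the refreshed source rows."""
    return list(source)


def append_load(destination: List[Row], source: List[Row]) -> List[Row]:
    """Append: the refreshed source rows are added after the existing rows."""
    return destination + list(source)


def refresh_n_times(load: Loader, source: List[Row], n: int) -> List[Row]:
    """Run the same full refresh n times against an initially empty destination."""
    destination: List[Row] = []
    for _ in range(n):
        destination = load(destination, source)
    return destination


source = [(1, "a"), (2, "b")]
print(len(refresh_n_times(replace_load, source, 3)))  # 2
print(len(refresh_n_times(append_load, source, 3)))   # 6
```

Repeating the same full refresh three times leaves 2 rows under Replace but 6 under Append, which is why Append is usually paired with a filter that only passes rows not yet in the destination.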
Regarding that strange table name, I have not seen or heard about that before.