Hi all,
Currently, our org is really struggling with Dataflow Gen2 flows writing to a Warehouse: slow execution (sometimes 8 hours followed by a timeout), low success rates, and general stalling, to name a few issues.
Does anybody here have any tips on how to make these work more efficiently? For reference, we are outputting maybe 7 tables with approx 2 million rows each (filtered down from roughly 7 million just to get the flows to execute). We are fetching 15 or so tables into the flow to be used as intermediate queries that are joined onto the output queries within the Gen2 flow.
We are also not using a gateway. We couldn't get that to work at all within these flows, so we use a self-hosted integration runtime in Azure Data Factory to feed a staging Azure SQL database, which we then connect to from the Gen2 flows.
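To give an idea of the query shape, here is a simplified sketch of one of our output queries in Power Query M (server, database, table, and column names are placeholders, not our real schema):

```
let
    // Placeholder connection to the Azure SQL staging database
    Source    = Sql.Database("staging-server.database.windows.net", "StagingDb"),
    Orders    = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    Customers = Source{[Schema = "dbo", Item = "Customers"]}[Data],

    // Filter and trim columns as the first steps so they fold back to Azure SQL
    // instead of pulling the full ~7 million rows into the mashup engine
    Recent = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1)),
    Slim   = Table.SelectColumns(Recent, {"OrderId", "CustomerId", "OrderDate", "Amount"}),

    // Join onto an intermediate query from the same staging source,
    // which can keep the whole query folded into a single SQL statement
    Joined   = Table.NestedJoin(Slim, {"CustomerId"}, Customers, {"CustomerId"}, "Customer", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Joined, "Customer", {"CustomerName", "Region"})
in
    Expanded
```

The intent is that the filter, column selection, and join all fold back to the staging database rather than being evaluated inside the dataflow.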
I've tried a few of the commonly suggested fixes, such as disabling/enabling staging and writing to a Lakehouse first (though that doesn't really solve the problem with the flows themselves).
Any success stories here? Would love to hear some tips and tricks - or even just some reassurance that this issue is as widespread as it seems to be 🙂
Also, any tips on optimizing queries within the Warehouse itself? Sometimes it just seems to completely freeze and melt down the whole system. We're sitting on a 32 SKU at the moment.
Thanks a lot!
Hi @DataPne
Thanks for using Microsoft Fabric Community.
Apologies for the issue that you are facing.
This might require a deeper investigation from our engineering team into your workspace, the pipeline details, and the logic behind it to properly understand what is happening.
Please go ahead and raise a support ticket to reach our support team:
https://support.fabric.microsoft.com/support
Please share the ticket number here so we can keep an eye on it.
Thank you.
Hi @DataPne
Apologies for the inconvenience you are facing here.
As a workaround, instead of loading the on-prem data through Dataflow Gen2 straight into the Warehouse (On-Prem -> DFg2 -> Warehouse), could you please test loading it from on-prem through Dataflow Gen2 into a Lakehouse (On-Prem -> DFg2 -> Lakehouse)? I would test both files and tables as the destination.
Then create a Data Pipeline to move the data from the Lakehouse to the Warehouse.
Just like the medallion architecture we often see, the idea is to do a quick load of the source into a Raw layer and then work your way up to the Warehouse, which would be the Silver/Gold layer.
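For the first hop, it can help to keep each Dataflow Gen2 query ingest-only and point its destination at a Lakehouse table. A minimal sketch in Power Query M (server, database, and table names are placeholders):

```
// Raw-layer ingest query: land the staging table in the Lakehouse as-is,
// with no joins or transformations, so the Gen2 flow stays fast and simple.
let
    Source    = Sql.Database("staging-server.database.windows.net", "StagingDb"),
    RawOrders = Source{[Schema = "dbo", Item = "Orders"]}[Data]
in
    RawOrders
```

The heavier joins and shaping can then happen downstream, in the Data Pipeline or in the Warehouse itself, rather than inside the dataflow.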
The issue you are facing while loading the data into the Warehouse is just temporary. It won't be a problem once Data Pipelines can use the On-Premises Data Gateway, which is currently in development.
Hope this helps. Please let me know if you have any further questions.
Hi @DataPne
We haven't heard from you since the last response and just wanted to check back to see if you have found a resolution yet. If you have, please do share it with the community, as it can be helpful to others.
Otherwise, please respond back with more details and we will try to help.
Thanks
Hi @DataPne
We haven't heard from you since the last response and just wanted to check back to see if you have found a resolution yet.
If you have, please do share it with the community, as it can be helpful to others.
If you have any questions relating to the current thread, please let us know and we will try our best to help you.
If you have a question about a different issue, we request that you open a new thread.
Thanks