Hi, I'm currently testing Fabric with a trial account and have built a working data ingestion pipeline from an on-premises source using Dataflow Gen2 into a Lakehouse. Most fact tables I've managed to update incrementally by using a field from the source that can filter the source data so only new rows are appended to the tables (roughly sketched below). Initially I had some datatype issues with reports created on a semantic model on the lakehouse and was advised to move the data from the lakehouse to a warehouse, create a semantic model on the warehouse, and build the reports from the warehouse.
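Conceptually the incremental load is just a watermark filter on that field, whether it is applied in the Dataflow Gen2 query or pushed down to the source; something roughly like this (the table, column, and parameter names are made up for illustration only):

```sql
-- Illustrative only: fetch just the rows that are newer than the last value
-- already ingested, then append them to the fact table in the lakehouse.
-- src.FactSales, LastModified and @LastWatermark are placeholder names.
SELECT *
FROM   src.FactSales
WHERE  LastModified > @LastWatermark;   -- @LastWatermark = MAX(LastModified) already loaded
```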
Up to this point everything works fine.
But some of my dimension tables contain data that changes often, and there is no way to know which records are new or changed, so I do a full refresh of those tables every time I reload the data, which is multiple times a day. In the pipeline I replace the table in the lakehouse on every run, but when I copy the data from the lakehouse to the warehouse it gets appended instead of replaced. I couldn't find an option to replace instead of append (did I miss it?). To work around this, the pipeline first runs a stored procedure that deletes all records from the dimension table in the warehouse before appending the data as part of the refresh, roughly like the sketch below.
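The delete step itself is trivial; it is essentially just this (the procedure and table names here are placeholders, not my real objects):

```sql
-- Placeholder names: this just wipes the warehouse dimension table so the
-- pipeline's Copy activity can append the full reload afterwards.
CREATE PROCEDURE dbo.usp_ClearDimCustomer
AS
BEGIN
    -- Between this delete and the end of the append, the table is empty,
    -- which is exactly the window in which users see blank reports.
    DELETE FROM dbo.DimCustomer;
END;
```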
The problem is that this delete-and-reload takes a couple of minutes, and during that time the tables are temporarily empty. Because the semantic model uses Direct Lake, users see empty reports (the dimension tables they depend on are empty) every time we refresh. I'm guessing this is not how it's supposed to be set up.
What should I do differently? What's wrong with my approach? Please advise.
Hey, Data pipelines do not support transactions and load data in batches.
So the best approach is to land the data as-is in a staging table (you can use a pre-copy script in the Copy activity to clear it first), and then, via a Stored procedure activity, delete the rows in the destination table and load them from the staging table, all within a single transaction:
https://learn.microsoft.com/en-us/fabric/data-warehouse/transactions
That way there is no point in time with a zero-data issue (assuming you check that the staging table does not contain zero records before swapping); a rough sketch follows below.
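A minimal sketch of what that stored procedure could look like, assuming the Copy activity lands the data in a staging table called stg.DimCustomer and the reporting table is dbo.DimCustomer (both names are placeholders; check the linked documentation for the exact T-SQL surface Fabric Warehouse supports):

```sql
-- Sketch only: runs from a Stored procedure activity after the Copy activity
-- has loaded fresh rows into the staging table.
CREATE PROCEDURE dbo.usp_SwapDimCustomer
AS
BEGIN
    -- Guard: never swap from an empty staging table, so a failed or empty
    -- copy cannot wipe the reporting table.
    IF EXISTS (SELECT 1 FROM stg.DimCustomer)
    BEGIN
        BEGIN TRANSACTION;

        -- Delete and reload commit together, so readers never observe
        -- an empty dimension table in between.
        DELETE FROM dbo.DimCustomer;

        INSERT INTO dbo.DimCustomer
        SELECT * FROM stg.DimCustomer;   -- in practice, list the columns explicitly

        COMMIT TRANSACTION;
    END
END;
```

The pipeline then becomes: a Copy activity whose pre-copy script clears stg.DimCustomer and which appends the fresh data to it, followed by a Stored procedure activity calling the swap procedure above.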
Hi @SnoekL ,
We haven't heard from you since the last response and wanted to check whether your query has been answered. If not, please reply with more details and we will try to help.