
Michal_Izak
Frequent Visitor

How to handle read/write operations on Lakehouse in Dataflow Gen2?

We use a medallion architecture based on Lakehouses (LH) and Gen2 dataflows to process and prepare data for productive use in a Power BI report.

Data is processed sequentially through multiple dataflows and intermediate results are stored in staging LH, bronze LH, etc.

The execution of the dataflows is orchestrated by a pipeline with a generic structure like this:  

dataflow1 ==> LH1 ==> dataflow2 ==> LH2


We have noticed that sometimes dataflow2 does not use the data written by dataflow1 in the n-th run of the pipeline; instead, it reads the data from the (n-1)-th run.

To investigate this behavior, I created a simplified test pipeline:

[Screenshot: Michal_Izak_0-1734088148827.png — simplified test pipeline]

 

"Init Write" writes a table with 2 million rows into a Lakehouse (LH) and a Warehouse (WH).

After a delay of 2 minutes, two dataflows "LH Rewrite" and "WH Rewrite" are executed 10 times in a row.

Each dataflow reads the table from the LH/WH, finds the highest Index, adds one row (current date/time and Index+1), and writes the whole table back to the LH/WH (using "replace" rather than "append").
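For illustration only, the rewrite step each dataflow performs is logically equivalent to this small pandas sketch (a simulation, not the actual dataflow — the real logic is Power Query M; the column names Timestamp and Index come from the test described above):

```python
from datetime import datetime, timezone

import pandas as pd

# Initial table, standing in for the table created by "Init Write".
table = pd.DataFrame({"Timestamp": [datetime.now(timezone.utc)], "Index": [1]})

def rewrite(table: pd.DataFrame) -> pd.DataFrame:
    """One 'LH Rewrite' pass: read, find the highest Index, append one row."""
    next_index = table["Index"].max() + 1
    new_row = pd.DataFrame(
        {"Timestamp": [datetime.now(timezone.utc)], "Index": [next_index]}
    )
    # In the pipeline, the full table is then written back with "replace".
    return pd.concat([table, new_row], ignore_index=True)

for _ in range(10):
    table = rewrite(table)

# If every read sees the previous write, Index runs 1..11 with no gaps.
print(sorted(table["Index"].tolist()))
```

If a read returns a stale snapshot, `max()` is computed on old data and the written-back table loses previously appended rows, which matches the corruption described below.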

After 10 iterations, the resulting tables are as follows.

Lakehouse:

[Screenshots: Michal_Izak_1-1733763151900.png, Michal_Izak_3-1733763235144.png — resulting Lakehouse table]

Warehouse:

[Screenshots: Michal_Izak_2-1733763170624.png, Michal_Izak_4-1733763253148.png — resulting Warehouse table]

Apparently, "LH Rewrite" sometimes reads outdated data from the LH, and the read/write sequencing does not seem to be managed properly: after 10 consecutive executions, the Index column in the LH is corrupted.

In the WH, each read operation appears to wait until the preceding write operation has finished, and I get all 10 updates of the Index column.

 

I observed that this LH behavior is the same whether staging is enabled or disabled in the "LH Rewrite" dataflow.

 

How should we set up the dataflow or Lakehouse to ensure proper subsequent write and read behavior of Lakehouses?


2 REPLIES
Anonymous
Not applicable

Hi @Michal_Izak ,

 

Here are some steps you can take for the problem you describe:

  • Ensure that each read operation waits for the previous write operation to fully complete to prevent reading obsolete data.
  • If applicable, consider using “append” instead of “replace” to avoid overwriting data and causing inconsistencies.
  • Regularly monitor the execution of the data flow and check for any errors or delays that may affect data consistency.
  • Double-check the configuration of the data flow and Lakehouse to ensure that they are set up correctly and that there are no problems with the data source or target.
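For the first bullet, one generic pattern is to poll a table watermark (a Delta table version or the max Index) until it moves past a baseline captured before the upstream write. This is a minimal, library-agnostic sketch, not a Fabric API: `get_version` is a placeholder callable that you would implement yourself, e.g. via the `deltalake` package or a notebook query against the table.

```python
import time

def wait_for_table_update(get_version, baseline, timeout_s=300, poll_s=5):
    """Poll until the table's version/watermark exceeds `baseline`.

    `get_version` is any zero-argument callable returning a monotonically
    increasing value, e.g. a Delta table version or max(Index).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        current = get_version()
        if current > baseline:
            return current
        time.sleep(poll_s)
    raise TimeoutError(f"table still at version {baseline} after {timeout_s}s")
```

In the pipeline, such a check would run between dataflow1 and dataflow2, with the baseline captured before dataflow1 starts.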

 

Best Regards,
Adamk Kong

 

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.

Hi Adamk Kong,

Thank you for your feedback on my problem.
Regarding your suggestions:

  • I have always assumed that as soon as a dataflow has completed (re)writing a table in the LH, this table is already visible with its new content for subsequent processes. Is that not the case? By what means can I check (inside a pipeline or inside a dataflow) before reading a table from LH that no write operation or write postprocessing is running on it?
  • I would like to use “append” instead of “replace”, but this is not possible in my use case. There are many use cases where "append" is not an option: updating Slowly Changing Dimension tables, adding new columns to a table, etc...
  • The execution of the dataflows does not report any errors, delays, or anything similar; everything is marked as successfully completed. That is exactly the problem! What else should I check specifically?
  • We first observed this LH behavior in our production environment. Then I double-checked it with the simplified test case. Anyone can easily reproduce this with a simple test described in my first post. I can provide the dataflow and pipeline templates if needed.
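On the first point above, until the root cause is clarified, a defensive workaround is to retry the read until it reflects an expected watermark (here, the highest Index the upstream step is known to have written). This is a hypothetical sketch: `read_table` stands in for however the table is actually loaded in your environment.

```python
import time

def read_until_fresh(read_table, expected_max_index, retries=10, delay_s=5):
    """Re-read the table until its max Index reaches the expected value."""
    for _attempt in range(retries):
        table = read_table()
        if max(row["Index"] for row in table) >= expected_max_index:
            return table
        time.sleep(delay_s)
    raise RuntimeError(f"stale read: max Index < {expected_max_index}")
```

This does not fix the underlying visibility issue, but it prevents a stale snapshot from silently propagating downstream.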
