
Michal_Izak
Frequent Visitor

How to handle read/write operations on Lakehouse in Dataflow Gen2?

We use a medallion architecture based on Lakehouses (LH) and Gen2 dataflows to process and prepare data for productive use in a Power BI report.

Data is processed sequentially through multiple dataflows and intermediate results are stored in staging LH, bronze LH, etc.

The execution of the dataflows is orchestrated by a pipeline with a generic structure like this:  

dataflow1 ==> LH1 ==> dataflow2 ==> LH2


We have noticed that sometimes dataflow2 does not use the data written by dataflow1 in the n-th run of the pipeline; instead, it reads the data from the (n-1)-th run.

To investigate this behavior, I created a simplified test pipeline:

[Screenshot: Michal_Izak_0-1734088148827.png — simplified test pipeline]

 

"Init Write" writes a table with 2 million rows into a Lakehouse (LH) and a Warehouse (WH).

After a delay of 2 minutes, two dataflows "LH Rewrite" and "WH Rewrite" are executed 10 times in a row.

Each dataflow reads the table from the LH/WH, finds the highest Index, adds one row (current date/time and Index+1), and writes the whole table back to the LH/WH (using "replace" rather than "append").
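For illustration only, the rewrite step each dataflow performs is logically equivalent to this small pandas sketch (a simulation, not the actual dataflow — the real logic is Power Query M; the column names Timestamp and Index come from the test described above):

```python
from datetime import datetime, timezone

import pandas as pd

# Initial table, standing in for the table created by "Init Write".
table = pd.DataFrame({"Timestamp": [datetime.now(timezone.utc)], "Index": [1]})

def rewrite(table: pd.DataFrame) -> pd.DataFrame:
    """One 'LH Rewrite' pass: read, find the highest Index, append one row."""
    next_index = table["Index"].max() + 1
    new_row = pd.DataFrame(
        {"Timestamp": [datetime.now(timezone.utc)], "Index": [next_index]}
    )
    # In the pipeline, the full table is then written back with "replace".
    return pd.concat([table, new_row], ignore_index=True)

for _ in range(10):
    table = rewrite(table)

# If every read sees the previous write, Index runs 1..11 with no gaps.
print(sorted(table["Index"].tolist()))
```

If a read returns a stale snapshot, `max()` is computed on old data and the written-back table loses previously appended rows, which matches the corruption described below.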

After 10 iterations, the resulting tables are as follows.

Lakehouse:

[Screenshots: Michal_Izak_1-1733763151900.png, Michal_Izak_3-1733763235144.png — resulting Lakehouse table]

Warehouse:

[Screenshots: Michal_Izak_2-1733763170624.png, Michal_Izak_4-1733763253148.png — resulting Warehouse table]

Apparently, "LH Rewrite" sometimes reads outdated data from the LH, and the read/write sequencing does not seem to be managed properly: after 10 consecutive executions, the Index column in the LH is corrupted.

In the WH, each read operation appears to wait until the preceding write operation has finished, and I get all 10 updates of the Index column.

 

I observed that this LH behavior is the same whether staging is enabled or disabled in the "LH Rewrite" dataflow.

 

How should we set up the dataflow or Lakehouse to ensure proper subsequent write and read behavior of Lakehouses?


2 REPLIES
Anonymous
Not applicable

Hi @Michal_Izak ,

 

Here are some steps you can take for the problem you describe:

  • Ensure that each read operation waits for the previous write operation to fully complete to prevent reading obsolete data.
  • If applicable, consider using “append” instead of “replace” to avoid overwriting data and causing inconsistencies.
  • Regularly monitor the execution of the data flow and check for any errors or delays that may affect data consistency.
  • Double-check the configuration of the data flow and Lakehouse to ensure that they are set up correctly and that there are no problems with the data source or target.
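For the first bullet, one generic pattern is to poll a table watermark (a Delta table version or the max Index) until it moves past a baseline captured before the upstream write. This is a minimal, library-agnostic sketch, not a Fabric API: `get_version` is a placeholder callable that you would implement yourself, e.g. via the `deltalake` package or a notebook query against the table.

```python
import time

def wait_for_table_update(get_version, baseline, timeout_s=300, poll_s=5):
    """Poll until the table's version/watermark exceeds `baseline`.

    `get_version` is any zero-argument callable returning a monotonically
    increasing value, e.g. a Delta table version or max(Index).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        current = get_version()
        if current > baseline:
            return current
        time.sleep(poll_s)
    raise TimeoutError(f"table still at version {baseline} after {timeout_s}s")
```

In the pipeline, such a check would run between dataflow1 and dataflow2, with the baseline captured before dataflow1 starts.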

 

Best Regards,
Adamk Kong

 

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.

Hi Adamk Kong,

Thank you for your feedback on my problem.
Regarding your suggestions:

  • I have always assumed that as soon as a dataflow has completed (re)writing a table in the LH, this table is already visible with its new content for subsequent processes. Is that not the case? By what means can I check (inside a pipeline or inside a dataflow) before reading a table from LH that no write operation or write postprocessing is running on it?
  • I would like to use “append” instead of “replace”, but this is not possible in my use case. There are many use cases where "append" is not an option: updating Slowly Changing Dimension tables, adding new columns to a table, etc...
  • The execution of the dataflows does not report any errors, delays, or anything similar; everything is marked as successfully completed. That is exactly the problem! What else should I check specifically?
  • We first observed this LH behavior in our production environment. Then I double-checked it with the simplified test case. Anyone can easily reproduce this with a simple test described in my first post. I can provide the dataflow and pipeline templates if needed.
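On the first point above, until the root cause is clarified, a defensive workaround is to retry the read until it reflects an expected watermark (here, the highest Index the upstream step is known to have written). This is a hypothetical sketch: `read_table` stands in for however the table is actually loaded in your environment.

```python
import time

def read_until_fresh(read_table, expected_max_index, retries=10, delay_s=5):
    """Re-read the table until its max Index reaches the expected value."""
    for _attempt in range(retries):
        table = read_table()
        if max(row["Index"] for row in table) >= expected_max_index:
            return table
        time.sleep(delay_s)
    raise RuntimeError(f"stale read: max Index < {expected_max_index}")
```

This does not fix the underlying visibility issue, but it prevents a stale snapshot from silently propagating downstream.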
