I currently have a pipeline in which I copy data from two different sources and create lakehouse tables. Then I added a wait of 160 seconds to be sure to leave enough time for the lakehouse to refresh, and after that I run a Dataflow Gen2 that appends both tables, does some transformations, and loads the final table into the data warehouse (using Overwrite).
Recently, I have noticed that sometimes the final table in the data warehouse is missing all data from one of the two sources, or is missing new rows. Every time I have noticed this happening, the Dataflow ran successfully, and I also verified that the data is available in the lakehouse tables.
Is there something else I could monitor to make sure that my data warehouse table gets updated successfully?
Has anyone else noticed this? Could it be a bug?
Hello, I had the same issue some days ago.
I have a Dataflow Gen2 that reads many tables from a Lakehouse, makes some transformations, and finally writes the modified data to a Warehouse, overwriting the destination tables.
Although the execution logs indicate that data was read and written without errors, the tables in the Warehouse don't contain any new or updated data, as if no overwrite happened. The only exception is the CALENDAR table, which is generated inside the Dataflow Gen2 using only M code (no source from the Lakehouse).
In the Lakehouse, the new and updated records are available. Checking the Dataflow steps today, I can see those records.
I would like to know if there is an official solution planned for 2026 for this problem.
Hi, are you sure your data in the Lakehouse is being updated before it goes to the warehouse?
In my case, I would check later, once I realized data was missing in the warehouse, and the data was available in the lakehouse; but what was really happening is that the lakehouse SQL endpoint was not being refreshed before the data was written to the warehouse. I added a step to refresh the lakehouse SQL endpoint before the warehouse step and the problem was solved.
Hi, the data in the Lakehouse was updated because we implemented a pipeline for a Medallion Architecture: the Lakehouse is the SILVER layer and the Warehouse is the GOLD layer.
The pipeline has been in production for over six months and so far there have been no issues refreshing the Lakehouse endpoint.
If necessary, we can consider adding a wait task or forcing a refresh of the endpoint. But since the problem had not occurred before, it could be a Fabric issue.
Hi ilseeb, I have noticed that sometimes 160 seconds is not enough for the SQL endpoint of a lakehouse to update. I found a blog post that contains a Python script to refresh the SQL endpoint programmatically and wait for the lakehouse to be refreshed before moving on. You can find the blog post here: https://www.obvience.com/blog/fix-sql-analytics-endpoint-sync-issues-in-microsoft-fabric-data-not-sh.... You can add this in a notebook and run it after copying the data and before appending and transforming.
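For reference, that script boils down to something like the sketch below. Treat it strictly as a sketch: only the Get Lakehouse call is documented, while the lhdatamarts sync call, its payload, and the response fields are taken from community scripts and are undocumented, so they may change at any time; the workspace and lakehouse IDs are placeholders you need to fill in.

```python
# Sketch only (Fabric notebook). The lhdatamarts call, its payload and the response
# fields are UNDOCUMENTED assumptions taken from community scripts and may change.
import time
import sempy.fabric as fabric

WORKSPACE_ID = "<workspace-id>"   # placeholder
LAKEHOUSE_ID = "<lakehouse-id>"   # placeholder

client = fabric.FabricRestClient()

# Documented call: read the lakehouse item to find its SQL analytics endpoint id.
lakehouse = client.get(f"v1/workspaces/{WORKSPACE_ID}/lakehouses/{LAKEHOUSE_ID}").json()
sql_endpoint_id = lakehouse["properties"]["sqlEndpointProperties"]["id"]

# Undocumented call (assumption): ask the SQL analytics endpoint to sync its metadata.
sync = client.post(
    f"https://api.powerbi.com/v1.0/myorg/lhdatamarts/{sql_endpoint_id}",
    json={"commandType": "TdsEndpointBatchSyncCommand"},
).json()
batch_id = sync["batchId"]  # field name as seen in community scripts

# Poll the batch until it finishes, or give up after roughly 5 minutes.
for _ in range(60):
    status = client.get(
        f"https://api.powerbi.com/v1.0/myorg/lhdatamarts/{sql_endpoint_id}/batches/{batch_id}"
    ).json()
    state = status.get("progressState")
    if state in ("success", "failure"):
        print("Endpoint sync finished with state:", state)
        break
    time.sleep(5)
```

Running this as a notebook activity between the copy activities and the Dataflow Gen2 replaces the fixed 160-second wait with an explicit wait on the endpoint sync.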
@ilseeb @FabianSchut @Anonymous
Unfortunately, this is a gamble. Since this is an undocumented solution not supported by Microsoft, as soon as some Microsoft tech bro parachutes into his cubicle one morning, all fresh and perky, full of unwarranted optimism for the day ahead fueled by too much caffeine, and on a whim decides to change some code in the API, this 'so-called' solution will come crashing down while you run it in production.
Who wants to gamble with production workloads? Not me.
But Microsoft has been working on this, and they say they will release something in Q2, like a REST API endpoint you can call to refresh the lakehouse metadata (I guess they are just making this undocumented solution official). So why not send them a message, loud and clear, by voting for the idea at the link below? It asks for a new pipeline activity that's easy to set up from within a pipeline, set and forget, and would do the job without having to mess around with a Python notebook. (I have in mind here those users who expect user-friendliness, not the coding veterans.)
Here is the idea link: LAKEHOUSE I/O WRITE DELAY MITIGATION - Microsoft Fabric Community
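(For anyone who does want to script it in the meantime: whenever that official endpoint lands, the call itself should boil down to a single POST. The sketch below is purely a guess at its shape, assuming it follows the announced "refresh SQL endpoint metadata" naming; the path is not a documented endpoint yet, so check the Fabric REST API docs once it is released.)

```python
# Hypothetical sketch: the path below is an assumption about the announced
# "refresh SQL endpoint metadata" API, not a documented endpoint yet.
import sempy.fabric as fabric

WORKSPACE_ID = "<workspace-id>"        # placeholder
SQL_ENDPOINT_ID = "<sql-endpoint-id>"  # placeholder: the lakehouse's SQL analytics endpoint item id

client = fabric.FabricRestClient()
resp = client.post(
    f"v1/workspaces/{WORKSPACE_ID}/sqlEndpoints/{SQL_ENDPOINT_ID}/refreshMetadata?preview=true",
    json={},
)
# 200 would mean the sync ran synchronously; 202 would mean a long-running
# operation whose status needs to be polled, like other Fabric REST APIs.
print(resp.status_code)
```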
Thanks a lot for this link! I'll try it out.
Hi @ilseeb
Thank you very much FabianSchut for your prompt reply.
Can you tell me if your problem is solved? If yes, please accept it as the solution.
Regards,
Nono Chen
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.