I currently have a pipeline in which I copy data from two different sources and create lakehouse tables. I then added a wait of 160 seconds to leave enough time for the lakehouse to refresh, after which a Dataflow Gen2 appends both tables, applies some transformations, and writes the final table to the data warehouse (using Overwrite).
Recently, I have noticed that sometimes all data from one of the two sources, or just the new rows, is missing from the final table in the data warehouse. Every time I have noticed this happening, the Dataflow had run successfully, and I also verified that the data was available in the lakehouse tables.
Is there something else I could monitor to make sure that my data warehouse table gets updated successfully?
Has anyone else noticed this? Could it be a bug?
Hi ilseeb, I have noticed that sometimes 160 seconds is not enough to update the SQL analytics endpoint of a lakehouse. I found a blog post that contains a Python script to refresh the SQL endpoint programmatically and wait for the lakehouse to be refreshed before moving on. You can find the blog post here: https://www.obvience.com/blog/fix-sql-analytics-endpoint-sync-issues-in-microsoft-fabric-data-not-sh.... You can add this in a notebook and run it after copying the data and before appending and transforming.
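For reference, scripts like the one in that post generally follow the same pattern: trigger a metadata refresh on the lakehouse's SQL analytics endpoint, then poll until the sync batch finishes before the pipeline continues. Below is a minimal sketch of that pattern, assuming it runs in a Fabric notebook where sempy is available. The workspace and lakehouse names are placeholders, and the lhdatamarts endpoint, payload, and status fields come from community write-ups of this workaround; they are not an official, documented API and may change.

```python
import time
import sempy.fabric as fabric  # available in Fabric notebooks

# Hypothetical names -- replace with your own workspace and lakehouse.
WORKSPACE = "MyWorkspace"
LAKEHOUSE = "MyLakehouse"

client = fabric.FabricRestClient()
workspace_id = fabric.resolve_workspace_id(WORKSPACE)

# Look up the lakehouse and read its SQL analytics endpoint id
# via the documented Get Lakehouse API.
items = fabric.list_items(workspace=WORKSPACE)
lakehouse_id = items[
    (items["Display Name"] == LAKEHOUSE) & (items["Type"] == "Lakehouse")
]["Id"].iloc[0]
lakehouse = client.get(f"/v1/workspaces/{workspace_id}/lakehouses/{lakehouse_id}").json()
sql_endpoint_id = lakehouse["properties"]["sqlEndpointProperties"]["id"]

# Undocumented metadata-refresh call described in the community posts; subject to change.
payload = {"commands": [{"$type": "MetadataRefreshExternalCommand"}]}
batch_id = client.post(
    f"/v1.0/myorg/lhdatamarts/{sql_endpoint_id}", json=payload
).json()["batchId"]

# Poll the sync batch so the pipeline only moves on once the endpoint is up to date.
status_uri = f"/v1.0/myorg/lhdatamarts/{sql_endpoint_id}/batches/{batch_id}"
state = "inProgress"
while state == "inProgress":
    time.sleep(5)
    state = client.get(status_uri).json()["progressState"]

if state != "success":
    raise RuntimeError(f"SQL analytics endpoint sync did not complete: {state}")
print("SQL analytics endpoint metadata is in sync.")
```

If you wire this notebook into the pipeline between the copy activities and the Dataflow Gen2, it replaces the fixed 160-second wait with an explicit check that the sync actually completed.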
@ilseeb @FabianSchut @Anonymous
Unfortunately, this is a gamble. Since it is an undocumented solution that Microsoft does not support, the moment some Microsoft tech bro parachutes into his cubicle one morning, all fresh and perky, full of unwarranted optimism for the day ahead fueled by too much caffeine, and on a whim decides to change some code in the API, this 'so-called' solution will come crashing down while you run it in production.
Who wants to gamble with production workloads? Not me.
But since Microsoft has been working on this and says it will release something in Q2, such as a REST API endpoint you can call to refresh the lakehouse metadata (I guess they are just making this undocumented solution official), why not send them a clear message by voting for the idea linked below? It asks for a new pipeline activity that is easy to set up from within a pipeline, set and forget, and does the job without having to mess around with a Python notebook. (I have in mind the users who expect user-friendliness, not the coding veterans.)
Here is the idea link: LAKEHOUSE I/O WRITE DELAY MITIGATION - Microsoft Fabric Community
Thanks a lot for this link! I'll try it out
Hi @ilseeb
Thank you very much FabianSchut for your prompt reply.
Can you tell me if your problem is solved? If so, please accept the reply above as the solution.
Regards,
Nono Chen
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.