smeetsh
Resolver II

Delay between lakehouse ingestion and the data showing up in the SQL endpoint of that lakehouse

Hi All,

 

A known issue is that when I ingest data into a lakehouse table, it can take some time for that data to show up in the SQL endpoint of that lakehouse table (known issue 1092). Until now I have been trying to work around this by adding a 5 or 10 minute delay in front of the step that queries the lakehouse SQL endpoint (we use this a lot to join raw data with our dim tables, do ETL etc. and get the data ready for the business analysts). This works with varying success and causes unnecessary delays: pipelines that should take only a minute now take 10 minutes or more, and sometimes the endpoint still hasn't caught up.

I found the article below and I am wondering if anyone has already tried it:
Known issue - Delayed data availability in SQL analytics endpoint when using a pipeline 

Do I understand correctly that the fix is as simple as adding a script activity with a script like:

SELECT TOP(1) 1
FROM [lakehouse].[dbo].[tablename]

 

Or is this not a fix for the delay between a Fabric lakehouse and its SQL analytics endpoint?

Cheers

Hans

1 ACCEPTED SOLUTION
v-lgarikapat
Community Support

Hi @smeetsh ,
Thank you for bringing this up.

This is a known issue on Microsoft's end, and the product team is actively working on a backend fix. We understand that the delays can be frustrating, especially when they impact overall performance, and we appreciate your patience in the meantime.

If you found this post helpful, please consider giving it Kudos and marking it as the accepted solution to assist other members in finding it more easily.

Thank you.
Best Regards,

LakshmiNarayana.


4 REPLIES
v-lgarikapat
Community Support


Hi @smeetsh ,

As we haven't heard back from you, we are closing this thread. If you are still experiencing the same issue, we kindly request that you create a new thread, and we'll be happy to assist you further.

Thank you for your patience and support.
Best Regards,
Lakshmi Narayana

v-lgarikapat
Community Support

Hi @smeetsh ,
Thank you for reaching out to the Microsoft Community Forum.

Running
SELECT TOP(1) 1 FROM [lakehouse].[dbo].[tablename]
does not fix the delay. While it confirms that the table exists and is accessible, it doesn't guarantee that the latest ingested data is visible yet. This is why your pipeline sometimes still fails or returns incomplete data even after a delay.
Loop with Condition Check
Instead of using a fixed delay, implement a loop in your pipeline that runs a query like:
SELECT COUNT(1)
FROM [lakehouse].[dbo].[tablename]
WHERE [LoadTimestamp] >= @ExpectedTimestamp
This checks if the new data is visible and only then proceeds. This avoids unnecessary delay and ensures consistency.
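In a pipeline this would typically be an Until activity wrapping a Script activity and a short Wait activity. As a rough sketch of the same polling logic (the connection string, authentication method, table name and [LoadTimestamp] column below are all placeholder assumptions, not something your lakehouse necessarily has), it could look like this in Python:

import time
import pyodbc

# Placeholder connection to the lakehouse's SQL analytics endpoint.
conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-sql-endpoint>.datawarehouse.fabric.microsoft.com;"
    "Database=<your_lakehouse>;"
    "Authentication=ActiveDirectoryInteractive;"
)

def wait_for_rows(expected_timestamp, timeout_s=600, poll_s=15):
    """Poll the SQL analytics endpoint until the newly ingested rows are visible."""
    query = ("SELECT COUNT(1) FROM [dbo].[tablename] "
             "WHERE [LoadTimestamp] >= ?")
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        with pyodbc.connect(conn_str) as conn:
            count = conn.cursor().execute(query, expected_timestamp).fetchval()
        if count and count > 0:
            return True  # data is visible, safe to run the downstream ETL
        time.sleep(poll_s)  # short poll instead of a fixed 10-minute delay
    raise TimeoutError("SQL analytics endpoint did not catch up in time")

The point is that the wait ends as soon as the endpoint has caught up, instead of always paying a fixed 5 or 10 minute delay.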
Use Spark or Notebooks for Direct Access
If you're working with notebooks or Spark pipelines, read the table directly using Delta format:
df = spark.read.format("delta").load("Tables/your_table_path")
This gives you immediate access to the latest data, bypassing the SQL delay entirely.
Use REFRESH TABLE (Optional)
In Spark notebooks, running:
REFRESH TABLE your_table_name
can sometimes force a metadata sync, but its effectiveness may vary.
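From a PySpark cell this would look roughly like the following (the table name is a placeholder):

# Refreshes Spark's own metadata/cache for the table; it does not guarantee
# that the SQL analytics endpoint has picked up the latest commits.
spark.sql("REFRESH TABLE your_table_name")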


If this solution helped resolve your query, kindly mark it as Solution Accepted and consider giving a Kudos so it can assist others in the community facing similar issues.

Let me know if you need further assistance!

Best Regards,
Lakshmi Narayana

Edit: I don't think the if condition will work, since I would have to use the SQL endpoint to query the lakehouse, which is exactly where the problem lies. My raw data comes from an API and does not include the date/time that I made the API call. I can read a commit from a parquet file with a notebook, but there is no similar functionality to read it from the SQL endpoint, so there is nothing to compare.
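For reference, reading the last commit on the lakehouse side from a notebook looks roughly like this (the table path is a placeholder):

from delta.tables import DeltaTable

# Latest Delta commit for the table: version number and commit timestamp.
dt = DeltaTable.forPath(spark, "Tables/your_table_path")
last_commit = dt.history(1).select("version", "timestamp").first()
print(last_commit["version"], last_commit["timestamp"])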

 

Thank you for this. I want to try the if condition, but I am unclear on how it is supposed to work.

I don't have a [LoadTimestamp] column, and @ExpectedTimestamp looks like a variable to me.

 

Could someone please explain in more detail?

 

I am still puzzled, though, why Microsoft published the script as a workaround to force a refresh. Should I maybe do just SELECT 1 and not SELECT TOP(1) 1?

 

I can't use a notebook, since the data needs to be written to a warehouse table for our analysts to do their work. A notebook can read from and write to a lakehouse, but can only read from a warehouse.

The basic architecture of lakehouse to warehouse is something I cannot change.

 
