Jester_3
Frequent Visitor

Microsoft Fabric - Data Pipeline Syncing Issue

Hi,

 

My colleague and I have noticed an issue with a few of our pipelines recently.

 

For context, the pipeline's architecture is made up of 4 dataflows in a linear flow; the first two extract data from a PostgreSQL database and load it into a bronze lakehouse.

 

Dataflows 3 and 4 move the data from bronze to silver and from silver to gold, respectively.

 

What we are seeing is that the initial extracts (dataflows 1 and 2) run fine, but when the 3rd one runs, it behaves as if it is reading a cached version of the lakehouse that does not contain the newly populated data.

 

We have tried adding Wait activities in between the dataflows, which has not worked.

 

We have also tried removing dataflows 3 and 4 from the pipeline and scheduling them to run directly via their own settings, which also did not work.

 

Is there a current workaround for this?

1 ACCEPTED SOLUTION
deborshi_nag
Memorable Member

Hi @Jester_3 

 

This is a known behavior that can occur in Fabric when the Lakehouse/OneLake metadata (the Delta log and SQL analytics endpoint) hasn't caught up with the latest write yet. In short: the data is written promptly, but readers in subsequent steps sometimes hit stale metadata for a few minutes.
 
I suspect Dataflow 3 is using the SQL analytics endpoint internally, which is why it is affected.
 
It is best to introduce an extra step that confirms the changes have been written, i.e. a “commit confirmation” notebook step between Dataflows 2 and 3.
 

Insert a lightweight notebook that either:

  • reads the Delta log/version of the Bronze tables that Dataflow 3 consumes and waits until a new version appears; or
  • performs a harmless SELECT COUNT(*) loop until the expected watermark changes.

This aligns with reports that Delta log/metadata propagation is the bottleneck; explicit polling avoids running Dataflow 3 while the table is still on the prior version.
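The polling idea above can be sketched as a small, Spark-independent helper. Everything here is illustrative: `get_version` is a placeholder for whatever reads the table's current Delta version (for example, a wrapper around `DeltaTable.history()`), and the timeout/poll defaults are assumptions, not values from the original post.

```python
import time

def wait_for_new_version(get_version, baseline, timeout_s=3600, poll_s=30):
    """Poll get_version() until it returns a version newer than `baseline`.

    get_version: zero-arg callable returning the table's current Delta
    commit version (hypothetical wrapper around DeltaTable.history()).
    Returns the new version, or raises TimeoutError when timeout_s elapses.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        version = get_version()
        if version > baseline:
            return version
        time.sleep(poll_s)
    raise TimeoutError(f"no commit newer than version {baseline} within {timeout_s}s")
```

A pipeline would capture the baseline version before the extract runs, then call this between Dataflow 2 and Dataflow 3 so the downstream read only starts once a new commit is visible.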
Hope this helps! If you found this guidance useful, please leave a Kudos or mark it as a Solution.


4 REPLIES
Yousaf-rao1
New Member

Kudos done

chiragdbbbb
New Member

What you’re hitting is the lag between physical writes and metadata visibility in Fabric’s SQL analytics endpoint. Dataflow 3 is likely reading stale metadata even though the Bronze‑to‑Silver write has completed.
Your commit‑check workaround is solid. A few refinements worth noting for others:
- Poll Delta version rather than just the date, so multiple commits in a day don’t collide.
- Tune timeouts — most syncs finish in minutes, so shorter waits with retries can reduce pipeline latency.
In short, explicit synchronization is the safest way to ensure downstream dataflows operate on the latest state until Fabric improves metadata propagation.


Hi @deborshi_nag 

 

Thanks for this suggested fix. I think I'll have to apply this to all my pipelines, as this issue has been happening more frequently this past week.

 

A full description of the fix may be beneficial for others, so I've gone into detail below.

 

I ended up adding an Until loop with a timeout of one hour in my pipeline:

[Screenshot: Until loop configuration in the pipeline — Jester_3_0-1767889255093.png]

The conditional check on the Until activity was:

 

@equals(
    activity('DELTA_CHECK').output.result.exitValue,
    formatDateTime(utcNow(), 'yyyy-MM-dd')
)
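As a possible variant, following the earlier suggestion in this thread to poll the Delta version rather than the date (so multiple commits on the same day don't collide), the Until condition could compare the notebook's exit value to a baseline version captured before the extract ran. This is a sketch only: it assumes the notebook is changed to exit the latest table version, and `baselineVersion` is a hypothetical pipeline variable, not something from the original pipeline.

```
@greater(
    int(activity('DELTA_CHECK').output.result.exitValue),
    int(variables('baselineVersion'))
)
```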

 

The notebook content : 

 

from delta.tables import DeltaTable
from datetime import datetime

table_path = "Tables/ABC"

# Load the Delta table and read its commit history (newest commit first)
delta_table = DeltaTable.forPath(spark, table_path)
history_df = delta_table.history()

# Timestamp of the most recent commit
timestamp = str(history_df.select("timestamp").head()[0])
dt = datetime.fromisoformat(timestamp)
date_only = dt.date()

# Return the commit date so the pipeline can compare it against today's date
mssparkutils.notebook.exit(str(date_only))
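A refinement, per the earlier suggestion in this thread to poll the commit version rather than the date: have the notebook exit the latest Delta version instead. A minimal sketch under stated assumptions — the version-extraction logic is a pure helper, and the Fabric-notebook usage (which needs a Spark session) is shown only in comments; the helper name and the row format are illustrative, not from the original post.

```python
def newest_commit_version(history_rows):
    """Return the highest commit version from Delta history rows.

    `history_rows` is a list of dict-like rows, e.g. produced by
    [r.asDict() for r in delta_table.history().collect()]
    inside a Fabric notebook (hypothetical usage).
    """
    if not history_rows:
        raise ValueError("Delta history is empty")
    return max(row["version"] for row in history_rows)

# Inside the Fabric notebook, the exit value would then be the version:
#   delta_table = DeltaTable.forPath(spark, "Tables/ABC")
#   rows = [r.asDict() for r in delta_table.history().collect()]
#   mssparkutils.notebook.exit(str(newest_commit_version(rows)))
```

The pipeline's Until condition would then compare this number against a baseline version captured before the extract ran, rather than against today's date.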
