tyro_ploter
Frequent Visitor

Missing records in Gold when reading Silver via spark.read.synapsesql

Hi,

I have a daily pipeline in Microsoft Fabric where:

  • Input notebook reads data from Bronze, transforms it, and writes it as Delta tables into the Silver Lakehouse.
# read
df_table = (spark.read
    .option(Constants.WorkspaceId, bronze_workspaceId)
    .synapsesql("lake_bronze.table_brz")
)

# transform
.....

# save
df_table.write.mode("overwrite").option("mergeSchema", "true").saveAsTable("lake_silver.schema.table_slv")
  • Output notebook reads Silver and writes to Gold Warehouse using:
df_silver = spark.read.synapsesql("lake_silver.schema.table_slv")
(df_silver.write
    .mode("overwrite")
    .option(Constants.WorkspaceId, gold_workspaceId)
    .synapsesql("warehouse_gold.dbo.table_gld")
)

Both notebooks are triggered sequentially from a main notebook.

Problem:
Sometimes records that are present in Silver are missing from the Gold Warehouse after the Output notebook runs.

Questions:

  • Is it expected that spark.read.synapsesql("lake_silver.schema.table_slv") might not see the latest Delta commit in Silver Lakehouse?
  • What is the recommended way to ensure Output always reads the fully committed Silver data before writing to Gold?
  • Are there best practices for Lakehouse → Warehouse pipelines in Fabric to avoid missing records?

Any guidance or examples would be appreciated.

2 ACCEPTED SOLUTIONS

Hi @tyro_ploter

 

Personally I just add the other lakehouse from the other workspace to the notebook. I've never had an issue with it.

If you must use the SQL endpoint, use the API to force a refresh first, then wait for 5 minutes. 

Refresh SQL analytics endpoint Metadata REST API (Preview) | Microsoft Fabric Blog | Microsoft Fabric
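If you go the refresh route, the call can be scripted rather than done by hand. The sketch below is a minimal example based on the preview API described in the linked blog post; the exact route (`/v1/workspaces/{workspaceId}/sqlEndpoints/{sqlEndpointId}/refreshMetadata`) and how you obtain a bearer token are assumptions here, not guaranteed API details.

```python
def build_refresh_url(workspace_id: str, sql_endpoint_id: str) -> str:
    """Assumed preview route for the SQL analytics endpoint metadata refresh."""
    return (
        "https://api.fabric.microsoft.com/v1/workspaces/"
        f"{workspace_id}/sqlEndpoints/{sql_endpoint_id}/refreshMetadata"
    )


def refresh_sql_endpoint(workspace_id: str, sql_endpoint_id: str, token: str) -> None:
    """Fire the metadata refresh and raise on a non-2xx response."""
    import requests  # imported lazily so the URL helper stays stdlib-only

    resp = requests.post(
        build_refresh_url(workspace_id, sql_endpoint_id),
        headers={"Authorization": f"Bearer {token}"},
        json={},
        timeout=60,
    )
    resp.raise_for_status()
```

In a Fabric notebook the token could come from `notebookutils.credentials.getToken(...)`; the correct audience string is another assumption to verify against the docs.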

 

If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.


Hi @tyro_ploter

 

@Gpop13 is correct: read the data from the lakehouse using Spark (spark.table), then write it to the warehouse using synapsesql. It's the read operation that's causing you grief right now.

 

If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution. 


7 REPLIES
tyro_ploter
Frequent Visitor

Hi @tayloramy , @Gpop13 ,

 

I’m using synapsesql because my architecture relies on separate workspaces for Bronze, Silver, and Gold layers, all managed through Git CI/CD. This approach makes it easier to handle cross-workspace reads and writes without relying on shortcuts or workspace-specific artifacts, which are not version-controlled and could cause inconsistencies during branch merges.
Another reason is that the Gold layer is a Warehouse, and Fabric notebooks can only attach Lakehouses, not Warehouses. Therefore, synapsesql is the only way to write data from Spark into the Gold Warehouse.


That said, I might be missing a better approach. If there's a recommended pattern for cross-workspace Lakehouse → Warehouse notebooks that avoids snapshot latency and still works well with CI/CD, I'd love to hear your suggestions.


Hi @tayloramy 
Thank you for the suggestion!
In my case, the main reason is that Gold is a Warehouse, and Fabric notebooks can only attach Lakehouses, not Warehouses. I need to write a DataFrame into a Warehouse in another workspace, and as far as I know, synapsesql is the supported way to do this.

I’ll consider your idea about adding a refresh and a timeout before reading.
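A fixed sleep after the refresh is fragile; one alternative is to poll until the endpoint reports the row count you just wrote to Silver. The helper below is a sketch: `read_count` is a hypothetical caller-supplied zero-argument function, e.g. a `COUNT(*)` over the SQL endpoint via synapsesql.

```python
import time


def wait_for_row_count(read_count, expected: int,
                       timeout_s: float = 600.0, poll_s: float = 15.0) -> bool:
    """Poll read_count() until it reaches `expected` rows or the timeout
    elapses. Returns True as soon as the count catches up."""
    deadline = time.monotonic() + timeout_s
    while True:
        if read_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

In the Output notebook, `expected` could be the `df.count()` recorded by the Input notebook and handed over by the orchestrating main notebook.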


Hi @tyro_ploter , the suggestion was only to see whether synapsesql can be avoided when reading from the lakehouse; I believe you can still use it to write to the warehouse. Because synapsesql reads through the SQL endpoint, that is likely why you are facing this issue.

tayloramy
Community Champion

Hi @tyro_ploter

 

@Gpop13 is asking the right questions. 

 

spark.read.synapsesql(...) goes through the SQL analytics endpoint, which can lag behind the latest committed data.

 

I recommend reading directly from the Delta table using spark.table(...).
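Put together, a minimal sketch of that pattern (assuming the Fabric Spark runtime, where the `com.microsoft.spark.fabric` connector and an attached Silver Lakehouse are available; table names are the ones from the question):

```python
def copy_silver_to_gold(spark, gold_workspace_id: str,
                        src: str = "lake_silver.schema.table_slv",
                        dst: str = "warehouse_gold.dbo.table_gld") -> None:
    """Read Silver directly from the Delta table (no SQL-endpoint lag),
    then write to the Gold Warehouse through the synapsesql connector."""
    # Fabric-only imports; present on Fabric Spark runtimes.
    import com.microsoft.spark.fabric  # registers the synapsesql writer
    from com.microsoft.spark.fabric.Constants import Constants

    df = spark.table(src)  # reads the latest committed Delta version
    (df.write
       .mode("overwrite")
       .option(Constants.WorkspaceId, gold_workspace_id)
       .synapsesql(dst))
```

Reading with `spark.table` bypasses the SQL endpoint entirely, so the metadata-sync delay no longer applies; only the warehouse write still goes through synapsesql.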

 

If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution. 

Gpop13
Advocate II

Hi @tyro_ploter ,

What is the reason for using synapsesql to read? I presume Silver is a Lakehouse.

Can we not use spark.read.table("lake_silver.schema.table_slv") and then write it to the Gold Warehouse?

 
