tyro_ploter
Frequent Visitor

Missing records in Gold when reading Silver via spark.read.synapsesql

Hi,

I have a daily pipeline in Microsoft Fabric where:

  • Input notebook reads data from Bronze, transforms it, and writes it as Delta tables into the Silver Lakehouse.
# read
df_table = (spark.read
    .option(Constants.WorkspaceId, bronze_workspaceId)
    .synapsesql("lake_bronze.table_brz")
)

# transform
.....

# save
df_table.write.mode("overwrite").option("mergeSchema", "true").saveAsTable("lake_silver.schema.table_slv")
  • Output notebook reads Silver and writes to Gold Warehouse using:
df_silver = spark.read.synapsesql("lake_silver.schema.table_slv")
(df_silver.write
    .mode("overwrite")
    .option(Constants.WorkspaceId, gold_workspaceId)
    .synapsesql("warehouse_gold.dbo.table_gld")
)

Both notebooks are triggered sequentially from a main notebook.

Problem:
Sometimes records that are present in Silver are missing from the Gold Warehouse after the Output notebook runs.

Questions:

  • Is it expected that spark.read.synapsesql("lake_silver.schema.table_slv") might not see the latest Delta commit in Silver Lakehouse?
  • What is the recommended way to ensure Output always reads the fully committed Silver data before writing to Gold?
  • Are there best practices for Lakehouse → Warehouse pipelines in Fabric to avoid missing records?

Any guidance or examples would be appreciated.

2 ACCEPTED SOLUTIONS

Hi @tyro_ploter

 

Personally I just add the other lakehouse from the other workspace to the notebook. I've never had an issue with it.

If you must use the SQL endpoint, use the API to force a refresh first, then wait for 5 minutes. 

Refresh SQL analytics endpoint Metadata REST API (Preview) | Microsoft Fabric Blog | Microsoft Fabric
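If you go the refresh route, the call can be scripted rather than done by hand. The sketch below is a minimal example based on the preview API described in the linked blog post; the exact route (`/v1/workspaces/{workspaceId}/sqlEndpoints/{sqlEndpointId}/refreshMetadata`) and how you obtain a bearer token are assumptions here, not guaranteed API details.

```python
def build_refresh_url(workspace_id: str, sql_endpoint_id: str) -> str:
    """Assumed preview route for the SQL analytics endpoint metadata refresh."""
    return (
        "https://api.fabric.microsoft.com/v1/workspaces/"
        f"{workspace_id}/sqlEndpoints/{sql_endpoint_id}/refreshMetadata"
    )


def refresh_sql_endpoint(workspace_id: str, sql_endpoint_id: str, token: str) -> None:
    """Fire the metadata refresh and raise on a non-2xx response."""
    import requests  # imported lazily so the URL helper stays stdlib-only

    resp = requests.post(
        build_refresh_url(workspace_id, sql_endpoint_id),
        headers={"Authorization": f"Bearer {token}"},
        json={},
        timeout=60,
    )
    resp.raise_for_status()
```

In a Fabric notebook the token could come from `notebookutils.credentials.getToken(...)`; the correct audience string is another assumption to verify against the docs.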

 

If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.


Hi @tyro_ploter

 

@Gpop13 is correct: read the data from the lakehouse using Spark (spark.table), then write it to the warehouse using synapsesql. It's the read operation that's causing you grief right now.

 

If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution. 


7 REPLIES
tyro_ploter
Frequent Visitor

Hi @tayloramy , @Gpop13 ,

 

I’m using synapsesql because my architecture relies on separate workspaces for Bronze, Silver, and Gold layers, all managed through Git CI/CD. This approach makes it easier to handle cross-workspace reads and writes without relying on shortcuts or workspace-specific artifacts, which are not version-controlled and could cause inconsistencies during branch merges.
Another reason is that the Gold layer is a Warehouse, and Fabric notebooks can only attach Lakehouses, not Warehouses. Therefore, synapsesql is the only way to write data from Spark into the Gold Warehouse.


That said, I might be missing a better approach. If there's a recommended pattern for cross-workspace Lakehouse → Warehouse notebooks that avoids snapshot latency and still works well with CI/CD, I'd love to hear your suggestions.


Hi @tayloramy 
Thank you for the suggestion!
In my case, the main reason is that Gold is a Warehouse, and Fabric notebooks can only attach Lakehouses, not Warehouses. I need to write a DataFrame into a Warehouse in another workspace, and as far as I know, synapsesql is the supported way to do this.

I’ll consider your idea about adding a refresh and a timeout before reading.
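A fixed sleep after the refresh is fragile; one alternative is to poll until the endpoint reports the row count you just wrote to Silver. The helper below is a sketch: `read_count` is a hypothetical caller-supplied zero-argument function, e.g. a `COUNT(*)` over the SQL endpoint via synapsesql.

```python
import time


def wait_for_row_count(read_count, expected: int,
                       timeout_s: float = 600.0, poll_s: float = 15.0) -> bool:
    """Poll read_count() until it reaches `expected` rows or the timeout
    elapses. Returns True as soon as the count catches up."""
    deadline = time.monotonic() + timeout_s
    while True:
        if read_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

In the Output notebook, `expected` could be the `df.count()` recorded by the Input notebook and handed over by the orchestrating main notebook.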


Hi @tyro_ploter , the suggestion was only to see whether synapsesql can be avoided when reading from the lakehouse; I believe you can still use it to write to the warehouse. Because synapsesql reads through the SQL endpoint, that is likely why you are facing this issue.

tayloramy
Community Champion

Hi @tyro_ploter

 

@Gpop13 is asking the right questions. 

 

spark.read.synapsesql(...) goes through the SQL analytics endpoint, which can lag behind the latest committed data.

 

I recommend reading directly from the Delta table using spark.table(...).
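Put together, a minimal sketch of that pattern (assuming the Fabric Spark runtime, where the `com.microsoft.spark.fabric` connector and an attached Silver Lakehouse are available; table names are the ones from the question):

```python
def copy_silver_to_gold(spark, gold_workspace_id: str,
                        src: str = "lake_silver.schema.table_slv",
                        dst: str = "warehouse_gold.dbo.table_gld") -> None:
    """Read Silver directly from the Delta table (no SQL-endpoint lag),
    then write to the Gold Warehouse through the synapsesql connector."""
    # Fabric-only imports; present on Fabric Spark runtimes.
    import com.microsoft.spark.fabric  # registers the synapsesql writer
    from com.microsoft.spark.fabric.Constants import Constants

    df = spark.table(src)  # reads the latest committed Delta version
    (df.write
       .mode("overwrite")
       .option(Constants.WorkspaceId, gold_workspace_id)
       .synapsesql(dst))
```

Reading with `spark.table` bypasses the SQL endpoint entirely, so the metadata-sync delay no longer applies; only the warehouse write still goes through synapsesql.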

 

If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution. 

Gpop13
Advocate II

Hi @tyro_ploter ,

What is the reason for using synapsesql to read? I presume Silver is a Lakehouse.

Can we not use spark.read.table("lake_silver.schema.table_slv") and then write it to the Gold Warehouse?

 
