
Sethulakshmi
Frequent Visitor

Cache memory causing duplicates in Fabric notebook

Hello,

I'm encountering an issue with a merge operation in a notebook, where I'm accessing tables from a lakehouse. The merge command fails with a duplicate error. However, when I query the table using SQL Server Management Studio (SSMS) connected to the lakehouse, it shows zero duplicates. I suspected a caching problem and attempted to resolve it by disabling the cache using the following code:
spark.conf.set("spark.synapse.vegas.useCache", "false")  # disable the intelligent (Vegas) cache for this session

df.cache()      # mark the DataFrame for caching

df.unpersist()  # remove the DataFrame from the cache again

I also manually switched environments within the notebook and found no duplicates. The perplexing aspect is that the issue persists when the notebook is triggered via a pipeline, even though there are no duplicates when tested manually. What could be the potential reasons behind this discrepancy, and how can it be addressed?
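For context, one way to confirm whether duplicates actually exist at merge time is to count rows per merge key inside the notebook itself, immediately before the merge runs (in PySpark this would be a `groupBy(keys).count().filter("count > 1")`). Below is a minimal sketch of the same key-counting logic in plain Python; the column names and sample rows are hypothetical:

```python
from collections import Counter

def find_duplicate_keys(rows, key_fields):
    """Return {key_tuple: count} for merge keys that appear more than once."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {k: c for k, c in counts.items() if c > 1}

# Hypothetical sample data standing in for the lakehouse table.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 2, "name": "b2"},  # duplicate on the merge key "id"
]
print(find_duplicate_keys(rows, ["id"]))  # {(2,): 2}
```

Running this check as the very first cell of the pipeline-triggered run would show whether the duplicates are present in the data the pipeline sees, or only appear during the merge itself.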

2 REPLIES
Anonymous
Not applicable

Hi @Sethulakshmi ,

Thanks for reaching out to us with your problem. The discrepancy between the manual testing and the pipeline-triggered execution could be due to a variety of factors.

  • Environment Differences: There might be differences between the environment in which you’re manually testing the notebook and the environment in which the pipeline runs. These differences could be in terms of software versions, configurations, or data states.
  • Data Timing Issues: If your data is being updated frequently, it’s possible that duplicates are introduced between the time you manually check for duplicates and the time the pipeline runs.
  • Caching Mechanism: As you suspected, the issue might be related to caching. Note, however, that the cache in Fabric data warehousing is managed by Microsoft Fabric itself and does not expose a way for users to clear it manually. Caching in Fabric data warehousing - Microsoft Fabric | Microsoft Learn

To address this issue, you could try the following:

  • Debugging: Add logging statements in your notebook to capture the state of your data at various points in your pipeline. This could help you identify where and when the duplicates are introduced.
  • Data Snapshot: Create a snapshot of your data before running the merge operation. This could help you identify if the duplicates are present in the data at the time of the merge operation.
  • Environment Consistency: Ensure that the environment in which you’re manually testing the notebook is identical to the environment in which the pipeline runs.
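The "Debugging" suggestion above can be sketched as a small helper that logs the row count and duplicate count at each stage of the notebook, so a pipeline run leaves a trail showing exactly where duplicates first appear. A minimal plain-Python sketch (the stage names and data are hypothetical; in the real notebook the counts would come from the Spark DataFrames):

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("merge-debug")

def log_stage(stage, rows, key_fields):
    """Log total rows and duplicated rows at a pipeline stage; return the duplicate count."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    dups = sum(c - 1 for c in counts.values() if c > 1)
    log.info("stage=%s rows=%d duplicate_rows=%d", stage, len(rows), dups)
    return dups

# Hypothetical stages of the notebook run.
raw = [{"id": 1}, {"id": 2}, {"id": 2}]
log_stage("after_read", raw, ["id"])        # duplicate_rows=1
deduped = list({r["id"]: r for r in raw}.values())
log_stage("before_merge", deduped, ["id"])  # duplicate_rows=0
```

Comparing the logged counts between a manual run and a pipeline-triggered run should narrow down whether the duplicates come in with the source data or are introduced by a later step.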

 

Best Regards

Thanks for the reply,

Regarding environment consistency: I assume that when we trigger a notebook from a pipeline there is no option to choose an environment, so how can I check for consistency? Please suggest if there is an option.
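One practical way to compare the two runs is to have the notebook itself print a fingerprint of its runtime as its first cell; running it once manually and once from the pipeline then makes any difference visible in the run output. A minimal sketch (the fields chosen are an assumption; inside the notebook you would extend this with whatever matters for your workload, e.g. `spark.version` and the relevant `spark.conf` values):

```python
import json
import os
import platform
import sys

def runtime_fingerprint():
    """Collect basic facts about the environment the code is running in."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # Environment variables often differ between interactive and
        # pipeline-triggered runs; list any that matter to you here.
        "env_sample": {k: os.environ.get(k) for k in ("PATH",)},
    }

print(json.dumps(runtime_fingerprint(), indent=2))
```

Diffing the printed fingerprint from the manual run against the one from the pipeline run should reveal whether the two executions really use the same environment.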
