Sethulakshmi
Frequent Visitor

Cache memory causing duplicates in Fabric notebook

Hello,

I'm encountering an issue with a merge operation in a notebook, where I'm accessing tables from a lakehouse. The merge command fails with a duplicate error. However, when I query the table using SQL Server Management Studio (SSMS) connected to the lakehouse, it shows zero duplicates. I suspected a caching problem and attempted to resolve it by disabling the cache using the following code:
spark.conf.set("spark.synapse.vegas.useCache", "false")  # disable Fabric's intelligent cache for this session

df.cache()      # mark the DataFrame for caching

df.unpersist()  # immediately drop it from the cache again

I also manually switched environments within the notebook and found no duplicates. The perplexing aspect is that the issue persists when the notebook is triggered via a pipeline, even though there are no duplicates when tested manually. What could be the potential reasons behind this discrepancy, and how can it be addressed?
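For reference, a slightly fuller session-level version of the cache-clearing attempt above (a sketch, not a confirmed fix; it assumes a live Fabric SparkSession named `spark`, as in the snippet):

```python
# Sketch: session-level cache settings (assumes a live SparkSession `spark`).
spark.conf.set("spark.synapse.vegas.useCache", "false")  # intelligent cache off
spark.catalog.clearCache()  # drop every cached table/DataFrame in this session
```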

Anonymous
Not applicable

Hi @Sethulakshmi ,

Thanks for reaching out to us with your problem. The discrepancy between the manual testing and the pipeline-triggered execution could be due to a variety of factors.

  • Environment Differences: There might be differences between the environment in which you’re manually testing the notebook and the environment in which the pipeline runs. These differences could be in terms of software versions, configurations, or data states.
  • Data Timing Issues: If your data is being updated frequently, it’s possible that duplicates are introduced between the time you manually check for duplicates and the time the pipeline runs.
  • Caching Mechanism: As you suspected, the issue might be related to caching. Note, however, that this caching is orchestrated and maintained by Microsoft Fabric itself, and it does not give users a way to clear the cache manually. See: Caching in Fabric data warehousing - Microsoft Fabric | Microsoft Learn

To address this issue, you could try the following:

  • Debugging: Add logging statements in your notebook to capture the state of your data at various points in your pipeline. This could help you identify where and when the duplicates are introduced.
  • Data Snapshot: Create a snapshot of your data before running the merge operation. This could help you identify if the duplicates are present in the data at the time of the merge operation.
  • Environment Consistency: Ensure that the environment in which you’re manually testing the notebook is identical to the environment in which the pipeline runs.
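To make the debugging bullet concrete: in the notebook you would typically check for duplicates with a `groupBy(key).count()` on the merge key before the merge, but the underlying logic is just counting key occurrences. A minimal pure-Python illustration (the column name `id` is a placeholder):

```python
from collections import Counter

def duplicate_keys(rows, key="id"):
    """Return the merge-key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return {k: n for k, n in counts.items() if n > 1}

# Example: rows as the notebook might see them after a stale read,
# where record 2 shows up twice.
rows = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 3}]
print(duplicate_keys(rows))  # {2: 2}
```

Logging the output of a check like this at each stage of the pipeline run helps pin down exactly where the duplicates first appear.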

 

Best Regards

Thanks for the reply,

Regarding environment consistency: I assume that when we trigger a notebook from a pipeline there is no option to choose the environment, so how can I check for consistency? Please suggest if there is an option.
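If the pipeline activity really offers no environment picker, one pragmatic workaround is to have the notebook print its own runtime markers, then diff the output of a manual run against a pipeline-triggered run. A generic sketch (in a Spark notebook you would also print `spark.version` and the relevant `spark.conf` values; the environment-variable names are placeholders):

```python
import os
import sys
import platform

def runtime_report(extra_env_keys=()):
    """Collect runtime markers so two runs can be compared line by line."""
    lines = [
        f"python = {platform.python_version()}",
        f"executable = {sys.executable}",
    ]
    for key in extra_env_keys:  # e.g. Spark/Fabric-specific variables
        lines.append(f"{key} = {os.environ.get(key, '<unset>')}")
    return "\n".join(lines)

print(runtime_report())
```

Any line that differs between the two reports points at an environment inconsistency between manual and pipeline execution.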
