Hello,
I'm encountering an issue with a merge operation in a notebook, where I'm accessing tables from a lakehouse. The merge command fails with a duplicate error. However, when I query the table using SQL Server Management Studio (SSMS) connected to the lakehouse, it shows zero duplicates. I suspected a caching problem and attempted to resolve it by disabling the cache using the following code:
spark.conf.set("spark.synapse.vegas.useCache", "false")
df.cache()
df.unpersist()
I also manually switched environments within the notebook and found no duplicates. The perplexing aspect is that the issue persists when the notebook is triggered via a pipeline, even though there are no duplicates when tested manually. What could be the potential reasons behind this discrepancy, and how can it be addressed?
Hi @Sethulakshmi ,
Thanks for reaching out to us with your problem. The discrepancy between the manual testing and the pipeline-triggered execution could be due to a variety of factors.
To address this issue, you could try the following:
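One check that may help narrow this down, assuming the failure is the usual Delta Lake merge error raised when several source rows match the same target row: look for duplicate merge keys in the source data inside the pipeline-triggered run itself, just before the merge. The sketch below is only an illustration. `find_duplicate_keys`, `source_df`, and the key column `id` are hypothetical names, not taken from the original notebook.

```python
from collections import Counter


def find_duplicate_keys(rows, key_cols):
    """Return {key_tuple: occurrences} for every merge key seen more than once.

    `rows` can be any iterable of objects that support row[column_name],
    such as plain dicts or the Row objects returned by DataFrame.collect().
    """
    counts = Counter(tuple(row[col] for col in key_cols) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# In the notebook, just before the merge (hypothetical names):
#   rows = source_df.select("id").collect()
#   dupes = find_duplicate_keys(rows, ["id"])
#   if dupes:
#       raise ValueError(f"Duplicate merge keys in source: {dupes}")
#
# For a large source, avoid collect() and count on the cluster instead:
#   source_df.groupBy("id").count().filter("count > 1").show()

# Small stand-in data so the sketch runs without a Spark session.
sample = [{"id": 1}, {"id": 2}, {"id": 1}]
print(find_duplicate_keys(sample, ["id"]))  # {(1,): 2}
```

Running the check in the same session that performs the merge shows what the pipeline run actually sees, which may differ from what a separate query in SSMS returns.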
Best Regards
Thanks for the reply,
Regarding environment consistency: I assume there is no option to choose an environment when a notebook is triggered from a pipeline, so how can I check consistency? Please suggest an option if there is one.
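One possible way to compare the two kinds of run, offered as a sketch and not as a built-in feature: print the Spark version and a few settings at the top of the notebook, then compare the output of a manual run with the output of a pipeline-triggered run. `snapshot_session` is a hypothetical helper, and the list of keys is only an example. It relies on `spark.version` and `spark.conf.get(key, default)`, which a standard Spark session provides.

```python
def snapshot_session(spark, conf_keys):
    """Collect the Spark version and selected settings into a dict for logging."""
    info = {"spark.version": spark.version}
    for key in conf_keys:
        # The second argument is returned when the key is not set.
        info[key] = spark.conf.get(key, "<not set>")
    return info


# In the notebook, where `spark` is the active session (example keys only):
#   print(snapshot_session(spark, ["spark.synapse.vegas.useCache", "spark.app.name"]))
```

If the two outputs differ, the differing setting is a candidate cause. If they match, the difference more likely lies in the data or parameters the pipeline passes to the notebook.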