Hello,
I'm encountering an issue with a merge operation in a notebook, where I'm accessing tables from a lakehouse. The merge command fails with a duplicate error. However, when I query the table using SQL Server Management Studio (SSMS) connected to the lakehouse, it shows zero duplicates. I suspected a caching problem and attempted to resolve it by disabling the cache using the following code:
spark.conf.set("spark.synapse.vegas.useCache", "false")
df.cache()
df.unpersist()
I also manually switched environments within the notebook and found no duplicates. The perplexing aspect is that the issue persists when the notebook is triggered via a pipeline, even though there are no duplicates when tested manually. What could be the potential reasons behind this discrepancy, and how can it be addressed?
Hi @Sethulakshmi,
Thanks for reaching out to us with your problem. The discrepancy between the manual testing and the pipeline-triggered execution could be due to a variety of factors.
To address this issue, you could try the following:
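One check that may help narrow this down, as a minimal sketch under assumptions: run the duplicate test inside the same notebook run that performs the merge (including the pipeline-triggered run), against both the target table and the merge source, rather than only from SSMS. The helper below just builds a Spark SQL query; the function name, the table name `staging_orders`, and the key column `order_id` are hypothetical placeholders, not names from this thread.

```python
from typing import Sequence


def duplicate_key_query(table: str, key_cols: Sequence[str]) -> str:
    """Build a Spark SQL query that lists key values occurring more than once."""
    if not key_cols:
        raise ValueError("key_cols must not be empty")
    # Backtick-quote column names so names with spaces or dots still parse.
    cols = ", ".join(f"`{c}`" for c in key_cols)
    return (
        f"SELECT {cols}, COUNT(*) AS row_count "
        f"FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1"
    )


# Hypothetical usage in the notebook, where `spark` is the provided session:
#   spark.sql(duplicate_key_query("staging_orders", ["order_id"])).show()
# If the merge source is a DataFrame, it can be registered first:
#   df.createOrReplaceTempView("merge_source")
#   spark.sql(duplicate_key_query("merge_source", ["order_id"])).show()
```

If this returns rows only when the notebook is triggered from the pipeline, the data the merge sees in that run differs from what the manual check saw, which points at inputs or parameters rather than caching.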
Best Regards
Thanks for the reply,
Regarding environment consistency: I assume that when a notebook is triggered from a pipeline, there is no option to choose the environment. If so, how can I check that the environment is consistent? Please let me know if such an option exists.
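Whether the pipeline exposes an environment option is not settled in this thread. Independently of that, one way to compare the two runs is to have the notebook record its own Spark settings at the start of every run, then compare the output of a manual run with that of a pipeline run. The sketch below assumes the notebook's `spark` session; the helper names and the list of keys are illustrative choices, with only `spark.synapse.vegas.useCache` taken from the question.

```python
from typing import Callable, Dict, Iterable, Tuple

# Illustrative selection of settings; extend with whatever the merge depends on.
KEYS = [
    "spark.synapse.vegas.useCache",  # the setting changed in the question
    "spark.sql.shuffle.partitions",
    "spark.app.name",
]


def snapshot_conf(get: Callable[[str, str], str], keys: Iterable[str]) -> Dict[str, str]:
    """Read each key through `get(key, default)`, recording '<unset>' if missing."""
    return {k: get(k, "<unset>") for k in keys}


def diff_snapshots(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    all_keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in all_keys if a.get(k) != b.get(k)}


# Hypothetical usage in the notebook, where `spark` is the provided session:
#   print(snapshot_conf(spark.conf.get, KEYS))
# Save the printed dict from a manual run and from a pipeline run, then pass
# both to diff_snapshots to see which settings differ.
```

An empty diff suggests the Spark configuration is the same in both runs, so the difference is more likely in the data or parameters the pipeline passes to the notebook.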