I am trying to run a merge in a notebook, with optimizeWrite and low shuffle merge both enabled (set to true) in the cell. The cell sometimes executes successfully, but sometimes it throws an error. When I toggle the two settings (true/false, false/false, or false/true), the cell executes again. Can you please guide me on what might be the reason behind this issue?
The code in the cell of the notebook is as below:
Hi @akakkar ,
Thanks for reaching out to the Microsoft Fabric community forum, and thanks for sharing the details and error trace. Based on your description and the stack trace (a Py4JJavaError raised during .merge().execute()), the issue is likely caused by one or more of the following factors, especially in combination with the Delta Lake optimization settings you're using.
Data Skew or Partitioning Issues
With spark.microsoft.delta.merge.lowShuffle.enabled set to true, the merge can fail with shuffle- or skew-related errors if the data is not evenly distributed (e.g., if AccountId has many repeated values).
Incompatible Schema or Column Mapping
The set_dict used for the update may reference columns that have mismatched data types or are missing from either the source or the target table, which can cause the merge to fail at execution time.
Delta Table State or Concurrency Conflicts
If another process is writing to the Delta table, or the transaction log is stale, the merge can fail intermittently.
Resource Constraints or Executor Failures
Optimizations such as optimizeWrite and lowShuffle increase memory and shuffle pressure, which can cause sporadic failures under certain cluster loads.
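Since the failures appear and disappear as you toggle these two flags, a quick way to narrow down the culprit is to sweep all four flag combinations and re-run the merge under each one. Below is a minimal sketch (the flag names are the ones from this thread; the spark.conf.set calls and the merge re-run are shown as comments because they only work inside a live Spark session):

```python
from itertools import product

# The two Delta optimization flags under suspicion in this thread.
FLAGS = [
    "spark.microsoft.delta.optimizeWrite.enabled",
    "spark.microsoft.delta.merge.lowShuffle.enabled",
]

def flag_combinations():
    """All four true/false combinations to test the merge under."""
    return [dict(zip(FLAGS, values)) for values in product(["true", "false"], repeat=2)]

# In the notebook you would apply each combination and re-run the merge:
# for conf in flag_combinations():
#     for key, value in conf.items():
#         spark.conf.set(key, value)
#     ...re-run the merge cell and record success/failure...
```

Recording which combinations fail tells you whether the problem follows one specific flag or only the two together.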
Disable Optimizations for Isolation Testing
Try disabling the optimizations to confirm stability:
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "false")
spark.conf.set("spark.microsoft.delta.merge.lowShuffle.enabled", "false")
Validate Column Mappings
Ensure all columns in set_dict exist in both DataFrames and their data types match.
print(df_fullaccount.dtypes)
print(target_table.toDF().dtypes)
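Printing the two dtype lists works, but comparing them by eye is error-prone. A small helper (hypothetical, not part of any library) can diff the (column, type) pairs that df.dtypes returns:

```python
def schema_mismatches(source_dtypes, target_dtypes):
    """Compare (column, dtype) pairs from df.dtypes and report differences."""
    src, tgt = dict(source_dtypes), dict(target_dtypes)
    missing_in_target = sorted(set(src) - set(tgt))
    missing_in_source = sorted(set(tgt) - set(src))
    type_conflicts = sorted(
        (col, src[col], tgt[col])
        for col in set(src) & set(tgt)
        if src[col] != tgt[col]
    )
    return missing_in_target, missing_in_source, type_conflicts

# Example with hypothetical schemas (in the notebook, pass
# df_fullaccount.dtypes and target_table.toDF().dtypes):
src = [("AccountId", "string"), ("Balance", "double"), ("Extra", "int")]
tgt = [("AccountId", "string"), ("Balance", "decimal(18,2)")]
print(schema_mismatches(src, tgt))
```

Any column listed in the conflicts or missing lists is a candidate to cast or drop from set_dict before merging.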
Repartition Input Data
Improve distribution to prevent skew:
df_fullaccount = df_fullaccount.repartition("AccountId")
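To check whether AccountId is actually skewed before repartitioning, count rows per key and look for hot keys. The helper below is a plain-Python sketch of the idea (in the notebook you would use a groupBy("AccountId").count() on the DataFrame instead):

```python
from collections import Counter

def skew_report(keys, top=3):
    """Return the most frequent join keys and the share of rows they hold."""
    counts = Counter(keys)
    total = len(keys)
    return [(key, n, n / total) for key, n in counts.most_common(top)]

# Example: one AccountId dominating the input is a sign of skew.
sample = ["A1"] * 80 + ["A2"] * 15 + ["A3"] * 5
print(skew_report(sample))
```

If one key holds a large share of the rows, repartitioning on that key alone won't help much; consider salting the key or filtering the hot key into a separate merge.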
Add Full Error Logging
Note that traceback.print_exc() only prints an exception that has already been raised and caught, so wrap the merge in a try/except to capture the Java stack trace behind the Py4JJavaError:
import traceback
try:
    merge_builder.execute()  # hypothetical name for the merge built in your cell
except Exception:
    traceback.print_exc()
Check Spark UI Logs
If the above doesn’t isolate the issue, please review the executor logs in the Spark UI for detailed exception traces.
Low Shuffle Merge optimization on Delta tables - Azure Synapse Analytics | Microsoft Learn
Delta Lake table optimization and V-Order - Microsoft Fabric | Microsoft Learn
Tutorial: Delta Lake - Azure Databricks | Microsoft Learn
If this post helped resolve your issue, please consider giving it Kudos and marking it as the Accepted Solution. This not only acknowledges the support provided but also helps other community members find relevant solutions more easily.
We appreciate your engagement and thank you for being an active part of the community.
Best regards,
LakshmiNarayana
Hi @akakkar ,
If your issue has been resolved, please consider marking the most helpful reply as the accepted solution. This helps other community members who may encounter the same issue to find answers more efficiently.
If you're still facing challenges, feel free to let us know—we’ll be glad to assist you further.
Looking forward to your response.
Best regards,
LakshmiNarayana.
Hi @akakkar ,
If your question has been answered, kindly mark the appropriate response as the Accepted Solution. This small step goes a long way in helping others with similar issues.
We appreciate your collaboration and support!
Best regards,
LakshmiNarayana
Hi @akakkar ,
As we haven't heard back from you, we are closing this thread. If you are still experiencing the same issue, please create a new thread and we'll be happy to assist you further.
Thank you for your patience and support.
If our response was helpful, please mark it as Accepted as Solution and consider giving a Kudos. Feel free to reach out if you need any further assistance.
Best Regards,
Lakshmi Narayana