Are there any risks with asynchronous Delta merge/concurrent writes from Bronze (not Lakehouse) to Silver (Lakehouse) for tables that are not dependent on each other?
Suppose I have multiple tables, each being merged independently (e.g., {T1: Merge_1, T2: Merge_2, ..., Tn: Merge_n} - same notebook), and there is no overlap or dependency between these tables.
- Are there still any risks or potential issues with running these merges concurrently?
- Has anyone experienced problems with this pattern?
- Are there any resource-level concerns (e.g., cluster contention, throttling, or IO bottlenecks) I should be aware of?
Hi @smpa01 ,
Thank you for reaching out to the Microsoft Fabric Community Forum.
When merging multiple independent tables asynchronously from Bronze (non-Lakehouse) to Silver (Lakehouse Delta), and assuming no cross-table dependencies, the pattern is generally safe, but there are still some risks and practical considerations to keep in mind.
Even if the tables are fully independent, running multiple merge operations at the same time can still cause problems. Each merge consumes CPU, memory, and input/output (IO), so running many merges together, especially on a smaller cluster, can degrade performance or even cause jobs to fail.
Also, if all the merge operations are reading from or writing to the same storage layer, such as ADLS or OneLake, they can create IO bottlenecks or hit bandwidth limits, particularly with large volumes of data. Merge operations also involve shuffling, which uses a lot of memory. If too many of them run at once, this can lead to memory pressure or spilling to disk, which slows things down.
Lastly, when running merges asynchronously (in parallel), it becomes harder to catch errors. If one merge fails, you may not notice it unless you specifically check or handle it in your code.
So, while this pattern is supported and commonly used, it’s important to control the number of concurrent merges, monitor your cluster performance, and handle errors properly to avoid performance or reliability issues.
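If it helps, here is a minimal sketch of that pattern in a Fabric notebook, assuming a hypothetical Bronze ADLS path, hypothetical Silver table names, a shared join key `id`, and an illustrative concurrency cap of four workers; adapt the sources, keys, and cap to your own workload:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from delta.tables import DeltaTable

# Hypothetical list of independent tables; replace with your own names.
tables = ["T1", "T2", "T3"]

def merge_table(name: str) -> str:
    """Merge one Bronze source into its Silver Delta table."""
    # 'spark' is the notebook's built-in Spark session; the path below is a placeholder.
    source_df = spark.read.parquet(f"abfss://bronze@yourstorageaccount.dfs.core.windows.net/{name}")
    target = DeltaTable.forName(spark, f"silver_{name}")  # assumed Silver table name
    (target.alias("t")
           .merge(source_df.alias("s"), "t.id = s.id")    # assumed join key 'id'
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())
    return name

# Cap concurrency so the cluster is not overwhelmed, and surface failures explicitly.
errors = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(merge_table, t): t for t in tables}
    for f in as_completed(futures):
        try:
            print(f"Merged {f.result()}")
        except Exception as e:  # a failed merge should not go unnoticed
            errors[futures[f]] = e

if errors:
    raise RuntimeError(f"Merges failed for: {list(errors)}")
```

The bounded pool addresses the resource-contention concern, and collecting exceptions per table addresses the error-visibility concern when merges run in parallel.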
Hope this helps. Please reach out if you need further assistance.
Thank you.
Hi @smpa01 ,
We haven’t received an update from you in some time. Could you please let us know if the issue has been resolved?
If you still require support, please let us know; we are happy to assist you. Also, thank you @BhaveshPatel for your helpful response.
Thank you.
Hi @smpa01
There is no harm in following the data lakehouse pattern from Bronze to Silver to Gold. Just follow these best practices:
1. Work on a single Delta table in one notebook at a time, and overwrite the independent table.
2. Utilize Materialized Lake Views in Silver.
3. Use raw data in Bronze (Extract, Python), transform it in Silver (Transform, Spark SQL), and clean it up in Gold (Load, Spark SQL); see the sketch after this list. CPU and IO only become a concern when you are dealing with billions of rows; small tables (~5,000,000 rows) are not a problem in a data lakehouse.
4. Consider an F64 capacity, which is equivalent to a Power BI Premium P1 node.
5. Follow Dataflow Gen2; it provides a data source and a data destination for self-service ETL through the UI.
6. Follow notebooks if you are an advanced Delta Lake user and know exactly what you are doing (Parquet/Delta and the Apache Spark engine). They save a lot of money, but they also carry the risk of deleting a Delta table.
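As a rough illustration of point 3, here is a minimal notebook sketch, assuming a hypothetical Bronze file path, hypothetical column names, and a Silver table called silver_sales; the Spark SQL step stands in for whatever transformation your Silver layer actually needs:

```python
# Extract: read raw Bronze data with Python/PySpark (hypothetical path).
bronze_df = spark.read.parquet("Files/bronze/sales_raw")  # assumed Bronze location
bronze_df.createOrReplaceTempView("sales_raw")

# Transform: shape the data for Silver with Spark SQL (assumed columns).
silver_df = spark.sql("""
    SELECT id,
           CAST(order_date AS DATE) AS order_date,
           TRIM(customer_name)      AS customer_name,
           amount
    FROM sales_raw
    WHERE amount IS NOT NULL
""")

# Load: overwrite the independent Silver Delta table (per point 1 above).
silver_df.write.format("delta").mode("overwrite").saveAsTable("silver_sales")
```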
Hi @smpa01 ,
Could you please confirm if the issue has been resolved? If a solution has been found, it would be greatly appreciated if you could share your insights with the community. This would be helpful for other members who may encounter similar issues.
Thank you.
Hi @smpa01 ,
Thank you for confirming. Please share your findings once your DEV run is complete, and we will be happy to assist with any further questions.
Thank you.
Hi @smpa01 ,
I wanted to follow up on our previous suggestions. We would like to hear back from you to ensure we can assist you further.
Thank you.
I will update once I run a DEV, expected this week.