For the project I'm currently working on, I need to copy data into our bronze layer for a relatively large number of dimension tables. When I do this with a notebook, I notice that this is extremely slow.
Even tiny tables with only 10 records or so take about 6 seconds to save. It doesn't matter whether I do a complete overwrite or merge the new data with the existing data.
All options are still at their defaults, so I haven't enabled any write optimization, but I assumed that would only make a difference for larger tables. Please advise.
Sure!
We are on a F4 Fabric capacity.
I added some logging to my pyspark notebook. One example is this table:
Imported 8 records for table aircraft_hours_history query time: 1.9679827690124512, import time: 9.02900743484497 seconds
The table has only 6 columns, all of them numeric or date-time, and only 9 records in it.
But it still takes more than 9 seconds to merge the 9 records from the source.
The code for merging the table is pretty simple:
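(The actual code isn't shown in the thread; below is a minimal sketch of what such a merge commonly looks like in a Fabric PySpark notebook using the Delta Lake DeltaTable.merge API. The table name, the key column `id`, and the staging source query are assumptions for illustration.)

```python
from delta.tables import DeltaTable

# Minimal sketch -- the thread does not include the actual merge code.
# The table name, the key column "id", and the staging query are assumptions.
source_df = spark.sql("SELECT * FROM staging.aircraft_hours_history")

target = DeltaTable.forName(spark, "bronze.aircraft_hours_history")

(
    target.alias("t")
    .merge(source_df.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```

Even for a handful of rows, a merge like this launches several Spark jobs (scan the target, join against the source, rewrite the affected files), so a few seconds of fixed overhead per table is expected regardless of row count.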
Hi @ntimmerman,
I think these times also include the overhead of initializing components and resources. (I tested a table with more columns and records, and it didn't take much longer than the small table.)
For your scenario, I'd suggest handling the data import and Delta table initialization steps with other functions, and using the 'DeltaTable' API only for the more complex record comparison and merge operations (at these data sizes, 'DeltaTable' performs better than the alternatives). A sketch of this split is shown after this reply.
Regards,
Xiaoxin Sheng
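A minimal sketch of the split described above, reusing the hypothetical table, key column, and source_df from the earlier example: a plain write creates the bronze table on the first load, and DeltaTable.merge is used only for incremental updates.

```python
from delta.tables import DeltaTable

# Sketch only: the table name, key column "id", and source_df are assumptions
# carried over from the earlier example.
table_name = "bronze.aircraft_hours_history"

if not spark.catalog.tableExists(table_name):
    # First load: create the Delta table with a simple write
    source_df.write.format("delta").saveAsTable(table_name)
else:
    # Subsequent loads: let DeltaTable handle the compare-and-merge
    (
        DeltaTable.forName(spark, table_name).alias("t")
        .merge(source_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
```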
Hi @Anonymous, thanks for your quick reply! That's greatly appreciated! Can you elaborate a bit on what other functions are available that would have better performance for those small tables?
Thanks
Hi @ntimmerman,
In fact, I think this processing cost is already close to the fixed baseline overhead, so switching to other functions may not noticeably improve performance when you operate on small tables.
For optimizing Fabric notebook performance, you can take a look at the following document if it helps (a short example of its session-level settings is sketched after this reply):
Delta Lake table optimization and V-Order - Microsoft Fabric | Microsoft Learn
Regards,
Xiaoxin Sheng
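The linked article covers write-time features such as V-Order and optimized write, which can be toggled at the session level. A minimal sketch, assuming the session-scoped config keys described there; the exact key names can vary between Fabric runtime versions, so check the article for your runtime.

```python
# Sketch only: config key names are taken from the linked article and may
# differ between Fabric runtime versions.
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")             # V-Order on write
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "false")  # optimized write
```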
Thank you!
I found out that there is hardly any difference between writing 10 rows and writing 10,000 rows.
It seems there is just a lot of fixed overhead at the start of each write.
So I've now optimized the process to combine tables before dumping them into the bronze layer.
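The post doesn't show how the tables were combined; one possible interpretation is stacking several small dimension tables into a single DataFrame, tagged with the source table name, so the per-write overhead is paid once instead of once per table. A sketch, with hypothetical staging table names:

```python
from functools import reduce
from pyspark.sql import functions as F

# Sketch only: the staging table names and the combined bronze table name
# are assumptions for illustration.
small_tables = ["dim_airport", "dim_carrier", "dim_aircraft_type"]

frames = [
    spark.sql(f"SELECT * FROM staging.{name}").withColumn("source_table", F.lit(name))
    for name in small_tables
]

# Union the small tables into one DataFrame, padding missing columns with nulls
combined = reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), frames)

combined.write.format("delta").mode("overwrite").saveAsTable("bronze.small_dimensions")
```

Whether this is worth the schema compromise (unionByName with allowMissingColumns pads mismatched columns with nulls) depends on how the bronze layer is consumed downstream; issuing the small writes concurrently from a single session is another way to amortize the fixed overhead.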
Hi @ntimmerman,
Can you please share some more detailed information about this issue? That would help us clarify your scenario and reproduce it for troubleshooting.
How to Get Your Question Answered Quickly
Regards,
Xiaoxin Sheng