ntimmerman
Helper I

Saving small delta table via notebook extremely slow

For the project I'm currently working on, I need to copy data into our bronze layer for a relatively large number of dimension tables. When I do this with a notebook, I notice that it is extremely slow.

Even tiny tables with only 10 records or so take about 6 seconds to save. It doesn't matter whether I do a complete overwrite or merge the new data with the existing data.

All options are still at their defaults, so I haven't enabled any write optimization, but I assumed that would only make a difference for larger tables. Please advise.

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @ntimmerman,

In fact, I think this processing cost is close to a fixed lower bound, and switching to other functions may not noticeably improve performance when you operate on small tables.
To optimize Fabric notebook performance, you can take a look at the following document if it helps:

Delta Lake table optimization and V-Order - Microsoft Fabric | Microsoft Learn

Regards,

Xiaoxin Sheng


6 REPLIES
ntimmerman
Helper I

Sure!

We are on an F4 Fabric capacity.

I added some logging to my PySpark notebook. One example is this table:

 Imported 8 records for table aircraft_hours_history query time: 1.9679827690124512, import time: 9.02900743484497 seconds

This table has only 6 columns, all of them numeric or date-time types:

[screenshot: ntimmerman_0-1724644661398.png]

The table only has 9 records in it:

[screenshot: ntimmerman_1-1724644894814.png]

But it still takes more than 9 seconds to merge the 9 records from the source. 

The code for merging the table is pretty simple:

    from delta.tables import DeltaTable

    # get a reference to the destination table
    deltaTable = DeltaTable.forPath(spark, 'Tables/' + row['abbr'] + f"_{tableName}")

    # merge new data with existing data; don't delete anything, only update and insert
    (deltaTable.alias('target')
        .merge(dfUpdated.alias('source'), "source.id = target.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())
Any clues on how I can improve the performance of this?
A side note is that upserting data into larger tables costs roughly the same amount of time.
So it seems like there's just a big fixed overhead in accessing the Delta table?
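To confirm that fixed-overhead theory, it can help to time each phase separately, the way the log line above splits query time from import time. Below is a minimal pure-Python timing helper; the function names in the commented usage (`load_source_table`, `merge_into_bronze`) are hypothetical stand-ins for the notebook's own steps.

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print its wall-clock duration, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} seconds")
    return result, elapsed

# Hypothetical usage inside the notebook loop:
# df, query_s  = timed("query time",  load_source_table, tableName)
# _, import_s  = timed("import time", merge_into_bronze, df, tableName)
```

If the import time barely changes between 10 and 10,000 rows, that points at per-write startup cost rather than data volume.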

Anonymous
Not applicable

Hi @ntimmerman,

I think these times also include the overhead of initializing components and resources. (I tested a table with more columns and records, and it didn't take much longer than the small table.)
For your scenario, I'd suggest using other functions to handle the data-import and Delta table initialization steps, and using the 'DeltaTable' API only for complex record comparison and merge operations. (At those sizes, the 'DeltaTable' API performs better than the alternatives for merges.)
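As a sketch of what "other functions" could mean here: for a 10-row dimension table, a plain overwrite rewrites the whole table in one small file and skips the merge's join planning. The path helper below mirrors the path construction from the snippet earlier in the thread; the overwrite line in the comment reuses `dfUpdated` from that snippet and is illustrative, not a measured recommendation.

```python
def bronze_path(abbr: str, table_name: str) -> str:
    """Build the relative Lakehouse path used elsewhere in this thread."""
    return f"Tables/{abbr}_{table_name}"

# For tiny tables, replacing the whole table can be simpler than a merge
# (sketch; dfUpdated and row come from the earlier snippet):
# dfUpdated.write.format("delta").mode("overwrite") \
#     .save(bronze_path(row['abbr'], tableName))
```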

Regards,

Xiaoxin Sheng

Hi @Anonymous, thanks for your quick reply! That's greatly appreciated! Can you elaborate a bit on which other functions are available that would have better performance for those small tables?

Thanks

Anonymous
Not applicable

Hi @ntimmerman,

In fact, I think this processing cost is close to a fixed lower bound, and switching to other functions may not noticeably improve performance when you operate on small tables.
To optimize Fabric notebook performance, you can take a look at the following document if it helps:

Delta Lake table optimization and V-Order - Microsoft Fabric | Microsoft Learn
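For reference, the write-optimization features discussed in that document can be toggled at the session level. Property names here are taken from that doc and may differ between Fabric runtime versions, so treat this as a sketch to verify against your environment rather than a definitive setting (this is a config fragment for a running notebook session, where `spark` already exists):

```python
# Session-level toggles (names per the linked doc; verify for your runtime).
# Disabling them trades read-side optimization for lower per-write cost,
# which may matter when writing many tiny tables.
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")             # V-Order encoding
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "false")  # optimize write
```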

Regards,

Xiaoxin Sheng

Thank you!

I found out that there is hardly any difference between writing 10 rows and writing 10,000 rows.

It seems like there's just a lot of overhead at the start of the writing process.

So I've now optimized the process to combine tables before dumping them into the bronze layer.
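One way to sketch that combining step: union the per-source DataFrames once and perform a single write, so the fixed per-write overhead is paid once instead of per table. This assumes the frames share a schema; the write line in the comment and the `per_source_frames` name are hypothetical.

```python
from functools import reduce

def combine_frames(frames):
    """Union a list of same-schema DataFrames into one, so many tiny
    writes become a single write that amortizes the startup overhead."""
    if not frames:
        raise ValueError("need at least one DataFrame")
    return reduce(lambda a, b: a.unionByName(b), frames)

# Sketch:
# combined = combine_frames(per_source_frames)
# combined.write.format("delta").mode("append").save("Tables/bronze_combined")
```

`unionByName` matches columns by name rather than position, which is safer when the per-source frames were built independently.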


Anonymous
Not applicable

Hi @ntimmerman,

Can you please share some more detailed information about this issue? It would help us clarify your scenario and run tests to troubleshoot.

How to Get Your Question Answered Quickly  

Regards,

Xiaoxin Sheng
