mmcanelly
Helper II

Difference between PySpark .saveAsTable() and .save()

I am loading many different flat files from the Files section of my lakehouse to delta tables via a Notebook. Most of the files I'm working with are relatively small (from 5 to 50,000 rows), with a handful of them being a couple million rows. I've noticed that the slowest part of the process is writing the files to the delta tables.

 

Here's what I was using to write the data to each table:

df.write.format("delta").mode("overwrite").saveAsTable(table_name)

This was taking anywhere from 30 seconds to 3 minutes per table on average.

 

I recently tried this:

df.write.format("delta").mode("overwrite").save(f'Tables/{table_name}')

I am getting much quicker write times, averaging 10 to 50 seconds.

 

I'm trying to understand whether using .save() is going to cause me any longer-term issues I'm not aware of vs .saveAsTable(). Everything I've read about the differences between them seems to cover Delta lakehouses in general rather than Fabric specifically. For example, one thing I read said that .save() doesn't register the table in the Hive metastore, so you can't query the table directly by name. But I've tested this from the Fabric SQL analytics endpoint, and I'm able to query the tables just fine. So I'm not sure whether issues like that actually apply within Fabric.

 

Does anybody have experience with this, and are there any issues I should watch out for with .save()? The time difference, multiplied across 75 tables, is a huge savings if everything else is equal.
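For context, my load loop looks roughly like this (the table names and file paths below are placeholders, and `spark` is the SparkSession that Fabric notebooks provide automatically):

```python
# Sketch of the load loop. Table names and source paths are illustrative;
# `spark` is the SparkSession a Fabric notebook provides, so no pyspark
# import is needed at module level.

def table_path(table_name: str) -> str:
    """Build the relative /Tables path used by the path-write (.save) style."""
    return f"Tables/{table_name}"

def load_tables(spark, tables, use_save=True):
    """Write each source file to a Delta table.

    use_save=True  -> path write:  df.write...save("Tables/<name>")
    use_save=False -> table write: df.write...saveAsTable("<name>")
    """
    for table_name, source_path in tables:
        df = spark.read.option("header", True).csv(source_path)
        writer = df.write.format("delta").mode("overwrite")
        if use_save:
            writer.save(table_path(table_name))
        else:
            writer.saveAsTable(table_name)

# Usage inside a Fabric notebook (hypothetical file):
# load_tables(spark, [("customers", "Files/raw/customers.csv")])
```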

1 ACCEPTED SOLUTION
deborshi_nag
Memorable Member

Hello @mmcanelly, that's an interesting observation! I haven't experienced it myself, but it likely comes down to the order in which each method does its work. Both end up producing a managed Delta table; internally, though, .save() behaves as a "path write" and .saveAsTable() as a "table write".

 

With a table write, Fabric registers the table with the catalog and the SQL analytics endpoint as part of the write, so the table is immediately visible. With a path write, Spark simply writes the Delta files to the /Tables area and lets Fabric auto-register them with the catalog, which eventually surfaces the table on the SQL endpoint. That extra synchronous registration would explain the additional latency you see with saveAsTable().
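If a downstream step needs the table name to resolve before continuing after a path write, one option is to poll the Spark catalog until auto-registration catches up. `spark.catalog.tableExists` is standard PySpark; the timeout and interval below are just guesses, not documented Fabric latencies:

```python
import time

def wait_for(predicate, timeout_s=120, interval_s=5):
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return False

# Inside a Fabric notebook, after a path write:
# df.write.format("delta").mode("overwrite").save(f"Tables/{table_name}")
# registered = wait_for(lambda: spark.catalog.tableExists(table_name))
```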

 

If your use case doesn't demand that the table be immediately available on the SQL endpoint, you could follow the approach of using save() rather than saveAsTable().

 

I should point out that I couldn't find any Fabric documentation that explains these internals. The documentation consistently uses .save to write to /Files and .saveAsTable to write to /Tables.

 

 

I trust this will be helpful. If you found this guidance useful, you are welcome to acknowledge it with a Kudos or by marking it as a solution.


2 REPLIES 2

Thank you @deborshi_nag! So it sounds like as long as I refresh the SQL endpoint metadata after making my changes (per something like this: Programmatically Refresh & Sync SQL Analytics Endp... - Microsoft Fabric Community) so that the endpoint sees the latest data right away, .save() should be fine for what I'm doing. Now I'll have to see whether it's still faster once I include that extra step. I suspect it will be, since I'd only refresh once at the end instead of updating the catalog on every table write.
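For anyone finding this later, the one-shot refresh step could look roughly like this. The route follows the preview "refresh SQL analytics endpoint metadata" REST API that posts like the one linked above describe, so treat the exact path, query string, and auth flow as assumptions and verify them against the current Fabric REST documentation:

```python
# Sketch of a one-shot metadata refresh after all path writes finish.
# ASSUMPTION: the route below reflects the preview refresh-metadata REST
# API discussed in the linked community post; verify the exact path and
# parameters against current Fabric REST docs before relying on it.

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def refresh_metadata_url(workspace_id: str, sql_endpoint_id: str) -> str:
    """Build the (assumed) refresh-metadata URL for a SQL analytics endpoint."""
    return (f"{FABRIC_API}/workspaces/{workspace_id}"
            f"/sqlEndpoints/{sql_endpoint_id}/refreshMetadata?preview=true")

# Usage (hypothetical IDs; `token` is an Entra ID bearer token for the
# Fabric API):
# import requests
# resp = requests.post(
#     refresh_metadata_url(workspace_id, sql_endpoint_id),
#     headers={"Authorization": f"Bearer {token}"},
#     json={},
# )
# resp.raise_for_status()
```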

 

Thanks for your insight!
