YCastano
Frequent Visitor

Optimize saving to a Lakehouse table

I have a PySpark notebook that processes several DataFrames and dictionaries and then saves the results to a Lakehouse table, but the save is taking a long time. I have tried writeTo().append() and write.csv(), but both take more time than I can afford. How can I optimize the load into the Lakehouse table?
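For context, a minimal sketch of the write paths mentioned above, assuming a Fabric notebook with a default Lakehouse attached and the built-in `spark` session; the DataFrame and table name are hypothetical placeholders:

```python
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Delta append (Delta is the native Lakehouse table format); creates the
# table on the first write and is the usual baseline to optimize from.
df.write.format("delta").mode("append").saveAsTable("my_table")

# The DataFrameWriterV2 variant mentioned above; the table must already exist.
df.writeTo("my_table").append()

# write.csv() serializes every row to text under Files/ and does not
# produce a managed Lakehouse table, so it is usually the slowest option.
df.write.csv("Files/tmp/out", mode="append")
```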


1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @YCastano,

I have some suggestions to offer here:

Move the static parts of the query outside the for loop: extract the common portion into a helper, and use a condition inside the loop to pick only the statement-specific part for each iteration.
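A minimal sketch of this idea, with hypothetical table, column, and loop names: the static part of the query is built once, outside the loop, and only the condition that actually varies is chosen per iteration.

```python
base_query = "SELECT id, value, category FROM source_table"  # static part, built once

categories = ["a", "b", "special"]
for category in categories:
    if category == "special":  # statement-specific branch
        query = f"{base_query} WHERE category = '{category}' AND value > 100"
    else:
        query = f"{base_query} WHERE category = '{category}'"
    spark.sql(query).write.format("delta").mode("append").saveAsTable("my_table")
```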

Split the code into multiple code blocks rather than running everything in one. For example, save an intermediate result to a temporary file first, then continue processing that file in the next block; once processing is complete, you can delete the temporary file.
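A minimal sketch of this pattern, assuming a Fabric notebook with a default Lakehouse attached (so "Files/..." paths resolve into it) and the built-in `spark` session; all names and paths are hypothetical.

```python
# --- Cell 1: save the intermediate result instead of holding everything
# in one long-running block.
intermediate = spark.createDataFrame([(1, 10), (2, -5)], ["id", "value"])
intermediate.write.mode("overwrite").parquet("Files/tmp/intermediate")

# --- Cell 2: continue from the saved file in a separate code block.
intermediate = spark.read.parquet("Files/tmp/intermediate")
result = intermediate.filter("value > 0")
result.write.format("delta").mode("append").saveAsTable("my_table")

# --- Cell 3: once processing is complete, delete the temporary file
# (mssparkutils is built into Fabric notebooks; True = recursive).
mssparkutils.fs.rm("Files/tmp/intermediate", True)
```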

Avoid wide transformations such as groupBy and join unless they are necessary; they force Spark to shuffle data across the cluster.
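When a join cannot be avoided, one common mitigation (a general Spark technique, not something from the reply above) is to broadcast the smaller side so the join runs without a full shuffle; a minimal sketch with hypothetical data:

```python
from pyspark.sql.functions import broadcast

# Hypothetical fact and dimension DataFrames.
facts = spark.createDataFrame([(1, 100), (2, 200)], ["dim_id", "amount"])
dims = spark.createDataFrame([(1, "a"), (2, "b")], ["dim_id", "name"])

# Broadcasting the small side ships it to every executor, avoiding the
# shuffle a regular join would trigger on both inputs.
joined = facts.join(broadcast(dims), "dim_id")
```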

If an intermediate result is reused multiple times, persist it (for example with cache()) rather than recomputing it.
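A minimal sketch with hypothetical names: cache an intermediate DataFrame that several later steps depend on, so its lineage is computed once instead of once per downstream action.

```python
shared = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"]).cache()
shared.count()  # an action materializes the cache (cache() itself is lazy)

first = shared.filter("id > 1")   # downstream use 1
second = shared.select("value")   # downstream use 2, reads the cached rows

shared.unpersist()  # release memory once the result is no longer needed
```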

Best Regards,
Yang
Community Support Team

If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!


