YCastano
Frequent Visitor

Optimize saving to a Lakehouse table

I have a PySpark notebook that processes several DataFrames and dictionaries and then saves the results to a Lakehouse table, but the save is taking a long time. I have tried writeTo().append() and write.csv(), but both take more time than I can afford. How can I optimize the load into the Lakehouse table?
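For context, a minimal sketch of the write paths mentioned above, assuming a Fabric notebook with a default Lakehouse attached and the built-in `spark` session; the DataFrame and table name are hypothetical placeholders:

```python
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Delta append (Delta is the native Lakehouse table format); creates the
# table on the first write and is the usual baseline to optimize from.
df.write.format("delta").mode("append").saveAsTable("my_table")

# The DataFrameWriterV2 variant mentioned above; the table must already exist.
df.writeTo("my_table").append()

# write.csv() serializes every row to text under Files/ and does not
# produce a managed Lakehouse table, so it is usually the slowest option.
df.write.csv("Files/tmp/out", mode="append")
```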


1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @YCastano,

I have some suggestions to offer here:

Move the static parts of the query outside the for loop: extract the common portion into a helper, and use a condition inside the loop to pick only the statement-specific part for each iteration.
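A minimal sketch of this idea, with hypothetical table, column, and loop names: the static part of the query is built once, outside the loop, and only the condition that actually varies is chosen per iteration.

```python
base_query = "SELECT id, value, category FROM source_table"  # static part, built once

categories = ["a", "b", "special"]
for category in categories:
    if category == "special":  # statement-specific branch
        query = f"{base_query} WHERE category = '{category}' AND value > 100"
    else:
        query = f"{base_query} WHERE category = '{category}'"
    spark.sql(query).write.format("delta").mode("append").saveAsTable("my_table")
```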

Split the code into multiple code blocks rather than running everything in one. For example, save an intermediate result to a temporary file first, then continue processing that file in the next block; once processing is complete, you can delete the temporary file.
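A minimal sketch of this pattern, assuming a Fabric notebook with a default Lakehouse attached (so "Files/..." paths resolve into it) and the built-in `spark` session; all names and paths are hypothetical.

```python
# --- Cell 1: save the intermediate result instead of holding everything
# in one long-running block.
intermediate = spark.createDataFrame([(1, 10), (2, -5)], ["id", "value"])
intermediate.write.mode("overwrite").parquet("Files/tmp/intermediate")

# --- Cell 2: continue from the saved file in a separate code block.
intermediate = spark.read.parquet("Files/tmp/intermediate")
result = intermediate.filter("value > 0")
result.write.format("delta").mode("append").saveAsTable("my_table")

# --- Cell 3: once processing is complete, delete the temporary file
# (mssparkutils is built into Fabric notebooks; True = recursive).
mssparkutils.fs.rm("Files/tmp/intermediate", True)
```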

Avoid wide transformations such as groupBy and join unless they are necessary; they force Spark to shuffle data across the cluster.
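When a join cannot be avoided, one common mitigation (a general Spark technique, not something from the reply above) is to broadcast the smaller side so the join runs without a full shuffle; a minimal sketch with hypothetical data:

```python
from pyspark.sql.functions import broadcast

# Hypothetical fact and dimension DataFrames.
facts = spark.createDataFrame([(1, 100), (2, 200)], ["dim_id", "amount"])
dims = spark.createDataFrame([(1, "a"), (2, "b")], ["dim_id", "name"])

# Broadcasting the small side ships it to every executor, avoiding the
# shuffle a regular join would trigger on both inputs.
joined = facts.join(broadcast(dims), "dim_id")
```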

If an intermediate result is reused multiple times, persist it (for example with cache()) rather than recomputing it.
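A minimal sketch with hypothetical names: cache an intermediate DataFrame that several later steps depend on, so its lineage is computed once instead of once per downstream action.

```python
shared = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"]).cache()
shared.count()  # an action materializes the cache (cache() itself is lazy)

first = shared.filter("id > 1")   # downstream use 1
second = shared.select("value")   # downstream use 2, reads the cached rows

shared.unpersist()  # release memory once the result is no longer needed
```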

Best Regards,
Yang
Community Support Team

If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!


