I have a notebook in which I process PySpark dataframes and dictionaries and then save them to a Lakehouse table, but it takes a long time. I have tried writeTo().append() and write.csv(), but both take longer than I can afford. How can I optimize loading into the Lakehouse table?
Hi @YCastano ,
I have some suggestions to offer here:
Move anything static in the query out of the for loop, extract the shared logic into a helper function, and use conditions to handle only the statements that differ.
Split the code into multiple cells instead of running everything at once. For example, save the intermediate results to a temporary file first, then process that file in the next cell. Once processing is complete, you can delete the file.
Avoid shuffle-heavy transformations such as groupBy and join unless they are necessary.
If an intermediate result is reused several times, keep it (for example, by caching it) rather than recomputing it.
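As a rough illustration of the first suggestion, here is a minimal sketch. It assumes the slow part is a loop that writes to the table on every iteration, which the question does not confirm. The column list, the helper names build_rows and append_once, and the table name are all hypothetical; spark is the session that a Fabric notebook already provides.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical column order for the target table (an assumption for this sketch).
COLUMNS = ["source", "id", "value"]


def build_rows(
    records: Iterable[Dict[str, Any]], static_fields: Dict[str, Any]
) -> List[Tuple[Any, ...]]:
    """Merge the static fields into every record and return plain tuples.

    The static fields are prepared once by the caller, outside the loop,
    and nothing in the loop touches Spark.
    """
    rows = []
    for record in records:
        merged = {**static_fields, **record}  # record values win on key clashes
        rows.append(tuple(merged.get(col) for col in COLUMNS))
    return rows


def append_once(spark, rows, table_name):
    """Build one DataFrame from all rows and append it in a single write."""
    df = spark.createDataFrame(rows, schema=COLUMNS)
    # If later steps reuse df, calling df.cache() here avoids recomputing it.
    df.write.format("delta").mode("append").saveAsTable(table_name)
    return df
```

In a notebook cell this could be called as append_once(spark, build_rows(my_dicts, {"source": "sensor_a"}), "my_table"), where my_dicts and my_table stand in for your own data and table. The point is that the loop only collects rows in memory and the table is written once, instead of one small append per iteration.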
Best Regards,
Yang
Community Support Team
If any post helps, please consider accepting it as the solution so that other members can find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!