I am trying to output many lakehouse tables to parquet files under the Files section of the lakehouse, but in a PySpark notebook it only seems to manage 2 tables before failing; anything more than 2 tables and the notebook constantly throws errors. I have tried many things, like splitting the processing into batches of 2 and waiting a few seconds between batches, but nothing works. Is there a bug in the notebook Spark cluster where it can't handle writing more than 2 files before it crashes?
And it is not a problem with the source data: when I restart at the tables where it failed last time, it again processes only 2 tables before crashing. Is there a workaround for this?
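For reference, the batched approach described above can be sketched roughly as follows. The table names, export path, and batch size are assumptions for illustration, not the poster's actual code:

```python
import time

def chunks(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical PySpark loop (table names and paths are illustrative):
#   tables = [t.name for t in spark.catalog.listTables()]
#   for batch in chunks(tables, 2):
#       for name in batch:
#           spark.read.table(name).write.mode("overwrite").parquet(f"Files/export/{name}")
#       time.sleep(5)  # pause between batches, as attempted above
```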
The data exists in the delta table in the lakehouse and it displays without issues, so I'm assuming the structure of the delta files is fine.
It turned out to be a data quality issue, specifically column names with spurious characters like $ . " etc. I don't understand why it can't output the data without failing if it is already in the lakehouse as a delta table, but anyway, after I cleaned up all the column names it worked.
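The column-name cleanup described above can be sketched like this. Parquet rejects certain characters in column names (space, `,;{}()\n\t=`), and the `$ . "` characters mentioned in the thread cause similar failures; the exact character class and the helper name here are assumptions for illustration:

```python
import re

# Characters known to break Parquet column names, plus the $ . "
# characters reported in this thread (assumed character class).
_INVALID = re.compile(r'[ ,;{}()\n\t=$."]')

def sanitize(name: str) -> str:
    """Return a Parquet-safe column name, replacing invalid characters with '_'."""
    return _INVALID.sub('_', name)

# Hypothetical PySpark usage (names and paths are illustrative):
#   for col in df.columns:
#       df = df.withColumnRenamed(col, sanitize(col))
#   df.write.mode("overwrite").parquet("Files/export/my_table")
```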
Hi @MangoMagic
Thank you for reaching out to the Microsoft Fabric community forum.
May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution, so that other community members with similar problems can find it faster.
Thank you.
It looks like there is a problem with the 3rd delta file. Can you check whether it is actually a delta table and contains a _delta_log/ folder inside it, and also verify the access permissions for that file? Since Spark executes lazily, it only fails at the write step rather than at the read step (but there is a high chance the problem is actually in the read).
Try reading and writing it separately, without a loop, in another cell.
If this is the reason and it addressed your query, please accept it as the solution and give a 'Kudos' so other members can easily find it.
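Because Spark builds its plan lazily, a read-time problem (missing _delta_log/, bad permissions) often only surfaces when the write action runs. A tiny Python generator shows the same effect, followed by a hedged per-table check (the table name and path are illustrative):

```python
# Analogy for Spark's lazy execution: the bad element only fails
# when the generator is actually consumed, not when it is defined.
def lazy_map(fn, items):
    return (fn(x) for x in items)

results = lazy_map(lambda x: 1 // x, [1, 0])
next(results)    # fine: 1 // 1 evaluates cleanly
# next(results)  # only now would 1 // 0 raise ZeroDivisionError

# Hypothetical per-table check in its own notebook cell (names illustrative):
#   df = spark.read.format("delta").load("Tables/third_table")
#   df.count()  # forces the read; a failure here points at the table, not the write
#   df.write.mode("overwrite").parquet("Files/export/third_table")
```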
Hi @MangoMagic ,
What's the error message? It seems to be cropped in your second screenshot. It would help with the analysis. 🙂
How did you configure the starter pool?
Configure and manage starter pools in Fabric Spark. - Microsoft Fabric | Microsoft Learn