MangoMagic
Regular Visitor

Notebook PySpark: constant errors when writing more than 2 files

I am trying to output many lakehouse tables to parquet files under the Files section of the lakehouse, but in a PySpark notebook it only ever manages to output 2 tables before failing; anything more than 2 tables and the notebook constantly throws errors. I have tried many things, such as splitting the processing into batches of 2 and waiting a few seconds between them, but nothing works. Is there a bug in the notebook Spark cluster that makes it crash after writing more than 2 files?

(Screenshots attached: notebook issue 1.png, notebook issue 2.png)

And it is not a problem with the source data: when I restart the run at the tables where it failed last time, it again processes only 2 tables before crashing. Is there a workaround for this?
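For reference, here is a minimal sketch of the kind of export loop described above; the table names and the Files/export path are placeholders, not taken from the actual code. Wrapping each write in a try/except at least logs which table fails instead of aborting the whole run:

# Hypothetical reconstruction of the export loop; names and paths are placeholders.
tables = ["dim_customer", "dim_product", "fact_sales"]

for name in tables:
    try:
        df = spark.read.table(name)  # lazy: Spark reads nothing until an action runs
        df.write.mode("overwrite").parquet(f"Files/export/{name}")  # read and write both happen here
        print(f"ok: {name}")
    except Exception as e:
        # report the failing table and the full error instead of stopping the loop
        print(f"failed: {name}: {e}")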

 

1 ACCEPTED SOLUTION

The data exists in the delta table in the lakehouse and it displays without issues, so I'm assuming the structure of the delta files is fine.

But it looks like an issue with data quality, specifically column names containing spurious characters like $ . " etc. If the data is already in the lakehouse as a delta table, I don't understand why it can't be written out without failing. In any case, I had to clean up all the column names, and then it worked.
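For anyone hitting the same thing: Spark's parquet writer rejects column names containing certain characters (spaces, commas, semicolons, braces, parentheses, tabs, newlines, equals signs), even when the Delta table itself tolerates them. A minimal cleanup sketch, where the clean_column_names helper and the table name are hypothetical:

import re

# Hypothetical helper: normalize column names to alphanumerics and underscores
# so the parquet writer no longer rejects them.
def clean_column_names(df):
    for old in df.columns:
        new = re.sub(r"[^0-9A-Za-z_]", "_", old)
        if new != old:
            df = df.withColumnRenamed(old, new)
    return df

df = clean_column_names(spark.read.table("some_table"))  # placeholder table name
df.write.mode("overwrite").parquet("Files/export/some_table")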


5 REPLIES
v-shamiliv
Community Support

Hi @MangoMagic 
Thank you for reaching out to the Microsoft Fabric Community Forum.

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution, so other community members with similar problems can find it faster.

Thank you.

 

samuvva
Microsoft Employee

It looks like there is a problem with the 3rd delta file. Can you check whether it is actually a Delta table, i.e. that it contains a _delta_log/ folder, and also verify the access permissions for that file? Since Spark executes lazily, the job fails only at the write step, not at the read step (but there is a high chance the problem is in the read step).
Try to read and write that table separately, without a loop, in another cell.
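A sketch of that isolation step (the table name and paths are placeholders; mssparkutils is the file-system utility preloaded in Fabric notebooks):

# List the table folder; a valid Delta table contains a _delta_log/ subfolder.
for f in mssparkutils.fs.ls("Tables/third_table_name"):  # placeholder path
    print(f.name)

# Read and write the suspect table on its own, forcing the read with an action
# so a read-side problem surfaces before the parquet write.
df = spark.read.table("third_table_name")  # placeholder name
df.limit(10).collect()  # action: forces an actual read
df.write.mode("overwrite").parquet("Files/export/third_table_name")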

If this is the reason and it addressed your query, please accept it as a solution or reason for issue and give a 'Kudos' so other members can easily find it.


ObungiNiels
Resolver III

Hi @MangoMagic ,

what's the error message? It seems to be cropped in your second screenshot. The full message would help with the analysis. 🙂

 
