I am trying to output many lakehouse tables to parquet files under the Files section of the lakehouse, but in a PySpark notebook it only seems to manage 2 tables before failing; anything more than 2 tables and the notebook constantly throws errors. I have tried many things, like splitting the processing into batches of 2 and waiting a few seconds between batches, but nothing works. Is there a bug in the notebook Spark cluster where it can't handle writing more than 2 files before it crashes?
And it is not a problem with the source data: when I restart at the tables where it failed last time, it again processes only 2 tables before crashing. Is there a workaround for this?
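For reference, the batched approach described above can be sketched roughly as follows. The table names, export path, and batch size are assumptions for illustration, not the poster's actual code:

```python
import time

def chunks(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical PySpark loop (table names and paths are illustrative):
#   tables = [t.name for t in spark.catalog.listTables()]
#   for batch in chunks(tables, 2):
#       for name in batch:
#           spark.read.table(name).write.mode("overwrite").parquet(f"Files/export/{name}")
#       time.sleep(5)  # pause between batches, as attempted above
```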
The data exists in the delta table in the lakehouse and it displays without issues, so I'm assuming the structure of the delta files is fine.
It turned out to be a data quality issue, specifically column names with spurious characters like $ . " etc. I don't understand why it can't output the data without failing if it is already in the lakehouse as a delta table, but anyway, after I cleaned up all the column names it worked.
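The column-name cleanup described above can be sketched like this. Parquet rejects certain characters in column names (space, `,;{}()\n\t=`), and the `$ . "` characters mentioned in the thread cause similar failures; the exact character class and the helper name here are assumptions for illustration:

```python
import re

# Characters known to break Parquet column names, plus the $ . "
# characters reported in this thread (assumed character class).
_INVALID = re.compile(r'[ ,;{}()\n\t=$."]')

def sanitize(name: str) -> str:
    """Return a Parquet-safe column name, replacing invalid characters with '_'."""
    return _INVALID.sub('_', name)

# Hypothetical PySpark usage (names and paths are illustrative):
#   for col in df.columns:
#       df = df.withColumnRenamed(col, sanitize(col))
#   df.write.mode("overwrite").parquet("Files/export/my_table")
```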
Hi @MangoMagic
Thank you for reaching out to the Microsoft Fabric community forum.
May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution, so that other community members with similar problems can find it faster.
Thank you.
It looks like there is a problem with the 3rd delta file. Can you check whether it is actually a delta table and contains a _delta_log/ folder inside it, and also verify the access permissions for that file? Since Spark executes lazily, it only fails at the write step rather than at the read step (but there is a high chance the problem is actually in the read).
Try reading and writing it separately, without a loop, in another cell.
If this is the reason and it addressed your query, please accept it as the solution and give a 'Kudos' so other members can easily find it.
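Because Spark builds its plan lazily, a read-time problem (missing _delta_log/, bad permissions) often only surfaces when the write action runs. A tiny Python generator shows the same effect, followed by a hedged per-table check (the table name and path are illustrative):

```python
# Analogy for Spark's lazy execution: the bad element only fails
# when the generator is actually consumed, not when it is defined.
def lazy_map(fn, items):
    return (fn(x) for x in items)

results = lazy_map(lambda x: 1 // x, [1, 0])
next(results)    # fine: 1 // 1 evaluates cleanly
# next(results)  # only now would 1 // 0 raise ZeroDivisionError

# Hypothetical per-table check in its own notebook cell (names illustrative):
#   df = spark.read.format("delta").load("Tables/third_table")
#   df.count()  # forces the read; a failure here points at the table, not the write
#   df.write.mode("overwrite").parquet("Files/export/third_table")
```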
Hi @MangoMagic ,
What's the error message? It seems to be cropped in your second screenshot. It would help with the analysis. 🙂
How did you configure the starter pool?
Configure and manage starter pools in Fabric Spark. - Microsoft Fabric | Microsoft Learn