Ok so here is my journey:
Tried to use PQ Online, but I hit a 10-minute cap: I kept getting a 10-minute timeout error and the dataflow would fail.
So I did 80% in PQ Online, in a Gen2 Dataflow, on the less expensive steps (those that were not full scans).
Then I loaded the output from the Gen2 Dataflow to a Lakehouse as a Delta Table
Then I picked up that Delta Table and processed the remaining 20% in PySpark, in a Notebook
I then wrote it back to the Lakehouse as a Delta Table. But I could only save the table name in lower case (which was really annoying) - it just wouldn't save the name in UPPER CASE when I explored things in the Lakehouse file explorer.
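For context, the write looked roughly like this (a minimal sketch in a Fabric notebook; the staging table name is made up for illustration, and as noted the name ends up stored in lower case):

```python
# Pick up the Gen2 Dataflow output from the attached Lakehouse (hypothetical staging table name)
df = spark.read.table("act_all_inv_line_staging")

# ... the remaining 20% of the transformations in PySpark ...

# Write back as a managed Delta table; Spark persists the identifier in lower case,
# even when it is passed in UPPER CASE here
df.write.mode("overwrite").format("delta").saveAsTable("ACT_ALL_INV_LINE_FINAL")
```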
After a while, I decided, instead of writing from the Notebook directly to the Lakehouse Table space as a Delta Table, to write the output as Parquet files in the Files space of the Lakehouse
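So the change was essentially swapping saveAsTable for a plain Parquet write into the Files area (again just a sketch; the staging folder name is an example):

```python
# Write the processed DataFrame as temporary Parquet files into the Files area of the Lakehouse
# (hypothetical staging folder)
df.write.mode("overwrite").parquet("Files/staging/act_all_inv_line_final")
```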
I then created a Data pipeline with the following activities:
1. The original Gen2 Dataflow (80%) (it's OK in performance, about 4 minutes on fairly simple steps)
2. A PySpark Notebook (for the remaining 20% - it's super quick, only 30 seconds, on complex steps)
3. A Copy Activity to pick up the Parquet files from the Files space and write them to the Lakehouse as a Delta Table destination (in this step I can actually set the name of the Delta Table in UPPER CASE - which is what I wanted all along)
4. A Delete files step to delete the temporary Parquet files that the Notebook loaded into the Files space
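To sanity-check what the Copy Activity in step 3 will actually pick up, the staged folder can be listed from the notebook (mssparkutils is available in Fabric notebooks; the folder path is the same hypothetical one as above):

```python
from notebookutils import mssparkutils

# List the staged folder to see exactly which files the Copy Activity will read
for f in mssparkutils.fs.ls("Files/staging/act_all_inv_line_final"):
    print(f.name, f.size)
```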
Now the problem is, when I inspect the Delta Table (with its upper case name) and click View Files to see the Parquet files behind it (in the Table space), there is one file that is 0 KB.
When I connect to the SQL endpoint and try loading the data in Excel or Power BI, I get the following error:
DataSource.Error: Microsoft SQL: Error handling external file: 'Invalid: Parquet file size is 0 bytes'. File/External table name: 'dbo.ACT_ALL_INV_LINE_FINAL'.
Statement ID: {52DF811F-B06F-4DC8-B690-338B0FFAEA06} | Query hash: 0x529BB7EEE88AA79D | Distributed request ID: {63247B40-034D-4D04-8C5D-C5BAE8378554}
Details:
DataSourceKind=SQL
DataSourcePath=nfsigzuek6wudkuj4iln6rusta-vf6c7n6onr6elfh354elx7s6ly.datawarehouse.pbidedicated.windows.net;*********
Message=Error handling external file: 'Invalid: Parquet file size is 0 bytes'. File/External table name: 'dbo.ACT_ALL_INV_LINE_FINAL'.
Statement ID: {52DF811F-B06F-4DC8-B690-338B0FFAEA06} | Query hash: 0x529BB7EEE88AA79D | Distributed request ID: {63247B40-034D-4D04-8C5D-C5BAE8378554}
ErrorCode=-2146232060
No idea how to solve this, or if the approach I'm using is optimal. I'm kind of taking the longer route / workaround to get to a Delta Table where I can write its name in UPPER CASE, since you can't do that from a notebook in PySpark... yet.
Thanks so much in advance!
Great that you have the workaround.
The need to delete the _SUCCESS file is tracked as a bug internally.
UPPER CASE for Spark seems to be a general limitation on the Spark side: Caps are not preserved when Creating delta tables in Azure Synapse - Microsoft Q&A. You might consider giving feedback in the Data Engineering forum.
I think I know why this was happening. When the Notebook writes the Parquet files, Spark also writes a zero-byte _SUCCESS marker file into the same output folder, and the Copy Activity was picking that marker up as if it were a data file - that is the 0 KB Parquet file the SQL endpoint complains about. So I updated the pipeline to the following (see the sketch after the list):
1. The original Gen2 Dataflow (80%)
2. A PySpark Notebook (for the remaining 20% - it's super quick, only 30 seconds, on complex steps)
3. Delete the _SUCCESS file only (before the Copy Activity)
4. A Copy Activity to pick up the Parquet files from the Files space, with the destination a Delta Table in the Lakehouse (here I can actually set the name of the Delta Table in UPPER CASE - which is what I wanted)
5. A Delete files step to delete the temporary Parquet files that the Notebook loaded into the Files space
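As an alternative to the separate Delete step in 3, the marker could also be removed from the notebook itself right after the write - a sketch, assuming the same hypothetical staging folder as above:

```python
from notebookutils import mssparkutils

staging = "Files/staging/act_all_inv_line_final"  # hypothetical staging folder from the write above

# Spark leaves a zero-byte _SUCCESS marker next to the Parquet part files;
# removing it here means the Copy Activity only ever sees real data files
mssparkutils.fs.rm(f"{staging}/_SUCCESS")
```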
Not ideal, but it's a workaround for me! Spark should really support UPPER CASE table names.
You can exclude the _SUCCESS file during copy by using a wildcard to specify the files to be copied. Something like this:
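For example (assuming the default Spark part-file naming), a wildcard file name such as *.parquet under the staging folder path would match only the data files and skip the zero-byte _SUCCESS marker.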
If you used the "File path" option, there too, if you specify the "File format" as Parquet, it shouldn't have copied the _SUCCESS file. If that happens, it looks like a bug - please let me know if that was the case.
Thank you. I will try it. It never occurred to me to use a wildcard copy.