st_0999
Helper II

DataSource.Error: Microsoft SQL: Error handling external file: 'Invalid: Parquet file size is 0 bytes'

OK, so here is my journey:

 

I first tried Power Query Online, but hit a 10-minute cap: I kept getting a 10-minute timeout error and the dataflow would fail.

So I did 80% of the work in Power Query Online, in a Gen2 Dataflow, on the less expensive steps (those that did not require a full scan).

Then I loaded the output from the Gen2 Dataflow into a Lakehouse as a Delta Table.

 

Then I picked up that Delta Table and processed the remaining 20% in PySpark, in a Notebook.

I then wrote it back to the Lakehouse as a Delta Table. But I could only save the table name in lower case (which was really annoying): it just would not save the name in UPPER CASE, as I could see when exploring things in the Lakehouse file explorer.
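For reference, the Notebook write looked roughly like this (the staging table name and the sort column are just illustrative stand-ins, not my real ones):

df = spark.read.table("stage_from_dataflow")   # Delta Table loaded by the Gen2 Dataflow
sorted_df = df.orderBy("invoice_line_id")      # stand-in for the remaining 20% of the logic
# The saved table name comes out lower-cased, no matter how it is written here:
sorted_df.write.mode("overwrite").format("delta").saveAsTable("ACT_ALL_INV_LINE_FINAL")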

 

After a while, I decided that, instead of writing from the Notebook directly to the Lakehouse Table space as a Delta Table, I would write the output as Parquet files in the Files space of the Lakehouse.

 

I then created a Data pipeline with the following activities:

1. Original Gen2 Dataflow (80%) (performance is OK: about 4 minutes on fairly simple steps)

2. PySpark Notebook (for the remaining 20%; it's super quick at only 30 seconds, even though these are the complex steps)

3. A Copy Activity to pick up the Parquet files from the Files space and write them to the destination as a Delta Table in the Lakehouse (in this step I can actually set the name of the Delta Table in UPPER CASE, which is what I wanted all along)

4. A Delete files step to delete the temporary Parquet files that the Notebook loaded into the Files space

 

Now the problem is, when I inspect the Delta Table (with its upper-case name) and click "view files" to see the Parquet files behind it (in the Table space), there is one file that is 0 KB.
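In case it helps anyone reproduce this, here is a quick way I could have spotted the empty file from the Notebook; mssparkutils is built into Fabric notebooks, and the table path is illustrative:

files = mssparkutils.fs.ls("Tables/ACT_ALL_INV_LINE_FINAL")
for f in files:
    if f.name.endswith(".parquet") and f.size == 0:
        print("zero-byte Parquet file:", f.path)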

 

When I connect to the SQL endpoint and try loading the data in Excel or Power BI, I get the following error:

 

DataSource.Error: Microsoft SQL: Error handling external file: 'Invalid: Parquet file size is 0 bytes'. File/External table name: 'dbo.ACT_ALL_INV_LINE_FINAL'.
Statement ID: {52DF811F-B06F-4DC8-B690-338B0FFAEA06} | Query hash: 0x529BB7EEE88AA79D | Distributed request ID: {63247B40-034D-4D04-8C5D-C5BAE8378554}

 

Details:
DataSourceKind=SQL
DataSourcePath=nfsigzuek6wudkuj4iln6rusta-vf6c7n6onr6elfh354elx7s6ly.datawarehouse.pbidedicated.windows.net;*********
Message=Error handling external file: 'Invalid: Parquet file size is 0 bytes'. File/External table name: 'dbo.ACT_ALL_INV_LINE_FINAL'.
Statement ID: {52DF811F-B06F-4DC8-B690-338B0FFAEA06} | Query hash: 0x529BB7EEE88AA79D | Distributed request ID: {63247B40-034D-4D04-8C5D-C5BAE8378554}
ErrorCode=-2146232060

 

No idea how to solve this, or whether the approach I'm using is optimal. I'm kind of taking the longer route / workaround to get to a Delta Table whose name I can write in UPPER CASE, since you can't do that from a Notebook in PySpark... yet.

 

Thanks so much in advance!

 

 

4 REPLIES
GraceGu
Microsoft Employee

Great that you have a workaround.

The need to delete the _SUCCESS file is tracked internally as a bug.

UPPER CASE table names seem to be a general limitation on the Spark side; see "Caps are not preserved when Creating delta tables in Azure Synapse" on Microsoft Q&A. You might consider giving this feedback in the Data Engineering forum.
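A quick way to see the limitation in any Spark session (names here are illustrative):

df = spark.range(1)
df.write.mode("overwrite").format("delta").saveAsTable("MY_TABLE")
print([t.name for t in spark.catalog.listTables()])  # prints ['my_table']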

st_0999
Helper II

I think I know why this was happening.

 

When the Notebook writes the Parquet files, e.g. using a line like:

sorted_df.write.mode("overwrite").format("parquet").save("Files/<target folder>")
 
It also writes a _SUCCESS file. 
 
The Copy step in the Data Pipeline copies this file too, which breaks querying the resulting Delta Table. Probably worth the Fabric team looking into fixing?
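(Side note: if I understand the Hadoop committer settings correctly, the marker file can also be switched off before the write, which would make the delete step below unnecessary; I haven't verified this on Fabric.)

spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")  # stop emitting _SUCCESS
sorted_df.write.mode("overwrite").format("parquet").save("Files/<target folder>")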
 
So, in the Data Pipeline, I had to:
 

1. Original Gen2 Dataflow (80%) 

2. PySpark Notebook (for the remaining 20%; it's super quick at only 30 seconds, even though these are the complex steps)

3. Delete the _SUCCESS file only (before the Copy Activity)

4. A Copy Activity to pick up the Parquet files from the Files space, with the destination set as a Delta Table in the Lakehouse (here I can actually set the name of the Delta Table in UPPER CASE, which is what I wanted)

5. A Delete files step to delete the temporary Parquet files that the Notebook loaded into the Files space

 

Not ideal, but it's a workaround for me! Spark should really support UPPER CASE table names.
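For what it's worth, step 3 could probably also be done from the Notebook itself, right after the write, instead of as a separate pipeline activity (the folder name is illustrative):

target = "Files/inv_line_parquet_tmp"            # illustrative temp folder from step 2
mssparkutils.fs.rm(target + "/_SUCCESS", False)  # remove only the marker file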

ajarora
Microsoft Employee

You can exclude the _SUCCESS file during the copy by using a wildcard to specify the files to be copied. Something like this:

[Screenshot: Copy Activity source settings with a wildcard file path]

 

If you used the "file path" option, then specifying the "file format" as Parquet should likewise have prevented the _SUCCESS file from being copied. If it was copied anyway, that looks like a bug; please let me know if that was the case.

Thank you. I will try it. It never occurred to me to use a wildcard Copy.
