st_0999
Helper II

DataSource.Error: Microsoft SQL: Error handling external file: 'Invalid: Parquet file size is 0 bytes'

OK, so here is my journey:

 

I first tried Power Query Online, but hit a 10-minute cap: I kept getting a 10-minute timeout error and the dataflow would fail.

So I did 80% of the work in Power Query Online, in a Gen2 Dataflow, on the less expensive steps (those that did not require a full scan).

Then I loaded the output from the Gen2 Dataflow into a Lakehouse as a Delta Table.

 

Then I picked up that Delta Table and processed the remaining 20% in PySpark, in a Notebook.

I then wrote it back to the Lakehouse as a Delta Table. But I could only save the table name in lower case (which was really annoying): it just would not save the name in UPPER CASE, as I could see when exploring things in the Lakehouse file explorer.
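For reference, the Notebook write looked roughly like this (the staging table name and the sort column are just illustrative stand-ins, not my real ones):

df = spark.read.table("stage_from_dataflow")   # Delta Table loaded by the Gen2 Dataflow
sorted_df = df.orderBy("invoice_line_id")      # stand-in for the remaining 20% of the logic
# The saved table name comes out lower-cased, no matter how it is written here:
sorted_df.write.mode("overwrite").format("delta").saveAsTable("ACT_ALL_INV_LINE_FINAL")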

 

After a while, I decided that, instead of writing from the Notebook directly to the Lakehouse Table space as a Delta Table, I would write the output as Parquet files in the Files space of the Lakehouse.

 

I then created a Data pipeline with the following activities:

1. Original Gen2 Dataflow (80%) (performance is OK: about 4 minutes on fairly simple steps)

2. PySpark Notebook (for the remaining 20%; it's super quick at only 30 seconds, even though these are the complex steps)

3. A Copy Activity to pick up the Parquet files from the Files space and write them to the destination as a Delta Table in the Lakehouse (in this step I can actually set the name of the Delta Table in UPPER CASE, which is what I wanted all along)

4. A Delete files step to delete the temporary Parquet files that the Notebook loaded into the Files space

 

Now the problem is, when I inspect the Delta Table (with its upper-case name) and click "view files" to see the Parquet files behind it (in the Table space), there is one file that is 0 KB.
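In case it helps anyone reproduce this, here is a quick way I could have spotted the empty file from the Notebook; mssparkutils is built into Fabric notebooks, and the table path is illustrative:

files = mssparkutils.fs.ls("Tables/ACT_ALL_INV_LINE_FINAL")
for f in files:
    if f.name.endswith(".parquet") and f.size == 0:
        print("zero-byte Parquet file:", f.path)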

 

When I connect to the SQL endpoint and try loading the data in Excel or Power BI, I get the following error:

 

DataSource.Error: Microsoft SQL: Error handling external file: 'Invalid: Parquet file size is 0 bytes'. File/External table name: 'dbo.ACT_ALL_INV_LINE_FINAL'.
Statement ID: {52DF811F-B06F-4DC8-B690-338B0FFAEA06} | Query hash: 0x529BB7EEE88AA79D | Distributed request ID: {63247B40-034D-4D04-8C5D-C5BAE8378554}

 

Details:
DataSourceKind=SQL
DataSourcePath=nfsigzuek6wudkuj4iln6rusta-vf6c7n6onr6elfh354elx7s6ly.datawarehouse.pbidedicated.windows.net;*********
Message=Error handling external file: 'Invalid: Parquet file size is 0 bytes'. File/External table name: 'dbo.ACT_ALL_INV_LINE_FINAL'.
Statement ID: {52DF811F-B06F-4DC8-B690-338B0FFAEA06} | Query hash: 0x529BB7EEE88AA79D | Distributed request ID: {63247B40-034D-4D04-8C5D-C5BAE8378554}
ErrorCode=-2146232060

 

No idea how to solve this, or whether the approach I'm using is optimal. I'm kind of taking the longer route / workaround to get to a Delta Table whose name I can write in UPPER CASE, since you can't do that from a Notebook in PySpark... yet.

 

Thanks so much in advance!

 

 

4 REPLIES
GraceGu
Microsoft Employee

Great that you have a workaround.

The need to delete the _SUCCESS file is tracked internally as a bug.

UPPER CASE table names seem to be a general limitation on the Spark side; see "Caps are not preserved when Creating delta tables in Azure Synapse" on Microsoft Q&A. You might consider giving this feedback in the Data Engineering forum.
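A quick way to see the limitation in any Spark session (names here are illustrative):

df = spark.range(1)
df.write.mode("overwrite").format("delta").saveAsTable("MY_TABLE")
print([t.name for t in spark.catalog.listTables()])  # prints ['my_table']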

st_0999
Helper II

I think I know why this was happening.

 

When the Notebook writes the Parquet files, e.g. using a line like:

sorted_df.write.mode("overwrite").format("parquet").save("Files/<target folder>")
 
It also writes a _SUCCESS file. 
 
The Copy step in the Data Pipeline copies this file too, which breaks querying the resulting Delta Table. Probably worth the Fabric team looking into fixing?
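(Side note: if I understand the Hadoop committer settings correctly, the marker file can also be switched off before the write, which would make the delete step below unnecessary; I haven't verified this on Fabric.)

spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")  # stop emitting _SUCCESS
sorted_df.write.mode("overwrite").format("parquet").save("Files/<target folder>")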
 
So, in the Data Pipeline, I had to:
 

1. Original Gen2 Dataflow (80%) 

2. PySpark Notebook (for the remaining 20%; it's super quick at only 30 seconds, even though these are the complex steps)

3. Delete the _SUCCESS file only (before the Copy Activity)

4. A Copy Activity to pick up the Parquet files from the Files space, with the destination set as a Delta Table in the Lakehouse (here I can actually set the name of the Delta Table in UPPER CASE, which is what I wanted)

5. A Delete files step to delete the temporary Parquet files that the Notebook loaded into the Files space

 

Not ideal, but it's a workaround for me! Spark should really support UPPER CASE table names.
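For what it's worth, step 3 could probably also be done from the Notebook itself, right after the write, instead of as a separate pipeline activity (the folder name is illustrative):

target = "Files/inv_line_parquet_tmp"            # illustrative temp folder from step 2
mssparkutils.fs.rm(target + "/_SUCCESS", False)  # remove only the marker file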

ajarora
Microsoft Employee

You can exclude the _SUCCESS file during the copy by using a wildcard to specify the files to be copied. Something like this:

[Screenshot: Copy Activity source settings with a wildcard file path]

 

If you used the "file path" option, then specifying the "file format" as Parquet should likewise have prevented the _SUCCESS file from being copied. If it was copied anyway, that looks like a bug; please let me know if that was the case.

Thank you. I will try it. It never occurred to me to use a wildcard Copy.
