I'm working in Microsoft Fabric and trying to save a PySpark DataFrame as a Parquet file with a specific filename in a Lakehouse. However, the default .write.parquet() method in PySpark creates files with a part-*.parquet naming convention, which doesn't let me specify a custom file name. Is there a method to do this?
Code in notebook -
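(The snippet wasn't included in the post; a minimal sketch of the kind of write that produces part-*.parquet output, using placeholder data and a placeholder path, might look like this:)

# Minimal sketch, assuming a Fabric notebook with a default Lakehouse attached;
# the sample data and the Files path are placeholders.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# This creates a FOLDER named datafile.parquet containing part-*.parquet files;
# the individual part file names cannot be chosen here.
df.write.mode("overwrite").parquet("Files/datafile.parquet")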
By default, PySpark writes its Parquet output as a folder of part-*.parquet files, even if you coalesce to a single partition.
If you really need a single .parquet file with a chosen name, you can use Pandas to write it:
df.toPandas().to_parquet('/lakehouse/default/Files/datafile.parquet')
The usual caveats about using Pandas in notebooks apply: this code executes on a single node, not on the cluster.
Thanks @spencer_sa, I was able to convert the Spark DataFrame to a Pandas DataFrame and then use the abfss path to save the Parquet file to the target Lakehouse with the desired name.
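A sketch of that pattern, assuming Pandas can resolve OneLake abfss URLs through the fsspec support preinstalled in Fabric notebooks (the workspace and lakehouse names below are placeholders, not from the original post):

# Hypothetical workspace/lakehouse names; replace with your own.
df.toPandas().to_parquet(
    "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse.Lakehouse/Files/datafile.parquet"
)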
Hi @Anonymous
The name of the underlying parquet file is generated automatically; we cannot specify a custom name when saving. Presumably the part-* prefix ensures that each file name is unique.
You can click '...' next to the file name to rename it. But if you save a dataframe to overwrite the parquet file later, it will generate a new underlying parquet file with a different part-* name, so renaming is only worthwhile if you will never modify the file.
Regardless of the underlying part-* file names, you can still query the data through the root folder name, which is the custom name you specified. It behaves as if you were querying a single file, without needing to care what the underlying files look like.
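For example, a minimal sketch (the folder name is a placeholder for whatever you passed to .parquet() when writing):

# Query by the folder name; Spark reads the part-*.parquet files inside it.
result = spark.read.parquet("Files/datafile.parquet")
result.show()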
Best Regards,
Jing
If this post helps, please Accept it as Solution to help other members find it. Appreciate your Kudos!