
Anonymous
Not applicable

Microsoft Fabric notebook - saving a Parquet file in a Lakehouse with a desired name

I'm working in Microsoft Fabric and trying to save a PySpark DataFrame as a Parquet file with a specific filename in a Lakehouse. However, the default .write.parquet() method in PySpark creates files with a part-*.parquet naming convention and doesn't let me specify a custom file name. Is there a method to do this?

 

Code in notebook:

output_path = f"abfss://<lakehouse_name>@<target_workspace_id>.dfs.core.windows.net/Files/metaData/parquet_metadata"

# Coalesce the DataFrame into a single partition and save as a single Parquet file to the target Lakehouse
metadata_df.coalesce(1).write.mode("overwrite").parquet(output_path)
1 ACCEPTED SOLUTION
spencer_sa
Super User

By default, PySpark writes its Parquet output as a folder of part files, even if you coalesce to a single partition.
If you really need a single .parquet file, you can use pandas to write it.

 
# Convert to pandas and write a single file via the default Lakehouse's mounted path
df.toPandas().to_parquet('/lakehouse/default/Files/datafile.parquet')

The usual caveats about using pandas in notebooks apply: the data is collected to the driver, so this code runs on a single node rather than across the cluster.
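
If you'd rather stay in Spark, a hypothetical alternative (not from this thread) is to write the folder as usual and then rename the single part file with mssparkutils; the folder and file names below are just examples, and this assumes a default lakehouse is attached so the relative paths resolve.

from notebookutils import mssparkutils  # preinstalled in Fabric notebooks

folder = "Files/metaData/parquet_metadata"          # Spark writes a folder here
target = "Files/metaData/parquet_metadata.parquet"  # desired single-file name

metadata_df.coalesce(1).write.mode("overwrite").parquet(folder)

# Locate the single part file Spark produced inside the folder...
part_file = next(f.path for f in mssparkutils.fs.ls(folder)
                 if f.name.endswith(".parquet"))
# ...and move it to the desired name (True creates parent dirs if missing)
mssparkutils.fs.mv(part_file, target, True)
mssparkutils.fs.rm(folder, True)  # remove the leftover folder recursively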



3 REPLIES
Anonymous
Not applicable

Thanks @spencer_sa, I was able to convert the Spark DataFrame to a pandas DataFrame and then use the abfss path to save the Parquet file to the target Lakehouse with the desired name.
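
For reference, a minimal sketch of what that looks like; the abfss path below is a placeholder, and this assumes the Fabric runtime's preconfigured OneLake credentials, which typically let pandas write to abfss paths directly.

single_file = (
    "abfss://<workspace_name>@onelake.dfs.fabric.microsoft.com/"
    "<lakehouse_name>.Lakehouse/Files/metaData/parquet_metadata.parquet"
)
# Collect to pandas on the driver, then write one file with the exact name
metadata_df.toPandas().to_parquet(single_file)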

Anonymous
Not applicable

Hi @Anonymous 

 

The name of the underlying Parquet file is generated automatically; we cannot specify a custom name when saving it. Presumably the prefix ensures that the file name is always unique.

 

You can click '...' next to the file name to rename the file. However, the next time you save a DataFrame to overwrite the Parquet data, Spark will generate a new underlying part file with a different prefix. So renaming is only meaningful if you will never modify this file again.

[Screenshot: renaming the underlying Parquet file via the '...' menu]

 

Regardless of the prefixed name of the underlying file, you can still query the data using the root folder name, which is the custom name you specified. It behaves as if you were querying a single file, without any need to care what the underlying files look like.

[Screenshot: querying the Parquet data through the folder path]
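
For example, reading back by the folder path works as usual; a minimal sketch, assuming a default lakehouse is attached so the relative path resolves:

# Spark resolves the part-*.parquet file(s) inside the folder automatically
df = spark.read.parquet("Files/metaData/parquet_metadata")
df.show(5)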

 

Best Regards,
Jing
If this post helps, please Accept it as Solution to help other members find it. Appreciate your Kudos!

