I'm working in Microsoft Fabric and trying to save a PySpark DataFrame as a Parquet file with a specific filename in a Lakehouse. However, the default .write.parquet() method in PySpark creates files with a part-*.parquet naming convention, which doesn't let me specify a custom file name. Is there a method to do this?
Code in notebook -
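(The snippet wasn't included in the post; a minimal sketch of the kind of write that produces part-*.parquet output, using placeholder data and a placeholder path, might look like this:)

# Minimal sketch, assuming a Fabric notebook with a default Lakehouse attached;
# the sample data and the Files path are placeholders.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# This creates a FOLDER named datafile.parquet containing part-*.parquet files;
# the individual part file names cannot be chosen here.
df.write.mode("overwrite").parquet("Files/datafile.parquet")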
By default, PySpark writes its Parquet output as a folder of part-*.parquet files, even if you coalesce to a single partition.
If you really need a single .parquet file with a chosen name, you can use Pandas to write it:
df.toPandas().to_parquet('/lakehouse/default/Files/datafile.parquet')
The usual caveats about using Pandas in notebooks apply: this code executes on a single node, not on the cluster.
Thanks @spencer_sa, I was able to convert the Spark DataFrame to a Pandas DataFrame and then use the abfss path to save the Parquet file to the target Lakehouse with the desired name.
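A sketch of that pattern, assuming Pandas can resolve OneLake abfss URLs through the fsspec support preinstalled in Fabric notebooks (the workspace and lakehouse names below are placeholders, not from the original post):

# Hypothetical workspace/lakehouse names; replace with your own.
df.toPandas().to_parquet(
    "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse.Lakehouse/Files/datafile.parquet"
)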
Hi @Anonymous
The name of the underlying parquet file is generated automatically; we cannot specify a custom name when saving. Presumably the part-* prefix ensures that each file name is unique.
You can click '...' next to the file name to rename it. But if you save a dataframe to overwrite the parquet file later, it will generate a new underlying parquet file with a different part-* name, so renaming is only worthwhile if you will never modify the file.
Regardless of the underlying part-* file names, you can still query the data through the root folder name, which is the custom name you specified. It behaves as if you were querying a single file, without needing to care what the underlying files look like.
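For example, a minimal sketch (the folder name is a placeholder for whatever you passed to .parquet() when writing):

# Query by the folder name; Spark reads the part-*.parquet files inside it.
result = spark.read.parquet("Files/datafile.parquet")
result.show()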
Best Regards,
Jing
If this post helps, please Accept it as Solution to help other members find it. Appreciate your Kudos!