saveenrMSFT
Community Admin

Number of partitions created when I output a parquet file

How can I control the number of partitions created when I output a parquet file?

1 ACCEPTED SOLUTION
chetnachaudhari
Frequent Visitor

Hi @saveenrMSFT,

  If you are using PySpark, you can control the number of partitions by calling the repartition method or the coalesce method on your DataFrame before writing it to Parquet. repartition(n) performs a full shuffle and can either increase or decrease the partition count, while coalesce(n) only reduces it and avoids a full shuffle. Spark writes each partition as its own part file, so the partition count at write time largely determines the number of Parquet files generated.
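
As a rough illustration of the approach described above, here is a minimal sketch. The helper name with_partition_count, the demo function, the local[4] master, and the output path are all assumptions made for the example and are not part of the original question. It picks coalesce when reducing the partition count and repartition when increasing it, then writes the result as Parquet.

```python
def with_partition_count(df, num_files):
    """Return df resized to num_files partitions (hypothetical helper)."""
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    current = df.rdd.getNumPartitions()
    if num_files < current:
        # coalesce merges existing partitions and avoids a full shuffle
        return df.coalesce(num_files)
    if num_files > current:
        # repartition does a full shuffle and can raise the partition count
        return df.repartition(num_files)
    return df


def demo(output_path="/tmp/partition_demo.parquet"):
    """Write a sample DataFrame with 8 partitions (path is an assumption)."""
    # Imported lazily so the helper above does not require Spark to load.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[4]")
        .appName("partition-demo")
        .getOrCreate()
    )
    df = spark.range(1_000_000)
    with_partition_count(df, 8).write.mode("overwrite").parquet(output_path)
    spark.stop()
```

Calling demo() should leave roughly one part file per non-empty partition in the output directory. The exact count can differ if you also use partitionBy on the writer or set the spark.sql.files.maxRecordsPerFile option, and coalesce may produce unevenly sized files because it does not rebalance the data.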

Thanks,

Chetna


