DennesTorres
Impactful Individual

PARQUET file size and updates

Hi,

 

Usually we try to keep Parquet files large, because too many small files cause problems for processing. The default target file size, unless I'm mistaken, is 1 GB. Most examples end up creating a single Parquet file by default.

 

However, we can update records in Delta tables, even though the underlying Parquet files cannot be modified in place. When we update a single record in a Delta table, the entire Parquet file containing that record is rewritten as a new copy, and the old copy stays on storage. Unless we need the table's history, we should VACUUM the files frequently.
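To make the cost described above concrete, here is a rough Python sketch of the copy-on-write accounting. It assumes that each update touches a single file and rewrites that file in full, and that superseded copies stay on storage until a VACUUM removes them. The function name and the sizes are hypothetical, chosen only for illustration; the sketch does not call any Delta or Fabric API.

```python
GB = 1024 ** 3
MB = 1024 ** 2

def stale_bytes_before_vacuum(file_size_bytes: int, updates: int) -> int:
    """Bytes held by superseded file copies after `updates` single-record updates.

    Assumption: each update rewrites one whole file and leaves the previous
    copy on storage until VACUUM removes it.
    """
    return file_size_bytes * updates

# Ten single-record updates against a 1 GB file versus a 128 MB file.
large = stale_bytes_before_vacuum(1 * GB, 10)
small = stale_bytes_before_vacuum(128 * MB, 10)
print(large // GB, "GB of stale copies with 1 GB files")
print(small // MB, "MB of stale copies with 128 MB files")
```

Under these assumptions, the storage held by old copies grows linearly with both the file size and the number of updates between VACUUM runs.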

 

Is the 1 GB file size bad for tables when we plan to make many updates to the records? It seems so; could someone confirm?

Which is worse: many small files that cause problems for reads, or one or a few very large files that cause problems for updates?

(By the way, in my opinion record updates should be kept to a minimum, but the features are announced this way; in fact, even the Data Warehouse uses this file format for storage.)

Kind Regards,

 

Dennes

 

 

2 REPLIES
HimanshuS-msft
Microsoft Employee

Hello @DennesTorres 

I think this is a catch-22 situation: if you keep the file size at 1 GB, updates are slow but reads are fine; otherwise, updates are fine and reads are slow. As I understand from your post, you do not perform a lot of updates but do perform a lot of reads, so I think going ahead with the 1 GB file size should be the right approach.
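The trade-off can be put into rough numbers with a small Python sketch. It is only a back-of-the-envelope model: it assumes files are filled up to the target size, that a full scan opens every file, and that a single-record update rewrites exactly one file. The function name, the 100 GB table, and the candidate file sizes are hypothetical.

```python
MB = 1024 ** 2
GB = 1024 ** 3

def tradeoff(table_bytes: int, target_file_bytes: int) -> tuple[int, int]:
    """Return (files_to_scan, bytes_rewritten_per_single_record_update).

    Assumptions: files are filled to the target size, a full scan opens
    every file, and an update rewrites exactly one file.
    """
    files = -(-table_bytes // target_file_bytes)  # integer ceiling division
    rewritten = min(table_bytes, target_file_bytes)
    return files, rewritten

table = 100 * GB  # hypothetical table size
for target in (64 * MB, 256 * MB, 1 * GB):
    files, rewritten = tradeoff(table, target)
    print(f"target={target // MB} MB  files={files}  rewrite per update={rewritten // MB} MB")
```

In this model, smaller target sizes mean more files to open on every read, while larger ones mean more data rewritten on every update, which is the catch-22 described above.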

 

Thanks 
Himanshu 

 

 

Hi,

 

Thank you, I agree, and it's important to get this kind of feedback.

 

I believe that, as a result, when someone decides to use an upsert as opposed to a type 2 slowly changing dimension (SCD), this has a consequence, and they should be aware of it when deciding whether a 1 GB file size is good for their solution. Or not? Any thoughts about this?

I'm also wondering why we have no control over this in the Data Warehouse; I posted a separate question about this.

 

Kind Regards,

 

Dennes
