DennesTorres
Impactful Individual

Difference between Data Warehouse and Lakehouse

Hi,

I noticed some differences between the lakehouse and the data warehouse, and I'm not sure why these differences exist.

The lakehouse:

  • The files are compressed (Snappy codec).
  • The files are written up to a maximum size of 1 GB, which avoids the small-files problem.
  • The files use the V-Order format.
  • The table folder contains a _delta_log folder and the data files.

 

The Data Warehouse:

 

  • The files are not compressed.
  • The files are prone to the small-files problem (the sample fact_sale table has 26 files in the warehouse and only 1 in the lakehouse).
  • The files are not in the V-Order format (or at least this is not indicated in the metadata).
  • The table folder contains a _delta_log folder and another folder; the data files are inside that second folder.
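
For anyone who wants to reproduce the comparison, here is a rough sketch of how the two table folders could be summarised. It assumes the table folder is reachable as a local or mounted path; the function name, the returned keys, and the 128 MB "small file" threshold are my own choices for illustration, not anything defined by Fabric.

```python
from pathlib import Path


def summarize_table_folder(table_dir, small_file_bytes=128 * 1024 * 1024):
    """Summarise the layout of one table folder (hypothetical helper).

    small_file_bytes is an arbitrary threshold chosen for illustration.
    """
    root = Path(table_dir)
    # Data files only: skip anything under _delta_log (e.g. checkpoint files).
    data_files = [
        p for p in root.rglob("*.parquet")
        if "_delta_log" not in p.relative_to(root).parts
    ]
    sizes = [p.stat().st_size for p in data_files]
    return {
        "has_delta_log": (root / "_delta_log").is_dir(),
        "parquet_count": len(data_files),
        "total_bytes": sum(sizes),
        "small_files": sum(1 for s in sizes if s < small_file_bytes),
        # True when data files sit in a sub-folder instead of the table root.
        "files_in_subfolder": any(p.parent != root for p in data_files),
    }
```

Running it once against the lakehouse copy of fact_sale and once against the warehouse copy should make the file count and folder layout differences easy to compare side by side.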

 

The questions about these differences:

  • How do these differences benefit each one, lakehouse and data warehouse, when they are opposites? For example, why is the warehouse not affected by small files (or is it)? Why doesn't the warehouse need V-Order?
  • After an update to a data warehouse table, new Parquet files were created, as expected, but the _delta_log files were not updated. Why?
  • After creating a shortcut from the data warehouse to a lakehouse, VACUUM didn't work. I imagine this is because the new Parquet files were not recorded in the _delta_log. Am I correct? How can this be fixed?
  • The table folder layouts differ: there is a sub-folder in the data warehouse that doesn't exist in the lakehouse. Is this why VACUUM doesn't work?
  • Should VACUUM work, or is there some other mechanism in place that replaces the need for it?
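
To check whether new Parquet files really are missing from the log, something like the sketch below could help. It replays the JSON commit files in _delta_log (each line is one action, and `add` / `remove` actions carry a relative `path`) and compares the result with the files on disk. The function names are hypothetical, the table folder is assumed to be reachable as a local or mounted path, and checkpoint files are ignored, so the result is only complete while the JSON commits are all still present.

```python
import json
from pathlib import Path
from urllib.parse import unquote


def live_files_from_delta_log(table_dir):
    """Replay JSON commits and return the relative paths the log considers live."""
    log_dir = Path(table_dir) / "_delta_log"
    live = set()
    # Commit file names are zero-padded version numbers, so name order = version order.
    for commit in sorted(log_dir.glob("*.json")):
        with commit.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)
                if "add" in action:
                    live.add(unquote(action["add"]["path"]))
                elif "remove" in action:
                    live.discard(unquote(action["remove"]["path"]))
    return live


def unreferenced_parquet_files(table_dir):
    """Parquet files on disk that the replayed log does not list as live.

    These were either never logged, or were removed by a later commit
    and are still waiting for cleanup.
    """
    root = Path(table_dir)
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*.parquet")
        if "_delta_log" not in p.relative_to(root).parts
    }
    return on_disk - live_files_from_delta_log(root)
```

If the files created by the update show up in the unreferenced set and no newer commit file exists in _delta_log, that would support the idea that the log had not been updated at the time of the check.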

 

Thank you in advance !

 

 

2 REPLIES
BryanCarmichael
Helper II

I think they serve different purposes. The major difference is that a warehouse supports DML as well as DDL, whereas the lakehouse supports only DML.

There are other nuances, e.g. the maximum size of a varchar field is 4000 in the warehouse and 8000 in the lakehouse.

 

We primarily use a warehouse because it meets our needs (a source to drive Power BI reporting), but we do have some items in the lakehouse (large unstructured text fields).

In short, I don't think it's a case of either/or; it's more a case of what fits the use case and what the team is more comfortable with.

Hi @BryanCarmichael,

 

My question is specifically about the storage differences I listed.

Both are stored in OneLake, the storage layer for all of Microsoft Fabric, yet they show almost opposite behaviours, as I highlighted above. That is what my questions about storage are trying to understand.

Kind Regards,

 

Dennes
