Solved: Re: V-Order & Z-Order

sjpark · ‎03-07-2024

Hi, I'm using Fabric.

And I have a simple question.

What is the difference between V-Order and Z-Order?

I know V-Order is a write-time optimization to the Parquet file format that enables lightning-fast reads under the Microsoft Fabric compute engines, such as Power BI, SQL, Spark, and others (as the MS docs say).

But I can't find any docs on Z-Order, and I still don't understand the difference well even after reading about V-Order and Z-Order.

Please help this newbie; I would really appreciate all your comments.

Thanks a lot.

Park.

v-nikhilan-msft · ‎03-07-2024

Hi @sjpark
Thanks for using Fabric Community.

V-Order:
V-Order is a write-time optimization specifically designed for the Parquet file format within the Microsoft Fabric ecosystem. Its primary goal is to enhance read performance under various compute engines, including Power BI, SQL, and Spark.
- Key features of V-Order:
  - Sorting: V-Order applies special sorting techniques to the Parquet files.
  - Row Group Distribution: It optimizes row group distribution.
  - Dictionary Encoding: Efficient dictionary encoding is used.
  - Compression: V-Order achieves better compression, leading to reduced storage costs.
- Benefits:
  - Lightning-Fast Reads: Power BI and SQL engines leverage Microsoft Verti-Scan technology and V-Ordered Parquet files, resulting in in-memory-like data access times.
  - Performance Boost: Even non-Verti-Scan compute engines (like Spark) benefit from V-Ordered files, with an average of 10% faster read times (up to 50% in some scenarios).
  - Cost Efficiency: V-Order reduces network, disk, and CPU resources during reads.
- Compatibility:
  - V-Order is 100% open-source Parquet format compliant, meaning all Parquet engines can read it as regular Parquet files.
  - It works seamlessly with Delta tables and features like Z-Order, compaction, vacuum, and time travel.
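Microsoft has not published the actual V-Order algorithm, so the following is only a generic illustration of why sorting and dictionary encoding tend to help columnar compression. It is a minimal Python sketch; the helper names (`dictionary_encode`, `run_length_encode`) and the sample column are made up for this example and are not part of any Fabric API.

```python
from itertools import groupby

def dictionary_encode(values):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = {}
    codes = []
    for v in values:
        # A new value gets the next free code; a known value reuses its code.
        codes.append(dictionary.setdefault(v, len(dictionary)))
    return dictionary, codes

def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    return [(code, sum(1 for _ in run)) for code, run in groupby(codes)]

# Hypothetical low-cardinality column, in arrival order.
column = ["KR", "US", "KR", "DE", "US", "KR", "DE", "US"]

_, unsorted_codes = dictionary_encode(column)
_, sorted_codes = dictionary_encode(sorted(column))

unsorted_runs = run_length_encode(unsorted_codes)
sorted_runs = run_length_encode(sorted_codes)

print(len(unsorted_runs))  # 8 runs: no two neighbours are equal
print(len(sorted_runs))    # 3 runs: one per distinct value
```

The same eight values need eight runs unsorted but only three once sorted, which is the general reason a sort-aware writer can produce smaller files that are cheaper to scan.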
Z-Order:
Z-Order is another optimization technique, but it’s not specific to Fabric; it’s widely used in data lakes and analytics platforms. Z-Order aims to improve query performance by co-locating related information in the same set of files.
- How It Works:
  - Z-Order organizes data based on one or more columns, typically columns that are frequently used in query filters and have many distinct values (high cardinality).
  - Rows with similar values in the specified columns are stored together.
  - This co-locality reduces the amount of data that needs to be read during queries.
- Benefits:
  - Data Skipping: By avoiding unnecessary reads, Z-Order significantly improves query efficiency.
  - Compatible with Delta Lake: Z-Order works seamlessly with Delta Lake.
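The co-location and data-skipping idea above can be sketched with a toy model. This is a simplified Python illustration, not Delta Lake's implementation: the "files" are plain lists, the min/max check stands in for per-file column statistics, and all function names are hypothetical.

```python
def interleave_bits(x, y, bits=8):
    """Z-order (Morton) key: interleave the bits of x and y."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def split_into_files(rows, rows_per_file):
    """Cut an ordered list of rows into fixed-size 'files'."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]

def count_files_read(files, column, value):
    """Count files whose min/max range for `column` could contain `value`."""
    hits = 0
    for f in files:
        col_values = [row[column] for row in f]
        if min(col_values) <= value <= max(col_values):
            hits += 1
    return hits

# 64 rows covering every (x, y) pair on an 8 x 8 grid, 4 rows per file.
rows = [(x, y) for x in range(8) for y in range(8)]
linear_files = split_into_files(sorted(rows), 4)  # sorted by x, then y
zorder_files = split_into_files(
    sorted(rows, key=lambda r: interleave_bits(r[0], r[1])), 4
)

# Files that must be read for the filters x == 3 and y == 5.
print(count_files_read(linear_files, 0, 3), count_files_read(linear_files, 1, 5))  # 2 8
print(count_files_read(zorder_files, 0, 3), count_files_read(zorder_files, 1, 5))  # 4 4
```

In this toy setup a plain sort on (x, y) skips well for filters on x but reads half of the 16 files for a filter on y, while the Z-ordered layout reads 4 files for either filter. That balance across several filter columns is what Z-Order trades for.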

Key Differences:

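- Timing: V-Order is applied at write time, as the Parquet files are written. Z-Order is applied when the table is optimized (for example, with Delta Lake's OPTIMIZE ... ZORDER BY), which rewrites existing files; its benefit then shows up at read time through data skipping.
- Purpose: V-Order focuses on compression and general read performance; Z-Order focuses on co-locating data so that queries filtering on the chosen columns read fewer files.
- Compatibility: V-Ordered files can be read by any Parquet engine. Z-Order relies on a table format and engine that support it, such as Delta Lake.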
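To make the timing point concrete: Z-Order is requested through an explicit table-optimization command, and in Fabric that same command can also ask for V-Order. The sketch below only builds the statement text, following the OPTIMIZE ... ZORDER BY (...) VORDER form shown in the Fabric documentation linked below. The helper `optimize_statement`, the table name `sales`, and the column names are assumptions for illustration, so check the current docs before relying on the exact syntax.

```python
def optimize_statement(table, zorder_columns=(), vorder=False):
    """Build an OPTIMIZE statement as text (hypothetical helper, not a Fabric API)."""
    parts = [f"OPTIMIZE {table}"]
    if zorder_columns:
        parts.append("ZORDER BY (" + ", ".join(zorder_columns) + ")")
    if vorder:
        parts.append("VORDER")
    return " ".join(parts)

# Hypothetical table and columns.
stmt = optimize_statement("sales", ["customer_id", "order_date"], vorder=True)
print(stmt)  # OPTIMIZE sales ZORDER BY (customer_id, order_date) VORDER

# In a Fabric Spark notebook the statement would be run with: spark.sql(stmt)
```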

For more information please refer to these links:
https://www.linkedin.com/posts/lucazanna_data-microsoft-fabric-activity-7068093014677540864-hNm0/?or...
https://docs.delta.io/latest/optimizations-oss.html#language-sql
https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=sparks...
https://www.dremio.com/blog/how-z-ordering-in-apache-iceberg-helps-improve-performance/
Data skipping for Delta Lake - Azure Databricks | Microsoft Learn

Hope this helps. Please let me know if you have any further questions. Glad to help.

sjpark · ‎03-07-2024

Hello @v-nikhilan-msft

Thanks a lot for your quick and detailed reply.

It helps me a lot.

But I have one more question.

Can I know how V-Order works? For example, the special sorting techniques V-Order uses, or anything else.

If it's confidential to MS, I won't ask any further.

Thanks.

Park.

v-nikhilan-msft · ‎03-07-2024

Hi @sjpark
I have shared all the publicly available information about V-Order's functionality.
Here's a quick recap: V-Order applies special sorting to the data within Parquet files, optimizes how rows are distributed across row groups, and uses dictionary encoding and compression. Combined, these techniques optimize Parquet files for faster reads and storage efficiency. The exact sorting algorithm is not publicly documented.

Hope this helps. Please let me know if you have any further questions.

sjpark · ‎03-07-2024

Thanks a lot @v-nikhilan-msft !

Your comment is very helpful to me.

Wish you the best of luck.

v-nikhilan-msft · ‎03-07-2024

Hi @sjpark
Glad that your query got resolved. Please continue using Fabric Community for any help regarding your queries.
All the best to you too! Have a great day.
