I've executed each notebook with the above query, and in Fabric Capacity Metrics the second query consumed fewer CU... and I do not understand why.
[OPTIMIZE]
cu(s) : 24,000
duration(s) : 2,680
[OPTIMIZE + ZORDER]
cu(s) : 13,900
duration(s) : 1,550
Does anyone know if this is normal? Please help!
Thanks
Regards.
Hi @jwryu
Thanks for using Fabric Community.
The OPTIMIZE command in Delta Lake compacts small files into larger ones, which can help reduce the number of files and improve query performance. However, when you use OPTIMIZE with ZORDER BY column, it does more than just file compaction.
ZORDER is a technique that reorders the data based on the column specified in the ZORDER BY clause. This reordering of data can significantly improve the performance of queries that filter on the ZORDER column.
So, when you run OPTIMIZE table ZORDER BY column, it not only compacts the small files into larger ones but also reorders the data based on the column specified. As a result, it can reduce the amount of data that needs to be read, leading to fewer compute units (CU) being used and a shorter query duration.
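To illustrate why reordering can reduce the data that needs to be read, here is a minimal Python sketch. It is a simplified model and not how Delta Lake is implemented: each "file" is just a list of values, and a reader is assumed to skip any file whose minimum/maximum range cannot contain the filter value. The helper names `split_into_files` and `files_to_read` are hypothetical and exist only for this example.

```python
import random


def split_into_files(values, rows_per_file):
    """Chunk a list of values into fixed-size 'files' (plain lists)."""
    return [values[i:i + rows_per_file]
            for i in range(0, len(values), rows_per_file)]


def files_to_read(files, target):
    """Count files whose [min, max] range could contain the target value."""
    return sum(1 for f in files if min(f) <= target <= max(f))


random.seed(0)
values = list(range(10_000))
random.shuffle(values)

# Compaction only: rows stay in arrival order, so every file spans a wide range.
unclustered = split_into_files(values, 1_000)

# Compaction plus reordering on the column: each file covers a narrow range.
clustered = split_into_files(sorted(values), 1_000)

target = 4_242
print("files read without reordering:", files_to_read(unclustered, target))
print("files read with reordering:   ", files_to_read(clustered, target))
```

In this toy model the reordered layout needs exactly one file for the lookup, while the unordered layout has to open far more, because almost every file's range overlaps the filter value.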
In your case, the OPTIMIZE + ZORDER command used fewer compute units (13,900 cu) and took less time (1,550 seconds) compared to the OPTIMIZE command alone (24,000 cu and 2,680 seconds). This indicates that the ZORDER optimization was effective for your particular workload and data distribution.
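As a quick check on those numbers, the short Python sketch below works out the relative savings and the average consumption rate for both runs. It assumes the `cu(s)` figures from Capacity Metrics are CU-seconds; the dictionary keys and the `percent_reduction` helper are hypothetical names used only for this example.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Figures copied from the question.
optimize = {"cu_seconds": 24_000, "duration_s": 2_680}
optimize_zorder = {"cu_seconds": 13_900, "duration_s": 1_550}

cu_saving = percent_reduction(optimize["cu_seconds"],
                              optimize_zorder["cu_seconds"])
time_saving = percent_reduction(optimize["duration_s"],
                                optimize_zorder["duration_s"])

# Average CU-seconds consumed per second of run time.
rate_optimize = optimize["cu_seconds"] / optimize["duration_s"]
rate_zorder = optimize_zorder["cu_seconds"] / optimize_zorder["duration_s"]

print(f"CU(s) reduction:    {cu_saving:.1f}%")      # 42.1%
print(f"Duration reduction: {time_saving:.1f}%")    # 42.2%
print(f"OPTIMIZE rate:          {rate_optimize:.2f} CU(s) per second")  # 8.96
print(f"OPTIMIZE + ZORDER rate: {rate_zorder:.2f} CU(s) per second")    # 8.97
```

Both runs consumed roughly 9 CU(s) per second, so in these figures the lower total lines up almost exactly with the shorter run time rather than with a lower consumption rate.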
I hope this helps! Let me know if you have any other questions.