This time we’re going bigger than ever. Fabric, Power BI, SQL, AI and more. We're covering it all. You won't want to miss it.
Learn moreLevel up your Power BI skills this month - build one visual each week and tell better stories with data! Get started
Hi!
I am loading huge dataset and trying to optimize in-memory size of it.
Total row count: 310 mln rows, sorted by gtin_prod_key and pbi_site_key, see details below.
I have noticed that my heaviest columns are twice heavier when I load full data set with fact columns compared with when I exclude fact columns, see the screenshots.
All other variables: data itself, cardinality, total number of rows, encoding are the same except I add Select Column statement to exclude columns in one of the queries.
Moreover, I though encoding was the most important in column size, but I tried both hash and value, and the size does not seem to change.
So I have 2 questions:
1. How columns size can be affected by other columns given all other variables are the same?
2. Why my time_period_end_date columns takes so much with such low cardinality even if I convert it to int?
3. Why encoding does not significantly affect column sizes?
With fact columns with value encoding
With fact columns with hash encoding
Without fact columns with value encoding
Without fact columns with has encoding
Sample data
Solved! Go to Solution.
Hi @slava_aptown ,
To put it simply, when I refer to using a surrogate DateID, I mean assigning a small numeric key to each unique date rather than storing the full date value in your large fact table. In other words, you number your distinct dates from 1 to N and keep this list in a separate Date table. The fact table then only stores the compact DateID.
For example
| DateID | Actual Date Value |
|---|---|
| 1 | 45292 |
| 2 | 45293 |
| 3 | 45294 |
VertiPaq is able to compress small, densely packed numbers much more effectively than a large integer range like 45,000–46,000. With a dataset of 300 million rows, this can significantly impact memory usage.
You’ll still need to join to the Date table to access actual dates or calendar details, so the functionality remains the same it’s simply a more efficient use of storage.
Thanks for your response.
Hi @slava_aptown ,
If you haven’t had a chance to review my response yet, please do so and let me know if you need any additional details.
Thank you.
Hi @slava_aptown ,
Everything should be clear now. If you need any additional information, please let me know.
Thank you.
Hi @slava_aptown ,
Thank you for reaching out to the Microsoft Fabric Community. When you load only the key columns, VertiPaq compresses them efficiently. Adding fact columns causes the engine to create new internal segments, which reduces compression and means the key columns use more memory, even if their values stay the same.
To optimize size, you can
1. Sort the data so that repeating values are grouped together.
2. Reduce or round numeric fact columns since they affect compression the most.
3. Use a surrogate DateID instead of raw date integers.
4. Remove columns not needed for reporting.
These are common ways to manage model size for large fact tables.
Regards,
Yugandhar.
Thanks for explanation.
What do you mean by "Use a surrogate DateID instead of raw date integers."?
How should this surrogate DateID look like? Do you mean just enumeration of all dates I have from 1 to some int?
Hi @slava_aptown ,
To put it simply, when I refer to using a surrogate DateID, I mean assigning a small numeric key to each unique date rather than storing the full date value in your large fact table. In other words, you number your distinct dates from 1 to N and keep this list in a separate Date table. The fact table then only stores the compact DateID.
For example
| DateID | Actual Date Value |
|---|---|
| 1 | 45292 |
| 2 | 45293 |
| 3 | 45294 |
VertiPaq is able to compress small, densely packed numbers much more effectively than a large integer range like 45,000–46,000. With a dataset of 300 million rows, this can significantly impact memory usage.
You’ll still need to join to the Date table to access actual dates or calendar details, so the functionality remains the same it’s simply a more efficient use of storage.
Thanks for your response.
Check out the April 2026 Power BI update to learn about new features.
Sign up to receive a private message when registration opens and key events begin.
If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.
| User | Count |
|---|---|
| 31 | |
| 25 | |
| 21 | |
| 18 | |
| 17 |
| User | Count |
|---|---|
| 61 | |
| 35 | |
| 33 | |
| 23 | |
| 23 |