Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.
Register now!To celebrate FabCon Vienna, we are offering 50% off select exams. Ends October 3rd. Request your discount now.
Hi all!
I'm new to power bi. I'm studying storage and optimization mechanisms.
The documentation says that it is better to convert the original text data to numeric values. For example SO123456 to 123456.
However, when I open vertipaq analyzer in dax studio, I see that CustomerKey takes up more space than CustomerAlternateKey
(AdventureWorks)
I can't understand why the number column takes up more space ?
It turns out that Microsoft's recommendations are not correct ???
Hi, @pavel118
The storage space taken by a column in VertiPaq is significantly influenced by its cardinality, which is the number of unique values in the column. If CustomerKey has a higher cardinality than CustomerAlternateKey, it could take up more space despite being numeric.
Optimizing High Cardinality Columns in VertiPaq - SQLBI
VertiPaq uses dictionary encoding to compress data. For numeric columns, if the range of numbers is large, the dictionary can become quite large. On the other hand, alphanumeric columns with fewer unique values can have smaller dictionaries. While numeric data types generally compress better, this is not always the case if the numeric values are not repetitive or if they span a wide range. Alphanumeric strings that are repetitive can be compressed more efficiently. You can see the folloing link:
Vertipaq optimization and its impact on columnstore compression - Simple Talk
In your case, it seems that CustomerKey might have a higher cardinality or a wider range of values compared to CustomerAlternateKey, leading to more storage space being used. Microsoft's recommendation to convert text to numeric values is generally valid, but specific cases like yours can show different results due to the factors mentioned above.
Based on your information, I create a sample table:
Then create a new calculated column:
CustomerKey = VALUE(MID([CustomerAlternateKey], 3, LEN([CustomerAlternateKey]) - 2))
In DAX Studio, I looked at their data size
In the example, the obvious Int type is smaller than the data size of the string type
How to Get Your Question Answered Quickly
Best Regards
Yongkang Hua
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
@Anonymous Thanks for the answer!
But the cardinality of CustomerKey and CustomerAlternateKey is the same - 18,484. It is not clear why the dictionary of CustomerKey takes up more memory. By all logic it should be the other way around or am I missing something?
User | Count |
---|---|
98 | |
76 | |
76 | |
48 | |
26 |