Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

To celebrate FabCon Vienna, we are offering 50% off select exams. Ends October 3rd. Request your discount now.

Reply
pavel118
Frequent Visitor

int vs string in VertiPaq

Hi all!

I'm new to power bi. I'm studying storage and optimization mechanisms.

The documentation says that it is better to convert the original text data to numeric values. For example SO123456 to 123456.

 

 

https://learn.microsoft.com/en-us/power-bi/guidance/import-modeling-data-reduction#optimize-column-d...

pavel118_0-1733158096569.png

However, when I open vertipaq analyzer in dax studio, I see that CustomerKey takes up more space than CustomerAlternateKey 

(AdventureWorks)

pavel118_1-1733158532715.png

 

pavel118_2-1733158695471.png

I can't understand why the number column takes up more space ?

It turns out that Microsoft's recommendations are not correct ???

 

2 REPLIES 2
Anonymous
Not applicable

Hi, @pavel118 

The storage space taken by a column in VertiPaq is significantly influenced by its cardinality, which is the number of unique values in the column. If CustomerKey has a higher cardinality than CustomerAlternateKey, it could take up more space despite being numeric. 

Optimizing High Cardinality Columns in VertiPaq - SQLBI

 

VertiPaq uses dictionary encoding to compress data. For numeric columns, if the range of numbers is large, the dictionary can become quite large. On the other hand, alphanumeric columns with fewer unique values can have smaller dictionaries. While numeric data types generally compress better, this is not always the case if the numeric values are not repetitive or if they span a wide range. Alphanumeric strings that are repetitive can be compressed more efficiently. You can see the folloing link:

Vertipaq optimization and its impact on columnstore compression - Simple Talk

 

In your case, it seems that CustomerKey might have a higher cardinality or a wider range of values compared to CustomerAlternateKey, leading to more storage space being used. Microsoft's recommendation to convert text to numeric values is generally valid, but specific cases like yours can show different results due to the factors mentioned above.

Based on your information, I create a sample table:

vyohuamsft_0-1733196242134.png

Then create a new calculated column:

CustomerKey = VALUE(MID([CustomerAlternateKey], 3, LEN([CustomerAlternateKey]) - 2))

vyohuamsft_1-1733196289793.png

In DAX Studio, I looked at their data size

vyohuamsft_2-1733196447759.png

 

In the example, the obvious Int type is smaller than the data size of the string type

 

How to Get Your Question Answered Quickly

Best Regards

Yongkang Hua

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

@Anonymous Thanks for the answer!

But the cardinality of CustomerKey and CustomerAlternateKey is the same - 18,484. It is not clear why the dictionary of CustomerKey takes up more memory. By all logic it should be the other way around or am I missing something?

 

 

pavel118_0-1733211929543.png

 

Helpful resources

Announcements
September Power BI Update Carousel

Power BI Monthly Update - September 2025

Check out the September 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.

Top Kudoed Authors