Microsoft Fabric Community Conference 2025, March 31 - April 2, Las Vegas, Nevada. Use code FABINSIDER for a $400 discount.
Register nowThe Power BI DataViz World Championships are on! With four chances to enter, you could win a spot in the LIVE Grand Finale in Las Vegas. Show off your skills.
Hi everyone,
I am a little bit confused about the following behavior in Power BI Desktop.
I imported data from a 114 MB CSV-File using Import-Mode (in other terms I am using the VertiPaq-Enginge) which ends up in a 355 MB large data model in memory in Power BI Desktop.
I expected VertiPaq-Compression to reduce the data in size or at least keep it in the same size like in the data source. Power BI takes up more space than the raw CSV. I'm wondering what coul be causing this increase.
Here are some more technical details:
I have a CSV-File as datasource which includes 10 Million rows of unique SalesOrderNumbers (see first image below).
SalesOrderNumbers
Each row consists a 10 characters long SalesOrderNumber (meaning 10 Bytes Storage) plus a line break (additional 2 Bytes of Storage). So in total we have a 120.000.000 Byte of data plus a 16 Byte Column Header, resulting in 120.000.016 Byte large CSV-File which is round about 114 MB (see next image below).
After I imported this data into Power BI Desktop in a single One-Column-Table I saved my *.pbix-file, closed it and re-opened it (without any explicit refresh in PBI Desktop) and opened DAX Studio to do an investigation with VertiPaq-Analyzer.
The result shows that the total size of my "data model" is 355 MB in memory (see next two images)
My *.pbix-file has a total size of round about 111 MB (see very last image below).
Any insights would be greatly appreciated!
Thanks!
Hi @lbendlin
thanks for your answer. I guess there is something to that.
But in such case I am interested in what this produced overhead actually is?
In this article https://www.microsoftpressstore.com/articles/article.aspx?p=2449192&seqNum=3 about VertiPaq compression you can read in the Run Length Encoding Section a VertiPaq column never exceeds the original column in size. At worst it is the same size.
"Finally, you might have a column in which content changes so often that, if you try to compress it using RLE, you end up using more space than its original one. Think, for example, of the primary key of a table. It has a different value for each row, resulting in an RLE version larger than the column itself. In such a case, VertiPaq skips the RLE compression and stores the column as it is. Thus, the VertiPaq storage of a column will never exceed the original column size. At worst, it is the same."
For further investigation I produced another CSV-File with 10 Million rows but now with repeating values. The result is not as extreme as before but the model size in memory is still larger than in the original datasource.
Original CSV File Size: 123 MB
Model size in memory (VertiPaq-Analyzer): 172 MB (see screenschots below)
Size of *.pbix-file: 59 MB
In your screenshot you can see that the dictionary takes up the most space. Like a useless index.
RLE is not the only compression technique that Vertipaq attempts.
Thanks for your reply, @lbendlin!
But why VertiPaq still produces the Dictionary? If Storing the column itself is is even cheaper... Is this actually the way how it is implemented? If Value Encoding is not usable, take Hash Encoding what ever it comes out?
Great question - should probably be directed at the Italians or Jeffrey.
There are a few articles on that topic too Inside VertiPaq - Compress for success - Data Mozart
Is there a good way to do this? I am highly interest in it.
datasource which includes 10 Million rows of unique SalesOrderNumbers
That data is not compressible. Vertipaq tries and fails to compress it, and likely produces some overhead.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Check out the February 2025 Power BI update to learn about new features.
User | Count |
---|---|
82 | |
78 | |
52 | |
39 | |
35 |
User | Count |
---|---|
94 | |
79 | |
51 | |
47 | |
47 |