Solved: Re: Sum vs DistinctCount - why is Sum faster?

jaryszek

Hello,

From the VertiPaq point of view:

Sum vs DistinctCount - why is Sum faster?

Can anybody explain?
Best,
Jacek

AnalyticPulse

SUM is much faster than DISTINCTCOUNT in DAX because the VertiPaq engine treats the two operations very differently. For a SUM, VertiPaq only needs to scan the column and add the numbers together. It doesn't need to check uniqueness or track previous values. The column is already compressed and organised in segments, so summing is almost a straight-line operation that the storage engine can do extremely fast.
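To make the "straight-line" idea concrete, here is a minimal Python sketch of summing a dictionary-encoded column. The layout (a `dictionary` list plus an `encoded` list of indexes) and the function name `column_sum` are assumptions for illustration only; this is a toy model, not VertiPaq's actual storage format or code.

```python
# Toy model of a dictionary-encoded column (an assumption for illustration,
# not VertiPaq's real storage layout).
dictionary = [10, 25, 40]          # each distinct value is stored once
encoded = [0, 1, 1, 2, 0, 2, 2]    # each row holds an index into the dictionary


def column_sum(dictionary, encoded):
    """Single pass over the rows: decode each value and add it to a total.

    No state is kept besides the running total, so the work per row is
    constant and nothing grows with the number of distinct values.
    """
    total = 0
    for value_id in encoded:
        total += dictionary[value_id]
    return total


total = column_sum(dictionary, encoded)
print(total)
```

The only state carried across rows is one number, which is why this kind of scan is cheap regardless of how many different values the column holds.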

DISTINCTCOUNT is a lot heavier. To count the unique values in the current filter context, VertiPaq must track which values it has already seen, for example in an internal dictionary or hash set. Every value it scans must be checked against that set to see whether it is new or a duplicate. That means more CPU work, more memory usage, and in some query plans more formula engine activity, which is generally slower than the storage engine. High-cardinality columns make it even slower, because more unique values mean a bigger set to maintain.
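The bookkeeping described above can be sketched in Python as follows. The function `distinct_count`, the `row_filter` parameter, and the use of a Python `set` are hypothetical stand-ins chosen for illustration; the real engine's internal structures are not documented in this thread and may differ.

```python
def distinct_count(encoded, row_filter=None):
    """Count distinct value ids among the rows that pass the filter.

    Unlike a sum, this has to remember every id seen so far, so memory
    and lookup work grow with the column's cardinality.
    """
    seen = set()                       # hypothetical "already seen" structure
    for row, value_id in enumerate(encoded):
        if row_filter is None or row_filter(row):
            seen.add(value_id)         # membership check plus possible insert
    return len(seen)


# Same toy encoding idea: each row stores an id for its value.
encoded = [0, 1, 1, 2, 0, 2, 2]

all_rows = distinct_count(encoded)                           # no filter
first_three = distinct_count(encoded, lambda row: row < 3)   # rows 0, 1, 2 only
print(all_rows, first_three)
```

Note that the answer changes with the filter, so the set has to be rebuilt for each filter context rather than read off a precomputed total.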

So the simple explanation is that SUM is easy, while DISTINCTCOUNT is hard. SUM just adds numbers, but DISTINCTCOUNT must constantly check, store, and compare values to build a list of uniques. This extra work is why DISTINCTCOUNT is almost always slower than SUM in VertiPaq.

Analytic Pulse Blog
Docynx Productivity Tools
Tool to Generate Realistic Sample Data Instantly online
