Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi guys,
quick question: If I want to sum a subset of a column, for example the sum of the sales of only red products, which approach is better suited?
1.SUMX and FILTER
Red Sales 1 =
SUMX ( FILTER ( Sales; Sales[ProductColor] = "Red" ); Sales[Amount] )
or
2. CALCULATE and SUM
Red Sales 2 =
CALCULATE ( SUM ( Sales[Amount] ); Sales[ProductColor] = "Red" )
Thanks in advance.
Best regards
Lucas
Solved! Go to Solution.
You should use 2 second version as CALCULATE evaluates its filter arguments first and once the filter context is modified it evaluates the first argument. And modifying filter context before evaluating something is efficient and a good practice. Also you get to reuse your base measure in all your other measure, for example if you create a Sales Amount measure like SUM ( Sales[Amount] ) you can use it multiple time in other measure where you just want to modify the context in which the SUM is to be evaluated without rewriting the whole code again. example:
=
CALCULATE ( [Sales Amount], Dates[Year] = 2020, Customer[Continent] = "Asia" )
=
CALCULATE ( [Sales Amount], Dates[Year] IN { 2020, 2019, 2018 } )
=
CALCULATE (
[Sales Amount],
Customer[Gender] = "Male",
Products[Color] IN { "Green", "Yellow", "White" },
USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)
In case of the first option FILTER has to iterate probably a very huge table and check for rows where color is red, but in case of the second option Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) since it use ALL and ALL returns unique product colors, that list is going to be really small, we are talking in hundreds or maybe tens, and iterating such a small list is much more efficient that iterating a table with millions or billions of rows to just check for red products and then do further iterations with SUMX.
@Anonymous - Thanks for inspiring me today.
Check out my latest article, CALCUHATE - Why I Don't Use DAX's CALCULATE Function.
You should use 2 second version as CALCULATE evaluates its filter arguments first and once the filter context is modified it evaluates the first argument. And modifying filter context before evaluating something is efficient and a good practice. Also you get to reuse your base measure in all your other measure, for example if you create a Sales Amount measure like SUM ( Sales[Amount] ) you can use it multiple time in other measure where you just want to modify the context in which the SUM is to be evaluated without rewriting the whole code again. example:
=
CALCULATE ( [Sales Amount], Dates[Year] = 2020, Customer[Continent] = "Asia" )
=
CALCULATE ( [Sales Amount], Dates[Year] IN { 2020, 2019, 2018 } )
=
CALCULATE (
[Sales Amount],
Customer[Gender] = "Male",
Products[Color] IN { "Green", "Yellow", "White" },
USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)
In case of the first option FILTER has to iterate probably a very huge table and check for rows where color is red, but in case of the second option Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) since it use ALL and ALL returns unique product colors, that list is going to be really small, we are talking in hundreds or maybe tens, and iterating such a small list is much more efficient that iterating a table with millions or billions of rows to just check for red products and then do further iterations with SUMX.
Hi @AntrikshSharma , id like to ask how we would know that this is happening internally.
-----but in case of the second option Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" )-----
i know this much that Sales[ProductColor] = "Red" is a shorthand for FILTER(Sales, Sales[ProductColor] = "Red"). This much is given in the microsoft Learn documentation itself.
If we cant tell that the filter Sales[ProductColor] = "Red" converts to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) as you said, we may not be able to establish the computational superiority of CALCULATE function when compared to SUMX iteration function here in this case (it may be more convenient but let us not consider the ease of use)
kindly help.
Thanks in advance
You can identify the expansion using DAX Studio > Logical Query Plan
There is no difference.
@AntrikshSharma Thanks for the clarification. I was leaning towards this solution, too.
Could you do me another favor and answer a follow-up question to your answer?
The third example you posted seems to be a calculation based on the running total / cummulative total pattern:
=
CALCULATE (
[Sales Amount],
Customer[Gender] = "Male",
Products[Color] IN { "Green", "Yellow", "White" },
USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)
I've used the pattern myself but I still get confused about the last part. Could you give me feedback If I've understood it correctly?
MAX ( Dates[Date] ) : We're in the row context now. There is only one Date which we retrieve with the MAX() function. But since there is only one date anyway in theory we could also use the MIN() or SELECTEDVALUE() functions to get this date
Thanks in advance.
Actually MAX here is not related to the Row context created by FILTER, MAX is being evaluated in the filter context that is outside CALCULATE, this is well understood by using Variables, here is arunning total patern using the same.
Running Total =
VAR MaxDateInFilterContext = MAX ( Dates[Date] )
VAR MaxYear = YEAR ( MaxDateInFilterContext )
VAR DatesLessThanMaxDate =
FILTER (
ALL ( Dates[Date], Dates[Calendar Year Number] ),
Dates[Date] <= MaxDateInFilterContext
&& Dates[Calendar Year Number] = MaxYear
)
VAR Result =
CALCULATE (
[Total Sales],
DatesLessThanMaxDate
)
RETURN
Result
take a look at the below screenshot. When you are at 2nd January, what is the MAX date? it is 2nd and the amount is calculated for all dates less than 2nd which is returned by the variable DatesLessThanMaxDate , next DAX is evaluated for 3rd January, now what is the MAX value? it is 3rd January, similarly it is being checked for each cell of the report.
Hence yes, you can use MIN or SELECTEDVALUE but their values will not be related to the Row context created by FILTER
Hello @AntrikshSharma,
if you can find the time I would appreciate help in explaining the concept you just described.
Especially I would like to know what you said about the MAX function which is wrapped in the FILTER function, why its not evaluated row by row, although its actually inside the filter function, which is a iterator?:
"Actually MAX here is not related to the Row context created by FILTER, MAX is being evaluated in the filter context that is outside CALCULATE, this is well understood by using Variables, here is arunning total patern using the same."
For me it is actually inside the Calculate function as well, because Calculate is outside of everything.
Also shouldn't Filter function works row by row? Shouldn't MAX be the Max date from the whole table its is refering to and not only the Max Date until that row which is going to be evaluated? Looking The following seems to make sense but isn't going to work without aggregation:
FILTER ( ALL ( Dates ), Dates[Date] < ( Dates[Date] )
instead of:
FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] )
Why I'm asking that is because when I do the same kind of calculation in a calculated column inside a table variable I can write like this without problem:
Thank you for your time.
Best regards.
For me it is actually inside the Calculate function as well, because Calculate is outside of everything.
That doesn't matter, CALCULATE prepares the new filter context by evaluating the filter argument like FILTER ( ... MAX )) in the original filter context that was active when CALCULATE was called. What is the original filter context here? the one created by the column in a report. Although Nested calculated may have different original filter context created by an outer CALCULATE.
Also shouldn't Filter function works row by row? -
FILTER does work row by row but when you use another function in the second argument of FILTER, then it doesn't depend on the row that is currently iterated unless that function is wrapped inside CALCULATE like CALCULATE ( MAX ) )
Read this: https://forum.enterprisedna.co/t/max-date-date-measure-used-in-the-filter-function/20123/5
Shouldn't MAX be the Max date from the whole table its is refering to and not only the Max Date until that row which is going to be evaluated?
I know why you are confused, you are thinking that MAX depends on the row iterated by FILTER but there is no relation between MAX and the currently iterated row in FILTER, simply put MAX is an aggregator that is evaluated in the original Filter context that is outside of CALCULATE (in this case)
@AntrikshSharma thank you so much for your thorough explanation. The link you provided was of tremendous help. Keep up the good work!
Best.
Thanks again @AntrikshSharma ! I really appreciate that you took the time to write such a detailed explanation. This is a topic I was struggeling with for weeks.
Here is a video for those who need more explanation on this topic.
@Anonymous - I prefer the first approach with SUMX. The reason is because it works in cases where you have a table variable as well as a physical table. Thus, it is much more flexible and can be used in more situations than the second. I tend not to use CALCULATE at all because whatever you can do with CALCULATE can be done other ways.
Check out the September 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
25 | |
19 | |
18 | |
18 | |
17 |
User | Count |
---|---|
37 | |
25 | |
18 | |
17 | |
13 |