Skip to main content
cancel
Showing results for
Search instead for
Did you mean:

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Anonymous
Not applicable

## SUMX & FILTER or SUM & CALCULATE: Best practice for summing subset of columns

Hi guys,

quick question: If I want to sum a subset of a column, for example the sum of the sales of only red products, which approach is better suited?

1.SUMX and FILTER

`Red Sales 1 =SUMX ( FILTER ( Sales; Sales[ProductColor] = "Red" ); Sales[Amount] )`

or

2. CALCULATE and SUM

`Red Sales 2 =CALCULATE ( SUM ( Sales[Amount] ); Sales[ProductColor] = "Red" )`

Thanks in advance.

Best regards

Lucas

1 ACCEPTED SOLUTION
Community Champion

You should use 2 second version as CALCULATE evaluates its filter arguments first and once the filter context is modified it evaluates the first argument. And modifying filter context before evaluating something is efficient and a good practice. Also you get to reuse your base measure in all your other measure, for example if you create a Sales Amount measure like SUM ( Sales[Amount] ) you can use it multiple time in other measure where you just want to modify the context in which the SUM is to be evaluated without rewriting the whole code again. example:

``````=
CALCULATE ( [Sales Amount], Dates[Year] = 2020, Customer[Continent] = "Asia" )``````
``````=
CALCULATE ( [Sales Amount], Dates[Year] IN { 2020, 2019, 2018 } )``````
``````=
CALCULATE (
[Sales Amount],
Customer[Gender] = "Male",
Products[Color] IN { "Green", "Yellow", "White" },
USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)``````

In case of the first option FILTER has to iterate probably a very huge table and check for rows where color is red, but in case of the second option  Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) since it use ALL and ALL returns unique product colors, that list is going to be really small, we are talking in hundreds or maybe tens, and iterating such a small list is much more efficient that iterating a table with millions or billions of rows to just check for red products and then do further iterations with SUMX.

13 REPLIES 13
Super User

@Anonymous - Thanks for inspiring me today.

Check out my latest article, CALCUHATE - Why I Don't Use DAX's CALCULATE Function.

https://community.powerbi.com/t5/Community-Blog/CALCUHATE-Why-I-Don-t-Use-DAX-s-CALCULATE-Function/ba-p/1248635

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Community Champion

You should use 2 second version as CALCULATE evaluates its filter arguments first and once the filter context is modified it evaluates the first argument. And modifying filter context before evaluating something is efficient and a good practice. Also you get to reuse your base measure in all your other measure, for example if you create a Sales Amount measure like SUM ( Sales[Amount] ) you can use it multiple time in other measure where you just want to modify the context in which the SUM is to be evaluated without rewriting the whole code again. example:

``````=
CALCULATE ( [Sales Amount], Dates[Year] = 2020, Customer[Continent] = "Asia" )``````
``````=
CALCULATE ( [Sales Amount], Dates[Year] IN { 2020, 2019, 2018 } )``````
``````=
CALCULATE (
[Sales Amount],
Customer[Gender] = "Male",
Products[Color] IN { "Green", "Yellow", "White" },
USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)``````

In case of the first option FILTER has to iterate probably a very huge table and check for rows where color is red, but in case of the second option  Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) since it use ALL and ALL returns unique product colors, that list is going to be really small, we are talking in hundreds or maybe tens, and iterating such a small list is much more efficient that iterating a table with millions or billions of rows to just check for red products and then do further iterations with SUMX.

New Member

Hi @AntrikshSharma , id like to ask how we would know that this is happening internally.

-----but in case of the second option  Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" )-----

i know this much that Sales[ProductColor] = "Red" is a shorthand for FILTER(Sales, Sales[ProductColor] = "Red"). This much is given in the microsoft Learn documentation itself.

If we cant tell that the filter  Sales[ProductColor] = "Red" converts to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) as you said, we may not be able to establish the computational superiority of CALCULATE function when compared to SUMX iteration function here in this case (it may be more convenient but let us not consider the ease of use)

kindly help.

Thanks in advance

Community Champion

You can identify the expansion using DAX Studio > Logical Query Plan

There is no difference.

Anonymous
Not applicable

@AntrikshSharma Thanks for the clarification. I was leaning towards this solution, too.

Could you do me another favor and answer a follow-up question to your answer?

The third example you posted seems to be a calculation based on the running total / cummulative total pattern:

`=CALCULATE ([Sales Amount],Customer[Gender] = "Male",Products[Color] IN { "Green", "Yellow", "White" },USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) ))`

I've used the pattern myself but I still get confused about the last part. Could you give me feedback If I've understood it correctly?

• Dates[Date] : refers to the column
• MAX ( Dates[Date] ) : We're in the row context now. There is only one Date which we retrieve with the MAX() function. But since there is only one date anyway in theory we could also use the MIN() or SELECTEDVALUE() functions to get this date

Thanks in advance.

Community Champion

Actually MAX here is not related to the Row context created by FILTER, MAX is being evaluated in the filter context that is outside CALCULATE, this is well understood by using Variables, here is arunning total patern using the same.

``````Running Total =
VAR MaxDateInFilterContext = MAX ( Dates[Date] )
VAR MaxYear = YEAR ( MaxDateInFilterContext )
VAR DatesLessThanMaxDate =
FILTER (
ALL ( Dates[Date], Dates[Calendar Year Number] ),
Dates[Date] <= MaxDateInFilterContext
&& Dates[Calendar Year Number] = MaxYear
)
VAR Result =
CALCULATE (
[Total Sales],
DatesLessThanMaxDate
)
RETURN
Result``````

take a look at the below screenshot. When you are at 2nd January, what is the MAX date? it is 2nd and the amount is calculated for all dates less than 2nd which is returned by the variable DatesLessThanMaxDate , next DAX is evaluated for 3rd January, now what is the MAX value? it is 3rd January, similarly it is being checked for each cell of the report.

Hence yes, you can use MIN or SELECTEDVALUE but their values will not be related to the Row context created by FILTER

Impactful Individual

Hello @AntrikshSharma

if you can find the time I would appreciate help in explaining the concept you just described.

Especially I would like to know what you said about the MAX function which is wrapped in the FILTER function, why its not evaluated row by row, although its actually inside the filter function, which is a iterator?:

"Actually MAX here is not related to the Row context created by FILTER, MAX is being evaluated in the filter context that is outside CALCULATE, this is well understood by using Variables, here is arunning total patern using the same."

For me it is actually inside the Calculate function as well, because Calculate is outside of everything.

Also shouldn't Filter function works row by row?  Shouldn't MAX be the Max date from the whole table its is refering to and not only the Max Date until that row which is going to be evaluated? Looking The following seems to make sense but isn't going to work without aggregation:

`FILTER ( ALL ( Dates ), Dates[Date] < ( Dates[Date] ) `

instead of:

`FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) `

Why I'm asking that is because when I do the same kind of calculation in a calculated column inside a table variable I can write like this without problem:

Column=
Var _TableVariable= Filter( 'Calendar', 'Calendar' [Date]>='Maintable'[BookingDate] && 'Calendar' [Date]<='Maintable'[FinishDate] && 'Calendar[Day]= "Monday"))
return
countrows(_TableVariable)

Here I don't need to aggregate the dates, it knows automatically when I say
'Calendar' [Date]>='Maintable'[BookingDate] && 'Calendar' [Date]<='Maintable'[FinishDate]
that I'm talking about ALL the Calendar Dates which are bigger than the booking dates and smaller than then the FinishDate when the specific fow in the calendar table got evaluated row by row.

Thank you for your time.

Best regards.

Community Champion

@Applicable88

For me it is actually inside the Calculate function as well, because Calculate is outside of everything.

That doesn't matter, CALCULATE prepares the new filter context by evaluating the filter argument like FILTER ( ... MAX )) in the original filter context that was active when CALCULATE was called. What is the original filter context here? the one created by the column in a report. Although Nested calculated may have different original filter context created by an outer CALCULATE.

Also shouldn't Filter function works row by row? -

FILTER does work row by row but when you use another function in the second argument of FILTER, then it doesn't depend on the row that is currently iterated unless that function is wrapped inside CALCULATE like CALCULATE ( MAX ) )

Shouldn't MAX be the Max date from the whole table its is refering to and not only the Max Date until that row which is going to be evaluated?

I know why you are confused, you are thinking that MAX depends on the row iterated by FILTER but there is no relation between MAX and the currently iterated row in FILTER, simply put MAX is an aggregator that is evaluated in the original Filter context that is outside of CALCULATE (in this case)

Impactful Individual

@AntrikshSharma thank you so much for your thorough explanation. The link you provided was of tremendous help.  Keep up the good work!

Best.

Anonymous
Not applicable

Thanks again @AntrikshSharma ! I really appreciate that you took the time to write such a detailed explanation. This is a topic I was struggeling with for weeks.

Community Champion
You're welcome, feel free to ask more questions if something isn't clear. Have a great day!
Community Champion

Here is a video for those who need more explanation on this topic.

Super User

@Anonymous - I prefer the first approach with SUMX. The reason is because it works in cases where you have a table variable as well as a physical table. Thus, it is much more flexible and can be used in more situations than the second. I tend not to use CALCULATE at all because whatever you can do with CALCULATE can be done other ways.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

## Helpful resources

Announcements

#### Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

#### Power BI Monthly Update - June 2024

Check out the June 2024 Power BI update to learn about new features.

#### Fabric Community Update - June 2024

Get the latest Fabric updates from Build 2024, key Skills Challenge voucher deadlines, top blogs, forum posts, and product ideas.

#### New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

Top Solution Authors
Top Kudoed Authors