Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
Anonymous
Not applicable

SUMX & FILTER or SUM & CALCULATE: Best practice for summing subset of columns

Hi guys,

 

quick question: If I want to sum a subset of a column, for example the sum of the sales of only red products, which approach is better suited?

 

1.SUMX and FILTER

Red Sales 1 =
SUMX ( FILTER ( Sales; Sales[ProductColor] = "Red" ); Sales[Amount] )


or

 

2. CALCULATE and SUM

Red Sales 2 =
CALCULATE ( SUM ( Sales[Amount] ); Sales[ProductColor] = "Red" )

 

Thanks in advance.

Best regards

Lucas

 

1 ACCEPTED SOLUTION
AntrikshSharma
Super User
Super User

You should use 2 second version as CALCULATE evaluates its filter arguments first and once the filter context is modified it evaluates the first argument. And modifying filter context before evaluating something is efficient and a good practice. Also you get to reuse your base measure in all your other measure, for example if you create a Sales Amount measure like SUM ( Sales[Amount] ) you can use it multiple time in other measure where you just want to modify the context in which the SUM is to be evaluated without rewriting the whole code again. example:

 

=
CALCULATE ( [Sales Amount], Dates[Year] = 2020, Customer[Continent] = "Asia" )
=
CALCULATE ( [Sales Amount], Dates[Year] IN { 2020, 2019, 2018 } )
=
CALCULATE (
    [Sales Amount],
    Customer[Gender] = "Male",
    Products[Color] IN { "Green", "Yellow", "White" },
    USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
    FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)

 

 

In case of the first option FILTER has to iterate probably a very huge table and check for rows where color is red, but in case of the second option  Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) since it use ALL and ALL returns unique product colors, that list is going to be really small, we are talking in hundreds or maybe tens, and iterating such a small list is much more efficient that iterating a table with millions or billions of rows to just check for red products and then do further iterations with SUMX.

 

View solution in original post

13 REPLIES 13
Greg_Deckler
Super User
Super User

@Anonymous - Thanks for inspiring me today. 

Check out my latest article, CALCUHATE - Why I Don't Use DAX's CALCULATE Function.

https://community.powerbi.com/t5/Community-Blog/CALCUHATE-Why-I-Don-t-Use-DAX-s-CALCULATE-Function/ba-p/1248635

 



Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...
AntrikshSharma
Super User
Super User

You should use 2 second version as CALCULATE evaluates its filter arguments first and once the filter context is modified it evaluates the first argument. And modifying filter context before evaluating something is efficient and a good practice. Also you get to reuse your base measure in all your other measure, for example if you create a Sales Amount measure like SUM ( Sales[Amount] ) you can use it multiple time in other measure where you just want to modify the context in which the SUM is to be evaluated without rewriting the whole code again. example:

 

=
CALCULATE ( [Sales Amount], Dates[Year] = 2020, Customer[Continent] = "Asia" )
=
CALCULATE ( [Sales Amount], Dates[Year] IN { 2020, 2019, 2018 } )
=
CALCULATE (
    [Sales Amount],
    Customer[Gender] = "Male",
    Products[Color] IN { "Green", "Yellow", "White" },
    USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
    FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)

 

 

In case of the first option FILTER has to iterate probably a very huge table and check for rows where color is red, but in case of the second option  Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) since it use ALL and ALL returns unique product colors, that list is going to be really small, we are talking in hundreds or maybe tens, and iterating such a small list is much more efficient that iterating a table with millions or billions of rows to just check for red products and then do further iterations with SUMX.

 

Hi @AntrikshSharma , id like to ask how we would know that this is happening internally. 

 -----but in case of the second option  Sales[ProductColor] = "Red" internally it expands to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" )-----

i know this much that Sales[ProductColor] = "Red" is a shorthand for FILTER(Sales, Sales[ProductColor] = "Red"). This much is given in the microsoft Learn documentation itself.

If we cant tell that the filter  Sales[ProductColor] = "Red" converts to FILTER ( ALL ( Sales[ProductColor] ), Sales[ProductColor] = "Red" ) as you said, we may not be able to establish the computational superiority of CALCULATE function when compared to SUMX iteration function here in this case (it may be more convenient but let us not consider the ease of use)

kindly help.

Thanks in advance

You can identify the expansion using DAX Studio > Logical Query Plan

AntrikshSharma_3-1702398752864.png

There is no difference.

AntrikshSharma_4-1702398760128.png

 

Anonymous
Not applicable

@AntrikshSharma Thanks for the clarification. I was leaning towards this solution, too.

Could you do me another favor and answer a follow-up question to your answer?

The third example you posted seems to be a calculation based on the running total / cummulative total pattern:

=
CALCULATE (
[Sales Amount],
Customer[Gender] = "Male",
Products[Color] IN { "Green", "Yellow", "White" },
USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)

 I've used the pattern myself but I still get confused about the last part. Could you give me feedback If I've understood it correctly?

  • Dates[Date] : refers to the column
  • MAX ( Dates[Date] ) : We're in the row context now. There is only one Date which we retrieve with the MAX() function. But since there is only one date anyway in theory we could also use the MIN() or SELECTEDVALUE() functions to get this date

Thanks in advance.

Actually MAX here is not related to the Row context created by FILTER, MAX is being evaluated in the filter context that is outside CALCULATE, this is well understood by using Variables, here is arunning total patern using the same.

Running Total = 
VAR MaxDateInFilterContext = MAX ( Dates[Date] )
VAR MaxYear = YEAR ( MaxDateInFilterContext )
VAR DatesLessThanMaxDate =
    FILTER (
        ALL ( Dates[Date], Dates[Calendar Year Number] ),
        Dates[Date] <= MaxDateInFilterContext
            && Dates[Calendar Year Number] = MaxYear
    )
VAR Result =
    CALCULATE (
        [Total Sales],
        DatesLessThanMaxDate
    )
RETURN
    Result

 take a look at the below screenshot. When you are at 2nd January, what is the MAX date? it is 2nd and the amount is calculated for all dates less than 2nd which is returned by the variable DatesLessThanMaxDate , next DAX is evaluated for 3rd January, now what is the MAX value? it is 3rd January, similarly it is being checked for each cell of the report.

1.PNG

Hence yes, you can use MIN or SELECTEDVALUE but their values will not be related to the Row context created by FILTER

Hello @AntrikshSharma

if you can find the time I would appreciate help in explaining the concept you just described.

 

Especially I would like to know what you said about the MAX function which is wrapped in the FILTER function, why its not evaluated row by row, although its actually inside the filter function, which is a iterator?: 

"Actually MAX here is not related to the Row context created by FILTER, MAX is being evaluated in the filter context that is outside CALCULATE, this is well understood by using Variables, here is arunning total patern using the same."

 

For me it is actually inside the Calculate function as well, because Calculate is outside of everything.

Also shouldn't Filter function works row by row?  Shouldn't MAX be the Max date from the whole table its is refering to and not only the Max Date until that row which is going to be evaluated? Looking The following seems to make sense but isn't going to work without aggregation:

FILTER ( ALL ( Dates ), Dates[Date] < ( Dates[Date] ) 

instead of:

FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) 

 Why I'm asking that is because when I do the same kind of calculation in a calculated column inside a table variable I can write like this without problem:

 

Column= 
Var _TableVariable= Filter( 'Calendar', 'Calendar' [Date]>='Maintable'[BookingDate] && 'Calendar' [Date]<='Maintable'[FinishDate] && 'Calendar[Day]= "Monday"))
return
countrows(_TableVariable) 
 
Here I don't need to aggregate the dates, it knows automatically when I say
  'Calendar' [Date]>='Maintable'[BookingDate] && 'Calendar' [Date]<='Maintable'[FinishDate]
that I'm talking about ALL the Calendar Dates which are bigger than the booking dates and smaller than then the FinishDate when the specific fow in the calendar table got evaluated row by row. 
 

 

Thank you for your time.

Best regards. 

@Applicable88 

For me it is actually inside the Calculate function as well, because Calculate is outside of everything.


That doesn't matter, CALCULATE prepares the new filter context by evaluating the filter argument like FILTER ( ... MAX )) in the original filter context that was active when CALCULATE was called. What is the original filter context here? the one created by the column in a report. Although Nested calculated may have different original filter context created by an outer CALCULATE.

 

Also shouldn't Filter function works row by row? -

 

FILTER does work row by row but when you use another function in the second argument of FILTER, then it doesn't depend on the row that is currently iterated unless that function is wrapped inside CALCULATE like CALCULATE ( MAX ) )

 

Read this: https://forum.enterprisedna.co/t/max-date-date-measure-used-in-the-filter-function/20123/5

 

Shouldn't MAX be the Max date from the whole table its is refering to and not only the Max Date until that row which is going to be evaluated?

 

I know why you are confused, you are thinking that MAX depends on the row iterated by FILTER but there is no relation between MAX and the currently iterated row in FILTER, simply put MAX is an aggregator that is evaluated in the original Filter context that is outside of CALCULATE (in this case)

@AntrikshSharma thank you so much for your thorough explanation. The link you provided was of tremendous help.  Keep up the good work!

Best. 

Anonymous
Not applicable

Thanks again @AntrikshSharma ! I really appreciate that you took the time to write such a detailed explanation. This is a topic I was struggeling with for weeks. 

You're welcome, feel free to ask more questions if something isn't clear. Have a great day!

 

Here is a video for those who need more explanation on this topic.

Greg_Deckler
Super User
Super User

@Anonymous - I prefer the first approach with SUMX. The reason is because it works in cases where you have a table variable as well as a physical table. Thus, it is much more flexible and can be used in more situations than the second. I tend not to use CALCULATE at all because whatever you can do with CALCULATE can be done other ways.



Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.

Top Solution Authors