Solved: Re: CALCULATE Challenge - Round 1

Greg_Deckler · ‎04-21-2024

For those haters of No CALCULATE like @AlexisOlson, 😅, I challenge you to create measures with an explicit CALCULATE that for the prescribed scenarios perform anywhere close to the same speed as the very simple No CALCULATE measures that perform date intelligence calculations. Here are the rules:

Must contain an explicit CALCULATE
The explicit CALCULATE must not be superfluous, i.e. CALCULATE(SUM('Table'[Value])) if using SUM('Table'[Value]) would do just fine.

A sample No CALCULATE measure is:

NC Intenet Sales (YTD) = 
    VAR __Date = MAX('Dates'[Date])
    VAR __Year = YEAR(__Date)
    VAR __Table = 
        SUMMARIZE(
            ALL('FactInternetSales'), 
            'FactInternetSales'[OrderDate],
            "__Year", YEAR([OrderDate]),
            "__Sales", SUM('FactInternetSales'[SalesAmount]))
    VAR __Result = SUMX(FILTER(__Table, [__Year] = __Year && [OrderDate] <= __Date),[__Sales])
RETURN
    __Result

The PBIX file is attached below. There are 16 measures for calculating such things as previous year to date, previous month to date, previous week to date, etc. Here is a reference video that also describes the scenario and original performance results:

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

AlexisOlson · ‎04-22-2024

I think this should qualify, though it isn't super clean.

VAR _AllDates_ =
    SUMMARIZE (
    	ALL ( FactInternetSales ),
    	Dates[Year],
    	Dates[Date]
    )
VAR _DateRange_ =
    WINDOW (
        1, ABS,
        0, REL,
        _AllDates_,
        ORDERBY ( Dates[Date] ),
        PARTITIONBY ( Dates[Year] )
    )
VAR _Result =  CALCULATE ( [Internet Sales], _DateRange_ )
RETURN
    _Result

View solution in original post

AlexisOlson · ‎04-23-2024

thank God it's such an obvious and easy solution that even a DAX novice could understand and doesn't rely on a function introduced within that last year which pretty much means the solution was unsolvable for 7 years or so...

WINDOW isn't required. A plain filter works fine too and has been around since the beginning.

VAR _AllDates_ = SUMMARIZE ( ALL ( FactInternetSales ), Dates[Date]	)
	
VAR _CurrDate = MAX ( Dates[Date] )
VAR _CurrYear = YEAR ( _CurrDate )

VAR _DateRange_ =
    FILTER (
    	_ALLDates_,
    	YEAR ( Dates[Date] )  = _CurrYear &&
    	Dates[Date] <= _CurrDate
    )
VAR _Result =  CALCULATE ( [Internet Sales], _DateRange_ )
RETURN
    _Result

However, there are still 15 other date intelligence measures in the file that are part of the challenge.

I'm not really interested in doing all 15 other ones as I don't think they'll reveal much that this YTD example doesn't already. If there is one that you think is meaningfully different, let me know.

View solution in original post

AlexisOlson

@Greg_Deckler, PW is an easy case where the TI solution or something like the following is just as fast.

CALCULATE (
    [Internet Sales],
    TREATAS ( VALUES ( Dates[Prior Week Date] ), Dates[Date] )
)

View solution in original post

dotykier

Late to the party, and sorry if this has been posted before (maybe I'm just stupid, but for the life of me, I can't figure out how to browse through forum thread comments in the order they were posted...)

It seems to me that the NO CALCULATE performs better than TOTALYTD, because @Greg_Deckler's dataset was carefully crafted for this particular scenario. Specifically, the 'Dates' table is much, much larger than it needs to be (containing dates from 2010 to 2030, even though the fact table only contains data for 2010-2014), while the fact table is a meager 60.000 rows.

If we reduce the 'Dates' table to only contain dates from 2010-2014, TOTALYTD performs much better than in the original model, but NO CALCULATE is still faster by about 50%:

However, TOTALYTD scales better with the size of the fact table. For example, on a model with ~17 mio. rows in the fact table (called 'Orders' in this example), the two approaches have identical performance:

I haven't tested it, but I would wager that TOTALYTD outperforms NO CALCULATE if you increase the size of the fact table further. This is because TOTALYTD doesn't need to consider the fact table when creating the date filter.

But more importantly, using TOTALYTD (or an equivalent CALCULATE) addresses a serious issue in the NO CALCULATE approach, namely that all filters are removed from FactInternetSales. This means that you can't slice/filter the fact table by anything other than dates. For example, try to add a slicer on the FactInternetSales[ProductKey] column, and see what happens as you filter different product keys... I would like to see a NO CALCULATE solution that retains any other filter that could potentially affect the fact table, while still performing as good as TOTALYTD. That is the real challenge imho.

Greg_Deckler

@dotykier I can assure you that the semantic model wasn't specifically engineered for any particular purpose. The dates table is reasonable in that many corporate date tables would want a decade of history and also be future proof by about a decade. That's pretty standard. I mean, it's only ~7,300 rows.

The fact table is simply AdventureWorks which is also pretty standard for demonstration purposes. The fact that the table is a mere 60,000 rows means that TI and CALCULATE should have a super easy time with this semantic model. If they can't perform on a semantic model that includes 70,000 rows total between the fact and dimension tables how can we expect them to perform on a data model with 10x or 100x that number of rows? That doesn't make a lot of sense.

All filters do not have to be removed, there are trivial ALLEXCEPT methods for No CALCULATE: https://youtu.be/fH0VcV9Smow

Also, the scenario you are describing was kind of the focus for @AlexisOlson's No CALCULATE Challenge #2 and it was solved by @tamerj1

With the No CALCULATE Challenges, we didn't get to whine about the scenario and count that as a solution so the same applies in this case. You can whine and complain however you want, but that isn't a solution.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

AlexisOlson · ‎04-23-2024

The other interesting thing about this approach is that you are pretty much destroying the star schema or at least not leveraging it in the slightest. You effectively create a new table that includes all rows in the fact table summarized by Date and then create a filter clause over that. So this kind of says to me that the mantra of the star schema coupled with CALCULATE as being the silver bullet for all things DAX is effectively nonsense because you basically have to blow-up the star schema to get CALCULATE to function at any reasonable performance level.

Finally, I don't think it's intuitive that the SUMMARIZEd table includes all dates in it considering the filter context within a row within the table is to a specific date. That seems weird to me although I imagine it is because using SUMMARIZE against ALL of the fact table and then by Date brings all of those dates back into context somehow versus the individual date at the visual row.

The SUMMARIZE works on the expanded fact table, which leverages the star schema. I find using SUMMARIZE this way useful in many situations.

This situation where you have a visual with thousands of rows does indeed seem to be a pathological case for basic CALCULATE patterns. There are other situations where those same patterns will outperform the No CALCULATE approach in the post (see the Contoso example I uploaded, for example). I'd expect Tamer's WINDOW solution to work well broadly.

lbendlin · ‎04-26-2024

The other interesting thing about this approach is that you are pretty much destroying the star schema or at least not leveraging it in the slightest.

I want to come back to this point as it is really important. SUMMARIZE uses the data model, whereas SUMMARIZECOLUMNS does not. And all the window functions explicitly ignore the data model for a myopic view on a pre-sorted virtual table. Similar for the Visual Calculations.

So the questions becomes more - do you want the data model to do the work, albeit maybe a bit slower, or do you push it aside and hand craft a high performance/ high maintenance alternative?

tamerj1 · ‎04-26-2024

@lbendlin

I agree. All window functions ignore the data model. But I didn't get the point of SUMMARIZECOLUMNS. Would you please elaborate on that?

lbendlin · ‎04-26-2024

@tamerj1 Here's a very nice write-up All the secrets of SUMMARIZE - SQLBI

tamerj1 · ‎04-26-2024

Thank you @lbendlin

That is one of my most favorite articles which I keep referring to every other while. However, my question was about SUMMARIZECOLUMNS and how it ignores the data model. I would really appreciate sharing some resources on this regard as it is really difficult to find detailed and insightful materials about some DAX functions which I admitt many of which I still don't fully understand.

lbendlin · ‎04-26-2024

here's another one Introducing SUMMARIZECOLUMNS - SQLBI

tamerj1 · ‎04-26-2024

Thank you!

lbendlin · ‎04-23-2024

Don't forget that the star schema is mostly smoke and mirrors, and that behind the scenes everything is based on expanded tables anyway.

AlexisOlson · ‎04-23-2024

thank God it's such an obvious and easy solution that even a DAX novice could understand and doesn't rely on a function introduced within that last year which pretty much means the solution was unsolvable for 7 years or so...

WINDOW isn't required. A plain filter works fine too and has been around since the beginning.

VAR _AllDates_ = SUMMARIZE ( ALL ( FactInternetSales ), Dates[Date]	)
	
VAR _CurrDate = MAX ( Dates[Date] )
VAR _CurrYear = YEAR ( _CurrDate )

VAR _DateRange_ =
    FILTER (
    	_ALLDates_,
    	YEAR ( Dates[Date] )  = _CurrYear &&
    	Dates[Date] <= _CurrDate
    )
VAR _Result =  CALCULATE ( [Internet Sales], _DateRange_ )
RETURN
    _Result

However, there are still 15 other date intelligence measures in the file that are part of the challenge.

I'm not really interested in doing all 15 other ones as I don't think they'll reveal much that this YTD example doesn't already. If there is one that you think is meaningfully different, let me know.

Greg_Deckler · ‎04-28-2024

@AlexisOlson I have been working through each one. Currently I am having issues with Previous Week. I have attached my version of your code technique for previous week. The code below is an order of magnitude worse than the No CALCULATE version so can't accept it as a solution to the problem. Yes, it is orders of magnitude faster than the other No CALCULATE approaches but it's still almost a full second for the DAX query versus 20 ms. My testing PBIX is attached below signature.

Alexis Olsen (PW) = 

VAR _AllDates_ = SUMMARIZE ( ALL ( FactInternetSales ), Dates[Date]	)

VAR _MinDate = MIN( 'Dates'[Prior Week Date] )
VAR _MaxDate = MAX( 'Dates'[Prior Week Date] )

VAR _DateRange_ =
    FILTER (
    	_ALLDates_,
    	'Dates'[Date] >= _MinDate &&
    	Dates[Date] <= _MaxDate
    )
VAR _Result =  CALCULATE ( [Internet Sales], _DateRange_ )
RETURN
    _Result

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

AlexisOlson

@Greg_Deckler, PW is an easy case where the TI solution or something like the following is just as fast.

CALCULATE (
    [Internet Sales],
    TREATAS ( VALUES ( Dates[Prior Week Date] ), Dates[Date] )
)

Daryl-Lynch-Bzy

@AlexisOlson & @Greg_Deckler - I see this works faster, but prior year calculation don't solve the Leap Year issue.

Greg_Deckler

@Daryl-Lynch-Bzy I might be wrong, but I feel like the leap year issue is a problem for all approaches and people have different ways that they choose to solve/approach that problem. Good catch but I really wasn't focused on leap year stuff.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Greg_Deckler

@AlexisOlson Very nice!! I am marking your answers as solutions. Also, I have attached my testing file to the original message that contains all of the measures so that other people can easily test as well.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Greg_Deckler · ‎04-23-2024

@AlexisOlson A bit worse, a little over 50% without WINDOW but still within acceptable limits. I'll take the technique and apply it to the other 15 calculations and let you know if there is an issue. The interesting thing is that if you use a more traditional CALCULATE approach like below, the timing increases to nearly twice that of No CALCULATE:

Alexis Olson TYD CALCULATE 3 = 
    VAR _CurrDate = MAX ( Dates[Date] )
    VAR _CurrYear = YEAR ( _CurrDate )
    VAR _Result =  
        CALCULATE ( 
            [Internet Sales], 
            FILTER (
                SUMMARIZE ( ALL ( FactInternetSales ), Dates[Date]	),
                YEAR ( Dates[Date] )  = _CurrYear &&
                Dates[Date] <= _CurrDate
            )
        )
RETURN
    _Result

The other interesting thing about this approach is that you are pretty much destroying the star schema or at least not leveraging it in the slightest. You effectively create a new table that includes all rows in the fact table summarized by Date and then create a filter clause over that. So this kind of says to me that the mantra of the star schema coupled with CALCULATE as being the silver bullet for all things DAX is effectively nonsense because you basically have to blow-up the star schema to get CALCULATE to function at any reasonable performance level.

Finally, I don't think it's intuitive that the SUMMARIZEd table includes all dates in it considering the filter context within a row within the table is to a specific date. That seems weird to me although I imagine it is because using SUMMARIZE against ALL of the fact table and then by Date brings all of those dates back into context somehow versus the individual date at the visual row.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

tamerj1 · ‎04-23-2024

@Greg_Deckler , @AlexisOlson

I would say that both Alex's solutions (WINDOW and SUMMARIZE) look more like a NoCALCULATE approach.
The only difference is replacing

SUMX ( __TableVar, 'Table'[Column] )

with

CALCULATE ( SUM ( 'Table'[Column] ), __TableVar )

in fact once filter ALL ( 'Table' ) is used, CALCULATE remains there with absolutely no meaning of existence.

Greg_Deckler · ‎04-23-2024

@tamerj1 Wow, hmm, there was that second rule. This formula (below) produces the same result as @AlexisOlson's orignal which technically makes the CALCULATE superfluous. That said, the CALCULATE does not JUST wrap a function that would work without it so I think that going strictly by the rules specified Alexis' formula technically meets the specified criteria.

Alexis Olson TYD CALCULATE 4 = 
VAR _AllDates_ = SUMMARIZE ( ALL ( FactInternetSales ), Dates[Date]	)
	
VAR _CurrDate = MAX ( Dates[Date] )
VAR _CurrYear = YEAR ( _CurrDate )

VAR _DateRange_ =
    FILTER (
    	_ALLDates_,
    	YEAR ( Dates[Date] )  = _CurrYear &&
    	Dates[Date] <= _CurrDate
    )
VAR _Result =  SUMX ( _DateRange_, [Internet Sales])
RETURN
    _Result

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

AlexisOlson · ‎04-22-2024

There seems to be something pathological going on that's making the query planner do weird things with simpler measures. I'd really like to figure out what as I think that knowledge may be broadly applicable in measure optimization. It should not be doing this (some sort of cross-join?).

I got the above when using this code:

VAR _CurrYear = SELECTEDVALUE ( Dates[Year] )
VAR _CurrDate = SELECTEDVALUE ( Dates[Date] )
VAR _Result =
    CALCULATE (
        [Internet Sales],
        Dates[Year] = _CurrYear,
        Dates[Date] <= _CurrDate
    )
RETURN
    _Result

Using WINDOW is better but it still has steps with 1.4 million records.

CALCULATE (
    [Internet Sales],
    WINDOW (
        1, ABS,
        0, REL,
        ORDERBY ( Dates[Year], ASC, Dates[Date], ASC ),
        PARTITIONBY ( Dates[Year] )
    )
)

Can any DAX masters like @marcorusso or @jeffrey_wang shed some light?

CALCULATE Challenge - Round 1

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024