Get certified in Microsoft Fabric—for free! For a limited time, the Microsoft Fabric Community team will be offering free DP-600 exam vouchers. Prepare now
For those haters of No CALCULATE like @AlexisOlson, 😅, I challenge you to create measures with an explicit CALCULATE that for the prescribed scenarios perform anywhere close to the same speed as the very simple No CALCULATE measures that perform date intelligence calculations. Here are the rules:
A sample No CALCULATE measure is:
NC Intenet Sales (YTD) =
VAR __Date = MAX('Dates'[Date])
VAR __Year = YEAR(__Date)
VAR __Table =
SUMMARIZE(
ALL('FactInternetSales'),
'FactInternetSales'[OrderDate],
"__Year", YEAR([OrderDate]),
"__Sales", SUM('FactInternetSales'[SalesAmount]))
VAR __Result = SUMX(FILTER(__Table, [__Year] = __Year && [OrderDate] <= __Date),[__Sales])
RETURN
__Result
The PBIX file is attached below. There are 16 measures for calculating such things as previous year to date, previous month to date, previous week to date, etc. Here is a reference video that also describes the scenario and original performance results:
Solved! Go to Solution.
I think this should qualify, though it isn't super clean.
VAR _AllDates_ =
SUMMARIZE (
ALL ( FactInternetSales ),
Dates[Year],
Dates[Date]
)
VAR _DateRange_ =
WINDOW (
1, ABS,
0, REL,
_AllDates_,
ORDERBY ( Dates[Date] ),
PARTITIONBY ( Dates[Year] )
)
VAR _Result = CALCULATE ( [Internet Sales], _DateRange_ )
RETURN
_Result
thank God it's such an obvious and easy solution that even a DAX novice could understand and doesn't rely on a function introduced within that last year which pretty much means the solution was unsolvable for 7 years or so...
WINDOW isn't required. A plain filter works fine too and has been around since the beginning.
VAR _AllDates_ = SUMMARIZE ( ALL ( FactInternetSales ), Dates[Date] )
VAR _CurrDate = MAX ( Dates[Date] )
VAR _CurrYear = YEAR ( _CurrDate )
VAR _DateRange_ =
FILTER (
_ALLDates_,
YEAR ( Dates[Date] ) = _CurrYear &&
Dates[Date] <= _CurrDate
)
VAR _Result = CALCULATE ( [Internet Sales], _DateRange_ )
RETURN
_Result
However, there are still 15 other date intelligence measures in the file that are part of the challenge.
I'm not really interested in doing all 15 other ones as I don't think they'll reveal much that this YTD example doesn't already. If there is one that you think is meaningfully different, let me know.
@Greg_Deckler, PW is an easy case where the TI solution or something like the following is just as fast.
CALCULATE (
[Internet Sales],
TREATAS ( VALUES ( Dates[Prior Week Date] ), Dates[Date] )
)
@AlexisOlson
Exactly.
What the formula engine is trying to do is to create the following table:
Date | Date YTD | Sum of SalesAmount |
Then it does the aggregation over Date.
With help of some mathmatics, given that the Date table is 7,670 rows the formula engine would produce a crossjoin table of around (1/2 )* (7,670) * (7,670) = 29,414,450 rows. This is simply the area of a triangle.
In opttimized versions, the corossjoin is happening to only the dates that do exist in the fact tabe (1,124) rows. That would result in a table of around (1/2 )* (1,124 * (1,124) = 6,316,680 rows.
This is why the cardinality of the Dates[Date] and FactInternetSales[Order Date] do matter alot.
I think this should qualify, though it isn't super clean.
VAR _AllDates_ =
SUMMARIZE (
ALL ( FactInternetSales ),
Dates[Year],
Dates[Date]
)
VAR _DateRange_ =
WINDOW (
1, ABS,
0, REL,
_AllDates_,
ORDERBY ( Dates[Date] ),
PARTITIONBY ( Dates[Year] )
)
VAR _Result = CALCULATE ( [Internet Sales], _DateRange_ )
RETURN
_Result
@AlexisOlson In my testing looking at the DAX query in Performance Analyzer this seems consistently about 40%-50% slower than the fastes No CALCULATE approach. But, we're quibbling over 10's of ms here so I feel like this passes. However, there are still 15 other date intelligence measures in the file that are part of the challenge.
What the formula engine is trying to do is to create the following table:
Date | Date YTD | Sum of SalesAmount |
Then it does the aggregation over Date.
With help of some mathmatics, given that the Date table is 7,670 rows the formula engine would produce a crossjoin table of around (1/2 )* (7,670) * (7,670) = 29,414,450 rows. This is simply the area of a triangle.
In opttimized versions, the corossjoin is happening to only the dates that do exist in the fact tabe (1,124) rows. That would result in a table of around (1/2 )* (1,124 * (1,124) = 6,316,680 rows.
This is why the cardinality of the Dates[Date] and FactInternetSales[Order Date] do matter alot.
Usually I use the Window function following a NoCALCULATE approach by pre-calculating values then pushing all other calculation to the formula engine. For example the following produces the simplist and fastest query plan along with the minimum number of storage engine queries among all other solutions.
Internet Sales (YTD) Window =
VAR AllDateSales =
SUMMARIZE(
ALL ( FactInternetSales ),
Dates[Year],
Dates[Date],
"@Amount", SUM ( FactInternetSales[SalesAmount] )
)
VAR Result =
SUMX (
WINDOW (
0, ABS,
0, REL,
AllDateSales,
ORDERBY ( Dates[Date] ),
PARTITIONBY ( Dates[Year] )
),
[@Amount]
)
RETURN
Result
I would however say that your WINDOW-CALCULATE solution matches the performance of other NoCALCULATE solutions so I vote for it as an acceptable solution.
@AlexisOlson I'll have a look in the morning. But if it does work, I mean, thank God it's such an obvious and easy solution that even a DAX novice could understand and doesn't rely on a function introduced within that last year which pretty much means the solution was unsolvable for 7 years or so...
@AlexisOlson Big difference I see is that the visual doesn't go to the Date granularity, which means it isn't the same scenario.
Good point on the granularity. Even matching granularity though, there are still different behaviors between files that I don't understand (in particular, TOTALYTD performs similarly to your measure in Contoso).
@AlexisOlson I'll have to look closer but both have a data table marked as a date table. The Dates table in Contoso though has significantly less rows in it. Triple in fact.
@Greg_Deckler while it is quite clear that no calculate is often superior in terms of performance. I tend to agree with @lbendlin that in most cases basic functions are the better choice strictly because your co-developers might not understand why you are using this kind of fancy dax instead of basic structures. I would clasisfy this kind of dax as legacy code since it is unreasonably hard to understand when compared to what it achieves.
Proud to be a Super User!
@ValtteriN I think you need to review the CALCULATE partial-solution to this problem and then tell me which one is more intuitive for someone to understand. Hint, they are literally the same solution except that one uses SUMX at the end and the other CALCULATE. The SUMX directly sums the value in the virtual table created. The CALCULATE applies the virtual table created as a filter of the base table over which a measure is applied.
Now, on the opinion of which of those is more intuitive to understand and debug, well, I have my opinion.
@Greg_Deckler I agree that the solution without CALCULATE is elegant, but maybe the general paradigm that people have learned to write with CALCULATE as the most basic function of DAX causes the solution to feel less intuitive. Your macro sentece is spot on "DAX is easy, CALCULATE makes DAX hard..." As things stand, if I ask colleagues at the office is the no calculate solution intuitive the answer is no. Because of this it makes more sense to me to write DAX that uses CALCULATE as the basis so that there is no unnecessary confusion and if there is a need to optimize the performace based on end-user feedback then I will change my DAX to be more performance optimized.
Proud to be a Super User!
@ValtteriN I think that people that have learned to use CALCULATE find the No CALCULATE approach takes some getting used to. However, that was never the intended audience for No CALCULATE. I think that for those new to DAX, learning the No CALCULATE is far more intuitive and easy to do versus learning the CALCULATE method. That's always been the intended audience. That said, I know of more than a few people that have switched from CALCULATE to No CALCULATE and never looked back.
That would be a nice psychological experiment. Have a group of Power BI Newbies, split them in two parts, teach one part to use CALCULATE and the other part to use the aggregation functions. Then after a month show them the "other side" and let them decide which technique to use.
This is very similar to the On-Object Interaction discussion. People who are new to Power BI have no issues using that. Those of us who have grown up with the legacy options dialogs have a much harder time with adopting the on-object interaction.
In general you don't want to include the syntax sugar in your tests. Keep to the basic functions.
HI @Greg_Deckler,
Thanks for your sharling, I think these technical research will help other user to understanding how dax calculate.
Regards,
Xiaoxin Sheng
Check out the October 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
31 | |
16 | |
14 | |
14 | |
9 |