The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Introduction
OK, let's get a few things out of the way right off the bat.
First and foremost, I do not consider myself a DAX performance optimization expert nor do I play one on TV. Truth be told, the vast majority of solutions that I post to the forums are with utmost certainty not optimized DAX code. Believe it or not, that doesn't bother me...at all. The reality is that DAX optimization is difficult and time consuming and, quite frankly, the vast majority of the time the optimization does not matter and is simply a waste of time. Or, the optimization makes the code less readable. We all remember the bad old days of Perl code. For the vast majority of use cases, if the DAX runs in 600 milliseconds or 300 milliseconds, the average user is not going to be all that upset. It's still sub-second response times. What matters most is a solution. Once you have a solution it can always be optimized if necessary.
So, with that out of the way, on to the next useless pre-emptive clarification. This post is about DAX optimization. Not data model optimization. Not Power Query optimization. Not, pick a topic, optimization. DAX optimization. If you want to leave comments about other optimizations that are not DAX, write your own blog article on them. That's not the purpose of this post.
Third clarification, I make no claims that this is the "most" optimized DAX code. In fact, I would LOVE to see feedback/comments on even more efficient DAX code. I have attached the PBIX file I used to this post.
So, what is this post about? Simple. A community member presented a problem where Power BI was taking 30 minutes to render a visualization. Let that sink in, 30 minutes. That's a lot of minutes folks. This post simply documents my efforts to optimize the DAX as much as possible. Now, that 30 minutes was more like 10 minutes on my machine. And I have a rickety old Surface Pro 4 with a mere 16 GB of memory so I feel bad for that guy... Anyway, I was able to get this calculation down to about 20 seconds, which is a reasonable amount of improvement. This post is about presenting to the community some lessons learned in this journey. Your mileage may vary because DAX optimization is a complex, tricky and relatively unexplored area. And, better yet, it is going to vary by circumstance. But, I felt that there were enough lessons learned here to open the discussion up to a wider audience. Maybe, maybe not.
The Original Problem
So the original problem involved three tables within a larger data model that looked like the following:
The fact table involved, Tracking History, only had about 60,000 rows in it, which made this problem even more intriguing. DAX performance issues are relatively rare and to have one occur on such a small dataset is somewhat surprising.
There were two measures involved:
No. of Orders =
VAR StartDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[Start Date]) )
VAR EndDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[End Date] ) )
VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
RETURN
IF (
AND ( StartDate > MinDateInContext, EndDate < MaxDateInContext ),
1,
IF (
and(AND (StartDate > MinDateInContext, EndDate>MaxDateInContext),MaxDateInContext>StartDate),
1,
IF (
and( AND ( StartDate < MinDateInContext, EndDate < MaxDateInContext ),EndDate>MinDateInContext),
1,
IF (
AND ( StartDate < MinDateInContext, EndDate > MaxDateInContext ),
1,
BLANK ()
)
)
)
)
Total Orders = SUMX('Tracking History',[No. of Orders])
This creates the following visualization:
In Performance Analyzer, this visual took 595,908 milliseconds to render, or about 9 minutes, 56 seconds.
Step 1
OK, step one, let's clean up the code. Nested IF's. Ack! Let's change those to a SWITCH statement.
No. of Orders 2a =
VAR StartDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[Start Date]) )
VAR EndDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[End Date] ) )
VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
RETURN
SWITCH(TRUE(),
AND ( StartDate > MinDateInContext, EndDate < MaxDateInContext ),1,
AND( AND (StartDate > MinDateInContext, EndDate > MaxDateInContext),MaxDateInContext > StartDate),1,
AND( AND ( StartDate < MinDateInContext, EndDate < MaxDateInContext ),EndDate > MinDateInContext),1,
AND ( StartDate < MinDateInContext, EndDate > MaxDateInContext ),1,
BLANK()
)
I also tried this with using && instead of AND like this:
No. of Orders 2 =
VAR StartDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[Start Date]) )
VAR EndDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[End Date] ) )
VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
RETURN
SWITCH(TRUE(),
StartDate > MinDateInContext && EndDate < MaxDateInContext,1,
(MaxDateInContext > StartDate && StartDate > MinDateInContext) && EndDate > MaxDateInContext,1,
(StartDate < MinDateInContext && EndDate < MaxDateInContext) && EndDate > MinDateInContext,1,
StartDate < MinDateInContext && EndDate > MaxDateInContext,1,
BLANK()
)
Performance analyzer results were as follows:
So, first couple of lessons, no real impact to performance using SWITCH instead of nested IF statements. Also, no impact in using AND functions versus inline &&. While these times are slightly longer, they are not statistically significant as the same query will run slightly longer or shorter depending on a range of factors. But, the code is definitely more readable!
Step 2
Now that we can better see the logic involved, it is evident that the logic is filtering out certain rows and assigning a value of 1 while the remainder that do not fit the criteria are given a value of BLANK. However, in the visual, no more than 2,000 is displayed as a sum of all of the 1's and there are 60,000 rows in the fact table. This means that the vast majority of our rows have to be tested for a bunch of conditions before they "fall through" to be assigned BLANK. Thus, it follows that if we reverse the logic we will eliminate rows faster and thus there will be less testing involved and hence less processing and calculation.
No. of Orders 3 =
VAR StartDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[Start Date]) )
VAR EndDate = VALUE ( SELECTEDVALUE ( 'Tracking History'[End Date] ) )
VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
RETURN
SWITCH(TRUE(),
AND ( StartDate > MinDateInContext, EndDate > MaxDateInContext ),BLANK(),
AND( AND (StartDate > MinDateInContext, EndDate > MaxDateInContext),MaxDateInContext < StartDate),BLANK(),
AND( AND ( StartDate < MinDateInContext, EndDate < MaxDateInContext ),EndDate < MinDateInContext),BLANK(),
AND ( StartDate < MinDateInContext, EndDate < MaxDateInContext ),BLANK(),
1
)
Performance analyzer results were as follows:
Aha! We've knocked the calculation time down by half! So, lesson learned, when performing conditional logic tests, structure your code such that you eliminate the most amount of rows as early as possible.
Step 3
Thus far I have been creating two measures for each step, creating a new Total Orders measure along with my new No. of Orders measure. This is a pain and I'm lazy so let's combine this into a single measure:
Total Orders 4 =
VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
VAR __Table =
ADDCOLUMNS(
'Tracking History',
"__No of Orders",
SWITCH(TRUE(),
AND (
'Tracking History'[Start Date] > MinDateInContext,
'Tracking History'[End Date] > MaxDateInContext
),BLANK(),
AND(
AND (
'Tracking History'[Start Date] > MinDateInContext,
'Tracking History'[End Date] > MaxDateInContext
),
MaxDateInContext < 'Tracking History'[Start Date]
),BLANK(),
AND(
AND (
'Tracking History'[Start Date] < MinDateInContext,
'Tracking History'[End Date] < MaxDateInContext
),
'Tracking History'[End Date] < MinDateInContext
),BLANK(),
AND (
'Tracking History'[Start Date] < MinDateInContext,
'Tracking History'[End Date] < MaxDateInContext
),BLANK(),
1
)
)
RETURN
SUMX(__Table,[__No of Orders])
Performance analyzer results were as follows:
Wow! Perhaps unexpectedly this really improved performance!! The lesson learned here? Being lazy is a good thing? Difficult to say but likely has something to do with internal DAX optimization. Having all of the code in a single measure helps DAX optimize the query, for example.
Step 4
Now that we have reversed the logic and have this all in one measure, we can clearly see that we could move the first logic test into a FILTER of the table. So pre-filter the table so we do not have to do the first logic test. Let's see what that does:
Total Orders 5 =
VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
VAR __Table =
ADDCOLUMNS(
FILTER(
'Tracking History',
OR (
'Tracking History'[Start Date] > MinDateInContext,
'Tracking History'[End Date] > MaxDateInContext
)
),
"__No of Orders",
SWITCH(TRUE(),
AND(
AND (
'Tracking History'[Start Date] > MinDateInContext,
'Tracking History'[End Date] > MaxDateInContext
),
MaxDateInContext < 'Tracking History'[Start Date]
),BLANK(),
AND(
AND (
'Tracking History'[Start Date] < MinDateInContext,
'Tracking History'[End Date] < MaxDateInContext
),
'Tracking History'[End Date] < MinDateInContext
),BLANK(),
AND (
'Tracking History'[Start Date] < MinDateInContext,
'Tracking History'[End Date] < MaxDateInContext
),BLANK(),
1
)
)
RETURN
SUMX(__Table,[__No of Orders])
Performance analyzer results were as follows:
OK, not as dramatic of an improvement, but still knocked about 25% off of the calculation time. Lesson learned, filter early!
Step 5
Looking closely at our logic, the last two tests are redundant. We can get rid of the redundant test.
Total Orders 6 =
VAR MinDateInContext = VALUE ( MIN ( 'DateTimeTable'[Date] ) )
VAR MaxDateInContext = VALUE ( MAX ( DateTimeTable[Date] ) )
VAR __Table =
ADDCOLUMNS(
FILTER(
'Tracking History',
OR (
'Tracking History'[Start Date] > MinDateInContext,
'Tracking History'[End Date] > MaxDateInContext
)
),
"__No of Orders",
SWITCH(TRUE(),
AND(
AND (
'Tracking History'[Start Date] > MinDateInContext,
'Tracking History'[End Date] > MaxDateInContext
),
MaxDateInContext < 'Tracking History'[Start Date]
),BLANK(),
AND (
'Tracking History'[Start Date] < MinDateInContext,
'Tracking History'[End Date] < MaxDateInContext
),BLANK(),
1
)
)
RETURN
SUMX(__Table,[__No of Orders])
Performance analyzer results were as follows:
Nope, no real improvement (likely because DAX already optimized out this redundancy). But, the code is shorter and cleaner so that's a win!
Step 6
Those VALUE statements seem unnecessary, let's get rid of those.
Total Orders 7 =
VAR MinDateInContext = MIN ( 'DateTimeTable'[Date] )
VAR MaxDateInContext = MAX ( DateTimeTable[Date] )
VAR __Table =
ADDCOLUMNS(
FILTER(
'Tracking History',
OR (
'Tracking History'[Start Date] > MinDateInContext,
'Tracking History'[End Date] > MaxDateInContext
)
),
"__No of Orders",
SWITCH(TRUE(),
AND(
AND (
'Tracking History'[Start Date] > MinDateInContext,
'Tracking History'[End Date] > MaxDateInContext
),
MaxDateInContext < 'Tracking History'[Start Date]
),BLANK(),
AND (
'Tracking History'[Start Date] < MinDateInContext,
'Tracking History'[End Date] < MaxDateInContext
),BLANK(),
1
)
)
RETURN
SUMX(__Table,[__No of Orders])
Performance analyzer results were as follows:
Wow, another 25% reduction. Lesson learned, stop using VALUE and VALUES unnecessarily! I see a lot of it in the forums. It is costing you performance! Only use those functions if you really need to.
Step 7
In looking at the logic once again and considering our pre-filtering, that first logic test is overly complicated. Let's simplify it.
Total Orders 8 =
VAR MinDateInContext = MIN ( 'DateTimeTable'[Date] )
VAR MaxDateInContext = MAX ( DateTimeTable[Date] )
VAR __Table =
ADDCOLUMNS(
FILTER(
'Tracking History',
OR (
'Tracking History'[Start Date] > MinDateInContext,
'Tracking History'[End Date] > MaxDateInContext
)
),
"__No of Orders",
SWITCH(TRUE(),
MaxDateInContext < 'Tracking History'[Start Date],
BLANK(),
AND (
'Tracking History'[Start Date] < MinDateInContext,
'Tracking History'[End Date] < MaxDateInContext
),BLANK(),
1
)
)
RETURN
SUMX(__Table,[__No of Orders])
Performance analyzer results were as follows:
Hey! Another 33% improvement. Lesson learned, simpler logic is better!
Step 8
It worked before, what if we move the logic test to the filter clause?
Total Orders 9 =
VAR MinDateInContext = MIN ( 'DateTimeTable'[Date] )
VAR MaxDateInContext = MAX ( DateTimeTable[Date] )
VAR __Table =
ADDCOLUMNS(
FILTER(
'Tracking History',
AND (
OR (
'Tracking History'[Start Date] > MinDateInContext,
'Tracking History'[End Date] > MaxDateInContext
),
MaxDateInContext > 'Tracking History'[Start Date]
)
),
"__No of Orders",
SWITCH(TRUE(),
AND (
'Tracking History'[Start Date] < MinDateInContext,
'Tracking History'[End Date] < MaxDateInContext
),BLANK(),
1
)
)
RETURN
SUMX(__Table,[__No of Orders])
Performance analyzer results were as follows:
OK, maybe a minor peformance improvement, didn't seem to hurt anything and our fastest time yet!
Conclusion
Performance tuning DAX can have dramatic results. In this case, code that runs 30x faster than the original. To achieve these kinds of improvements, pay attention to the following:
Stay tuned for Part 2 where I will cover more optimizations and discuss the dangers of over-optimization!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.