Hi all,
Sample pbix attached.
We have a very simplified one-table model (see Table1 below) and two simple measures:
Total revenue = SUM(Table1[Revenue])

PercentileM = PERCENTILEX.INC(ALL(Table1[Client_code]), [Total revenue], 0.25)
We place Client_code and [PercentileM] in a table visual. Note the definition of Table1 in the file; the number of clients can be changed in its code via VAR numClients_.
If the number of clients is 50K, the visual takes about 10 seconds to update. If it's 100K it goes up to almost 40 seconds. If it's 1M, it explodes, takes over an hour and finally crashes.
Two questions:
1. Is there a way to optimize this and avoid it taking so long for large N? See Page 1 in the attached file.
2. (See Page 2 in the attached file.) Since the value produced by [PercentileM] is actually the same for every row, we are being incredibly inefficient by having it run for each of the N rows in the visual. In fact, the result we're after is exactly what we get if we place [PercentileM] in a card visual, which is very fast. Is there a way to take the result from the card visual and place it on each row of the table visual, so that [PercentileM] is executed only once? To clarify what we mean, we could create a one-row, one-column calculated table MeasureResult with that result:
MeasureResult = {[PercentileM]}
and then a measure that just reads that static result, to be used in the table visual (see this on Page 2 of the attached file):
ReadFromTable = MAX(MeasureResult[Value])
This is however an inflexible solution, since the calculated table is static and the user couldn't, for instance, apply filters.
Many thanks
@MFelix @Zubair_Muhammad @mahoneypat @TomMartens @GilbertQ @camargos88 @OwenAuger
Table1
| Client_code | Revenue |
|---|---|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 4 |
| … | … |
| N | 1 |
| N | 2 |
| N | 3 |
| N | 4 |
Solved! Go to Solution.
@AlB Here's an approach that gets you close, and might be further refined to get it just right. I added a random number to your original table to test it out (so that the totals were not the same for all clients), like this:
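Something along these lines (just a sketch of the idea, not the exact expression; the random column is named "number", and numClients_ controls N as in the original file):

Table1 =
VAR numClients_ = 100000 // change N here
VAR clients_ = SELECTCOLUMNS ( GENERATESERIES ( 1, numClients_ ), "Client_code", [Value] )
VAR revenues_ = SELECTCOLUMNS ( GENERATESERIES ( 1, 4 ), "Revenue", [Value] )
RETURN
    GENERATE (
        clients_,
        ADDCOLUMNS ( revenues_, "number", RANDBETWEEN ( 1, 100 ) ) // random column so client totals differ
    )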
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.
Thanks. I look forward to your response
That's great! Thanks very much. The improvement is massive. I thought of something like that initially but assumed (wrongly, clearly) that the built-in implementation of PERCENTILEX would already be (quasi)optimized. I am really, really surprised that its performance is so poor compared to your solution 🤔
I made a couple of changes to account for the interpolation part. Not difficult but it took me a while to find online a clear definition of the exact algorithm EXCEL/DAX use. This is a first version:
PercentileINC M3 (PatMahoney's) V3 = //Added interpolation
VAR wantedPerc_ = [Wanted_perc]
VAR baseT_ = ALL(Table1[Client_code])
VAR vSummary = ADDCOLUMNS ( baseT_, "cSum", CALCULATE ( SUM ( Table1[number] ) ) )
VAR vAddRank = ADDCOLUMNS ( vSummary, "cRank", RANKX (vSummary, [cSum],,ASC,Skip))
VAR percentileRank_ = (wantedPerc_ * (COUNTROWS(baseT_) - 1)) + 1 //Difference with .EXC
VAR percRankFloor_ = FLOOR(percentileRank_,1)
VAR percRankCeiling_ = CEILING(percentileRank_,1)
VAR ranksT_ = TOPN(percRankCeiling_-percRankFloor_+1, FILTER(vAddRank, [cRank]<=percRankCeiling_), [cRank], DESC)
VAR val1_ = MINX(ranksT_,[cSum])
VAR val2_ = MAXX(ranksT_,[cSum])
VAR interpolated_ = val1_ + ((val2_ - val1_) * (percentileRank_ - percRankFloor_))
RETURN
interpolated_
Runs in approximately the same time as your version, so adding the interpolation has hardly any impact.
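To make the interpolation concrete, a quick worked example of the rank formula above (my own numbers, not from the file):

percentileRank_ = 0.25 * (9 - 1) + 1 = 3 // with 9 clients: exactly the 3rd-smallest client total
percentileRank_ = 0.25 * (10 - 1) + 1 = 3.25 // with 10 clients: 3rd-smallest total + 0.25 * (4th-smallest - 3rd-smallest)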
And another version, some 20% faster than the previous one:
PercentileINC M3 (PatMahoney's) V4 = //Can it run faster than V3??
VAR wantedPerc_ = [Wanted_perc]
VAR baseT_ = ALL(Table1[Client_code])
VAR vAddRank = ADDCOLUMNS ( baseT_, "cRank", RANKX (baseT_, CALCULATE (SUM ( Table1[number] )),,ASC,Skip))
VAR percentileRank_ = (wantedPerc_ * (COUNTROWS(baseT_) - 1)) + 1 //Difference with .EXC
VAR percRankFloor_ = FLOOR(percentileRank_,1)
VAR percRankCeiling_ = CEILING(percentileRank_,1)
VAR valsT_ = TOPN(percRankCeiling_-percRankFloor_+1, FILTER(vAddRank, [cRank]<=percRankCeiling_), [cRank], DESC)
VAR auxT_ = ADDCOLUMNS(valsT_, "@Value", CALCULATE(SUM(Table1[number])))
VAR val1_ = MINX(auxT_,[@Value])
VAR val2_ = MAXX(auxT_,[@Value])
VAR interpolated_ = val1_ + ((val2_ - val1_) * (percentileRank_ - percRankFloor_))
RETURN
interpolated_
I will definitely mark your response as solution but want to leave this still open for discussion on question 2.
By the way, do you know if it's possible in DAX Studio to execute a query several times and get the average execution time? I.e., instead of executing the query manually 10 times, checking the timings and extracting the average myself, have DAX Studio do it automatically. I believe I'd read somewhere it was possible but cannot find where. Thanks
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
Hey @AlB , Hey @mahoneypat ,
we are facing two problems
To overcome the first issue I started using a PATH object. My measure contains this line:
var p = CONCATENATEX( ALLSELECTED( 'Table1'[Client_code] ) , [Total revenue] , "|" , [Total revenue] , ASC )
p contains an ordered structure that can be used with all the PATH functions, like PATHLENGTH or PATHITEM.
The complete measure looks like this:
percentile with path and Client_code =
var __Percentile = 0.25
var p = CONCATENATEX( ALLSELECTED( 'Client_code'[Client_code] ) , [Total revenue] , "|" , [Total revenue] , ASC )
var position = PATHLENGTH( p ) * __Percentile
var valueAtPosition = VALUE( PATHITEM( p , position , TEXT ) )
return
valueAtPosition
As you can see, I'm also using a dedicated Client_code table, and I recommend doing so. There is more than one row per client; for this reason the measure will benefit from filter propagation instead of permanently scanning the table.
The above measure executes in ~3.2 s on my machine (Intel i7-9750H). I don't consider this a substantial gain over the solution @mahoneypat provided, meaning maybe we have to face the fact that we have already reached the optimum 🙂
In regards to your 2nd question, maybe there is the possibility to create a table that just contains a single row using GROUPBY or SUMMARIZE. Then this table will be calculated only once, during data refresh. You can use a simpler measure to blend the PERCENTILE value into your fact table.
Stay safe, stay healthy, have data (and of course A Merry Christmas)
Tom
Great. Thanks a lot. I will look at it over the coming days and adapt it to include interpolation like PERCENTILEX.INC. I might come back with some additional questions.
Thanks
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
Thanks a lot for your reply. This looks GREAT, very interesting. I just saw that the results your measure produces are not exactly the same as what PERCENTILEX.INC yields. I believe however that is simply the interpolation and probably will not take much to fix. A few questions. I know you are on holidays. No rush 😊
1. I haven't worked with PATH objects. I will do my own research but would you care to explain, high level, the logic behind the approach?
2. How do you build the dedicated Client_code table? Is it just an aggregation, with one row per client and their total revenue?
3. "the measure will benefit from filterpropagation instead of permanently scanning the table" What do you exactly mean by this?
4. "In regards to your 2nd question, maybe there is the possibility to create a table that just contains a single row using GROUPBY or SUMMARIZE. Then this table will calculated only once during data refresh" Is this different from what I proposed initially in question 2, with the MeasureResult table and [ReadFromTable]? If not, we would still have the problem of it being a static solution, not responding to filters
5. "You can use a more simple measure to blend the PERCENTILE value into your fact table" Can you elaborate?
Many thanks and a happy Christmas to you too
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
Hey @AlB ,
your questions ...
1 - the path object
Basically it's a string (don't be afraid, this string can become really long). What makes this string special is the | sign that separates the items. There are some very efficient functions, like PATHITEM (returning the item from a path at a given position). The value at the 25th position, for instance, can be determined like so:
PATHITEM( path, 25)
From my experience, navigating a string that contains the | sign as a special character (this character is used by all the PATH... functions) is very fast.
Maybe you will find this blog interesting: The previous value - Mincing Data - Gain Insight from Data (minceddata.info)
I create a PATH object by using the table iterator function CONCATENATEX(...). In our example I'm using the clients as the table and the sales value as the expression; I also use the order-by and order-direction parameters of the function. When I'm done, p contains 1M ordered sales values.
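A toy example, just to show the mechanics (PathDemo is a made-up measure, not from the attached file):

PathDemo =
VAR p = "5|12|40|41" // four ordered values separated by |
RETURN
    VALUE ( PATHITEM ( p, 2, TEXT ) ) // returns 12, the value at position 2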
2 - the client table
I use this DAX code to create a client_code table from Table1:
Client_code = DISTINCT( ALLNOBLANKROW( 'Table1'[Client_code] ) )
Using a dedicated table allows me to create a dimension table (on the one side) and a fact table (on the many side), which enables filter propagation.
3 - filter propagation
Filter propagation is one of the ruling principles of the Power BI dataset (basically this concept rules everything Tabular). Sometimes filter propagation is far faster than scanning a single table; the scanning becomes necessary because there are multiple values per Client_code.
4 - SUMMARIZE or GROUPBY
Basically it's not that different from a measure-result table, but I did not understand how you wanted to create the measure result table; I'm sorry for the confusion. Yes, this solution will be static.
5 - using a simplified measure
As soon as there is a result table with a single row and a single column, that cell can be blended into the fact table Table1 by a simple measure like MAX(...), instead of evaluating the complex measure 1M times.
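To make 4 and 5 concrete, something along these lines (table and measure names are just placeholders, and as said this stays static):

PercentileResult = // calculated table, one row, evaluated once at refresh
    ROW ( "Value", PERCENTILEX.INC ( ALL ( Table1[Client_code] ), [Total revenue], 0.25 ) )

Blended percentile = MAX ( PercentileResult[Value] ) // cheap measure that shows the stored value on every row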
Hopefully this answers some of your questions. By the way, vacation is an ideal time for lengthier posts.
Stay safe, stay healthy, have data, and a Merry Christmas
Tom
I'm sure you could do the same in R or Python and it would also be fast. I don't think Charticulator would apply here, but it is an interesting idea to be able to edit the DAX behind a table/matrix visual directly. You should propose it.
Regards,
Pat
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.
@mahoneypat @TomMartens (and everyone else)
Pat, thanks for creating the idea for the visual-level variables.
I was having a look further on question 2. Imagine we have a simple table visual with Table1[Client_code] and this measure that uses the built-in PERCENTILEX.INC:
PercentileM_INC with NumberCol =
VAR wantedPerc_ = [Wanted_perc]
RETURN
PERCENTILEX.INC(ALL(Table1[Client_code]), CALCULATE(SUM(Table1[number])), wantedPerc_)
where [Wanted_perc] is simply a constant. We can then get the query for this visual from the Performance Analyzer:
// DAX Query
DEFINE
VAR __DS0Core =
SUMMARIZECOLUMNS(
ROLLUPADDISSUBTOTAL('Table1'[Client_code], "IsGrandTotalRowTotal"),
"PercentileM_INC_with_NumberCol", 'Table1'[PercentileM_INC with NumberCol]
)
VAR __DS0PrimaryWindowed =
TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, 'Table1'[Client_code], 1)
EVALUATE
__DS0PrimaryWindowed
ORDER BY
[IsGrandTotalRowTotal] DESC, 'Table1'[Client_code]
If we run this in DAX Studio and N (number of clients) is 1M, it takes over an hour and crashes, as stated earlier. We can however modify the query to apply that visual-level variable:
// DAX Query
DEFINE
VAR valForAllRows_ = 'Table1'[PercentileM_INC with NumberCol] //Executed only ONCE for the visual
VAR __DS0Core_V2 = //This is the more efficient variation
SUMMARIZECOLUMNS (
ROLLUPADDISSUBTOTAL ( 'Table1'[Client_code], "IsGrandTotalRowTotal" ),
"PercentileM_INC_with_NumberCol", valForAllRows_ //Using the variable here
)
VAR __DS0PrimaryWindowed =
TOPN (
502,
__DS0Core_V2, //CHANGE here to swap between versions
[IsGrandTotalRowTotal], 0,
'Table1'[Client_code], 1
)
EVALUATE
__DS0PrimaryWindowed
ORDER BY
[IsGrandTotalRowTotal] DESC,
'Table1'[Client_code]
The above executes in about 2 seconds.
I was wondering whether something like this could be implemented in a custom visual, where you can control how the visual's query is run internally. Perhaps with Charticulator or with a Python visual?
Thoughts?
Thanks
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
@AlB Glad it helped. I love a good DAX or M challenge, and this was a fun one. Most times we don't need to use the value term of RANKX, but it comes in handy. Thank you for sharing your expression to see the improvements you've made. If we could make custom functions like in M, this would be a good one to keep handy.
It does also highlight the potential of having visual level measures that could be referenced on each row, which would make a much bigger improvement still. I added that idea here.
Regards,
Pat
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.
Hey @AlB,
I will look into your question in the upcoming days off, so it can take a while 🙂
Regards,
Tom
Will think on it, but I doubt it. Nice job sharing it in a clever way that is easily scalable for testing potential solutions. This really highlights the need for visual-level and/or page-level variables. Not sure if that idea has already been suggested, but you should suggest it if not.
Regards,
Pat
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.
Interesting thoughts. You created a [Value] column based on the PercentileM measure, but I guess you just cannot avoid the measure calculation once you add the client code to the visual.
Regards
Paul