Reply
HassanAshas
Helper V
Helper V

Optimizing DAX that requires calculating 80th and 20th Percentile Value

Hi, I have a dataset of property sales. POWER BI File download link is available at the last. 

 

I need to calculate the average property price and then calculate the YoY % change compared to last year. This needs to be shown in a line chart with years on the X-axis. We also need to create one line for each property type (another attribute in our dataset).

The problem is, there are a lot of outliers in the dataset. Therefore, it is required to exclude the top 20% and bottom 20% of prices from the dataset and only calculate the average property price for the middle 60% of values. This also needs to be dynamic and must be calculated dynamically for all the filters that we have in the report. For example, if a filter on region is selected, then the top 20% and bottom 20% prices of the filtered dataset will be excluded from the results.

 

I have created the following three DAX formulas to do this, but they are performing very slowly on my dataset. There are around 4+ million records in the dataset, which is probably causing the issue, but I am wondering if there's a better way to resolve this problem?

 

 

Average Property Price DAX 

 

Average Property Price = 

VAR _Bottom20 = PERCENTILEX.INC(properties_data, properties_data[price_paid], 0.2)  -- 20th Percentile
VAR _Top20 = PERCENTILEX.INC(properties_data, properties_data[price_paid], 0.8)     -- 80th Percentile
var _Result = 
AVERAGEX(
        FILTER(
            properties_data, 
            properties_data[price_paid] > _Bottom20 && 
            properties_data[price_paid] < _Top20
        ),
        properties_data[price_paid]
    )

return _Result

 

Average Property Price (Last Year) 

 

Average Property Price (LY) = 

var _LastYear = SELECTEDVALUE('Calendar'[Year], MAX('Calendar'[Year])) - 1 
var _Month = SELECTEDVALUE('Calendar'[Month]) 

var _Result =
 IF(
    ISBLANK(_Month), 
            CALCULATE(
            [Average Property Price], 
            'Calendar'[Year] = _LastYear, 
            REMOVEFILTERS('Calendar')
        ), 
        CALCULATE(
            [Average Property Price], 
            'Calendar'[Year] = _LastYear, 
            'Calendar'[Month] =  _Month, 
            REMOVEFILTERS('Calendar')
        )
 )

return _Result

 

YoY Change % in Average Property Price 

 

YoY % Change in Average Property Price = 

var _CurrentYear = [Average Property Price]
var _LastYear = [Average Property Price (LY)]

var _Result = 

DIVIDE(
    _CurrentYear - _LastYear, 
    _LastYear, 
    0
)
return _Result

 

 

I have also tried one more thing: I created a table containing pre-computed values for the 20th percentile and the 80th percentile for all possible combinations of my filters. However, I am still facing issues because it is possible that someone might select "multiple values" from one filter at the same time, which leads to problems with this approach.

 

You can see my PBI report in the Google Drive link below. In that report, there's a table named "lu_price_percentiles" containing pre-computed percentile values for each possible combination of my filters.

 

Can anyone provide guidance on how I can improve my DAX for this specific problem?

 

You can download Power BI Report from this link: https://drive.google.com/file/d/14Sr6IuNFX6sTrqLDlT2eCVBcokZWLYHe/view?usp=sharing

6 REPLIES 6
v-kaiyue-msft
Community Support
Community Support

Hi @HassanAshas ,

 

Have you already solved the problem? If so, can you share your solution here and mark the correct answer as standard to help other members find it faster? Thank you very much for your co-operation!

 

 

Best Regards,

Clara Gong

If there is any post helps, then please consider Accept it as the solution  to help the other members find it more quickly.

lbendlin
Super User
Super User

Simplifying the LY formula

 

Average Property Price (LY) = 
CALCULATE([Average Property Price],SAMEPERIODLASTYEAR('Calendar'[Date]))

 

Checking the number of rows

lbendlin_0-1728608063353.png

Going through all rows of the fact table

lbendlin_1-1728608153176.png

Limiting to only the observed values

 

Average Property Price = 
VAR _Bottom20 = PERCENTILEX.INC(VALUES(properties_data[price_paid]), [price_paid], 0.2)  -- 20th Percentile
VAR _Top20 = PERCENTILEX.INC(VALUES(properties_data[price_paid]), [price_paid], 0.8)     -- 80th Percentile
var _Result = 
AVERAGEX(
        FILTER(
            properties_data, 
            [price_paid] > _Bottom20 && 
            [price_paid] < _Top20
        ),
        [price_paid]
    )
return _Result

 

Confirming better query plan

lbendlin_2-1728608918365.png

 

Still need to get rid of the CallBackDataID

lbendlin_3-1728608964595.png

Refactoring to use CALCULATE

Average Property Price = 
VAR _Bottom20 = PERCENTILEX.INC(VALUES(properties_data[price_paid]), [price_paid], 0.2)  -- 20th Percentile
VAR _Top20 = PERCENTILEX.INC(VALUES(properties_data[price_paid]), [price_paid], 0.8)     -- 80th Percentile
var _Result = 
CALCULATE(AVERAGE(properties_data[price_paid]),
    properties_data[price_paid] > _Bottom20, 
    properties_data[price_paid] < _Top20
    )
return _Result

Looks good.

lbendlin_4-1728609550432.png

 

@lbendlin  Oh, god. Thank you so much! 

 

I really need to spend more time understanding DAX and optimizations like this. 
There's just one problem, it seems to be giving incorrect result. Please see attached picture below, I added both measures (previous one and updated one) in different cards. 

HassanAshas_0-1728611676077.png

 

Not exactly sure what's the reason but I am trying to debug it. 

 

I noticed that too.  I don't know if the percentile calculation yields different results when using distinct sale prices.  You may need to adjust that part back to how it was before.

I didn't understand your LY formula either.

@lbendlin  Yes, that is correct.

 

I understand the issue now. VALUES function yield unique rows against the specified column. PERCENTILE function is now calculating the 20th and 80th percentile against those unique set of rows, instead of considering all rows (which it should). 

 

Due to this reason, we can't really use VALUES in this case. And hence, we are back to the square one. 

The other changes should still stick. Continue to evaluate the query plan and try other refactoring techniques.

avatar user

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

FebPBI_Carousel

Power BI Monthly Update - February 2025

Check out the February 2025 Power BI update to learn about new features.

Feb2025 NL Carousel

Fabric Community Update - February 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors (Last Month)
Top Kudoed Authors (Last Month)