Re: TRIMMEAN

Greg_Deckler · ‎05-05-2020

When you think you pick an easy one...

In my recent quest to create or catalog as many DAX equivalents for Excel functions, I figured this one would be a cinch. Well, not so much. Between poor documentation and vexing issues with DAX not having any kind of inherent sort for data, I very nearly pulled my hair out over this one and at times felt very much like the young woman in the photo. Well, anyway, a double, concurrent while loop and several burnt out, overloaded brain cells later, I was apparently able to solve a 2 1/2 year old request and I guess I was correct back then, it would require RANKX but that was only just the beginning! So @cspress , here is your TRIMMEAN. Apologies for the delay...

TRIMMEAN = 
    VAR __Table = 
        ADDCOLUMNS(
            'Table',
            "Rank",RANKX('Table',[Value])
        )
    VAR __Percent = .2
    VAR __Count = COUNTROWS(__Table)
    VAR __Trim = MROUND(__Count * __Percent,2) / 2
    VAR __MaxRank = MAXX(__Table,[Rank])
    VAR __MinRank = MINX(__Table,[Rank])
    VAR __RanksTable =
        ADDCOLUMNS(
            ADDCOLUMNS(
                GROUPBY(
                    __Table,
                    [Rank],
                    "Count",COUNTX(CURRENTGROUP(),[Value]),
                    "Value",MAXX(CURRENTGROUP(),[Value])
                ),
                "CumulativeBottomCount",COUNTROWS(FILTER(__Table,[Rank] >= EARLIER([Rank]))),
                "CumulativeTopCount",COUNTROWS(FILTER(__Table,[Rank] <= EARLIER([Rank])))
            ),
            "BottomWhile",__Trim - [CumulativeBottomCount],
            "TopWhile",__Trim - [CumulativeTopCount]
        )
    VAR __MinBottom = MAXX(FILTER(__RanksTable,[BottomWhile]<=0),[BottomWhile])
    VAR __MinTop = MAXX(FILTER(__RanksTable,[TopWhile]<=0),[TopWhile])
    VAR __FinalBottomRankTable = 
        ADDCOLUMNS(
            FILTER(__RanksTable,[BottomWhile]>=__MinBottom),
            "Product",IF([BottomWhile]>=0,[Count]*[Value],([Count] + [BottomWhile]) * [Value])
        )
    VAR __FinalTopRankTable =
        ADDCOLUMNS(
            FILTER(__RanksTable,[TopWhile]>=__MinTop),
            "Product",IF([TopWhile]>=0,[Count]*[Value],([Count] + [TopWhile]) * [Value])
        )
    VAR __Bottom = SUMX(__FinalBottomRankTable,[Product])
    VAR __Top = SUMX(__FinalTopRankTable,[Product])
RETURN
    DIVIDE(
        SUMX(__Table,[Value]) - __Bottom - __Top,
        __Count - 2 * __Trim
    )

To clarify what is going on here, TRIMMEAN in Excel essentially ranks your data and trims off a number of rows equal to the percentage specified. The documentation doesn't really tell you about the ranking part, but it does it, it is not just trimming off the ordered list of rows. So, per the documentation, TRIMMEAN rounds the number of excluded data points down to the nearest multiple of 2. If percent = 0.1, 10 percent of 30 data points equals 3 points. For symmetry, TRIMMEAN excludes a single value from the top and bottom of the data set. Great. Where you run into trouble is when you have ties at the top and bottom of your dataset. Excel's TRIMMEAN is smart enough to only trim off the correct number of rows. So if you are trimming 3 points off the top and the bottom and have 2 1's and 3 2's, Excel's TRIMMEAN will only trim off the 2 1's and a single 2. Emulating this in DAX is not straight-forward at all and requires a lot of table gymnastics, double concurrent while loops and so on, such as me lying down on my bed with my eyes shut trying to figure out how to solve this problem until I got enough of an idea to drag myself back to my computer and continuing working on it. If you are wondering, the idea was around taking the averages of the tops and bottoms and multiplying that by the number of items to trim off from each side. Didn't actually work at all, but it eventually led me to the above solution, which I *think* works for all cases.

eyJrIjoiMmUyZjEzNDgtMWNhNC00OGI0LWE2ZDktNjA2ZmY1ZGVkMDdiIiwidCI6IjRhMDQyNzQzLTM3M2EtNDNkMi04MjdiLTAwM2Y0YzdiYTFlNSIsImMiOjN9

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

hools · ‎09-24-2024

Thank you so much for this. It works! But of course, I tried it with about 150,000 rows of data, and, yep, it's quite slow! For small-ish datasets, it's OK, but even for medium-sized ones with an Intel Core I9 solid state drive Dell all-singing all-dancing desktop, it's too slow. So I resorted to manipulating my data with M, grouping the data and coming up with a calculated column (in M) from which I could exclude the tops and tails with a simple CALCULATE measure in DAX. But thank you, thank you for this DAX measure ... lots of interesting stuff in there for my file of "useful hacks".

CL2316 · ‎11-21-2022

Hi there, thanks so much for posting this!

Would you be able to add some commentary on what each variable is doing/why and how? I'm a PowerBI newbie and would really like to understand each part of that measure and how it calculates the trimmed mean.

Thanks so much

Greg_Deckler · ‎11-21-2022

@CL2316 Sure, see below. It may help to use the PBIX and to return each table variable as a table to see what is in the table at each stage of the calculation.

TRIMMEAN = 
    // First, take the table and add a Rank column, 1 to however many values are in the table
    VAR __Table = 
        ADDCOLUMNS(
            'Table',
            "Rank",RANKX('Table',[Value])
        )
    // This is the percentage to trim off (20%)
    VAR __Percent = .2
    // Count the number of rows in the table
    VAR __Count = COUNTROWS(__Table)
    // Figure out how many rows to trim off The fractional number of data points to exclude
    // from the calculation. For example, if percent = 0.2, 4 points are trimmed from a data 
    // set of 20 points (20 x 0.2): 2 from the top and 2 from the bottom of the set.
    VAR __Trim = MROUND(__Count * __Percent,2) / 2
    // Get the highest rank
    VAR __MaxRank = MAXX(__Table,[Rank])
    // Get the lowest rank
    VAR __MinRank = MINX(__Table,[Rank])
    // OK if I recall, it was necessary to account for having more than one row with the same
    // rank so some of this gets a little complicated. But, basically, take the virtual 
    // __Table variable and and group by Rank column, adding a Count column for how many
    // rows have that rank and what the value of that rank is. Add cumulative counter columns
    // that simply count how many rows have a higher or lower rank than the current row. In
    // addition, add our while loop counters that take the __Trim variable and subtract our
    // cumulative counters.
    VAR __RanksTable =
        ADDCOLUMNS(
            ADDCOLUMNS(
                GROUPBY(
                    __Table,
                    [Rank],
                    "Count",COUNTX(CURRENTGROUP(),[Value]),
                    "Value",MAXX(CURRENTGROUP(),[Value])
                ),
                "CumulativeBottomCount",COUNTROWS(FILTER(__Table,[Rank] >= EARLIER([Rank]))),
                "CumulativeTopCount",COUNTROWS(FILTER(__Table,[Rank] <= EARLIER([Rank])))
            ),
            "BottomWhile",__Trim - [CumulativeBottomCount],
            "TopWhile",__Trim - [CumulativeTopCount]
        )
    // The minimum bottom row (rank) that we want is the highest rank where the bottom while
    // loop counter is <= 0. So if __Trim is 2 then the bottom while counter goes to 0 at 
    // three up from the lowest rank.
    VAR __MinBottom = MAXX(FILTER(__RanksTable,[BottomWhile]<=0),[BottomWhile])
    VAR __MinTop = MAXX(FILTER(__RanksTable,[TopWhile]<=0),[TopWhile])
    // This table is for the items that we want to exclude (bottom)
    VAR __FinalBottomRankTable = 
        ADDCOLUMNS(
            FILTER(__RanksTable,[BottomWhile]>=__MinBottom),
            "Product",IF([BottomWhile]>=0,[Count]*[Value],([Count] + [BottomWhile]) * [Value])
        )
    // This table is for the items that we want to exclude (top)    
    VAR __FinalTopRankTable =
        ADDCOLUMNS(
            FILTER(__RanksTable,[TopWhile]>=__MinTop),
            "Product",IF([TopWhile]>=0,[Count]*[Value],([Count] + [TopWhile]) * [Value])
        )
    // Add up the values to exclude
    VAR __Bottom = SUMX(__FinalBottomRankTable,[Product])
    VAR __Top = SUMX(__FinalTopRankTable,[Product])
RETURN
    // Determine the average by summing the values in __Table, subtract the bottom and top
    // and then divide by the number of rows minus 2 * the number of rows to trim from top
    // and bottom.
    DIVIDE(
        SUMX(__Table,[Value]) - __Bottom - __Top,
        __Count - 2 * __Trim
    )

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

CL2316 · ‎12-19-2022

Wow thank you so much for this, it is beyond helpful!

I'm struggling with calculating the trimmed mean whilst filtering a table:

VAR __Order =

ADDCOLUMNS(

SUMMARIZE(

CALCULATETABLE( '280CT_order', '280CT_order_detail'[product category] = "Donation"),

'280CT_order'[supporterid],

"giftperhead", CALCULATE( [£ Gross Income],'280CT_order_detail'[product category] = "Donation")) ,

"Rank", ( RANKX('280CT_order', [giftperhead]))

)

I'm trying to filter the Order table to people who have only given a donation and make sure it's 1 row per supporter if they have given multiple donations. Please can you help?

Anonymous · ‎12-26-2021

Appreciate the effort 🙂

Anonymous · ‎07-29-2021

Hi Greg! I'm looking to just trim off the top and bottom 10% of a calculated column. This seems like it's doing a lot more than that. Is there a simpler solution, please? Thank you!

Greg_Deckler · ‎07-29-2021

@Anonymous Nope, that's exactly what this calculation is doing if you set the __Percent variable to .1.

Sorry, TRIMMEAN is one nasty function in DAX...

This thread has some good stuff from @ImkeF: Solved: Replicating TRIMMEAN in DAX - Microsoft Power BI Community

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

Anonymous · ‎07-29-2021

Thank you for your response! I'll give it a shot. What do you recommend if I would need further assistance?

Greg_Deckler · ‎07-29-2021

@Anonymous Just shoot me a note. I designed the formula so that the only thing you should have to mess with is the __Table variable and the __Percent variable. In theory...

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

Anonymous · ‎08-02-2021

Hi Greg,

Thank you! I plugged in your formula and I'm getting closer! However, getting an error related to Column cannot be found or may not be used in this expression. It's on "CumulativeTopCount," "CumulativeBottomCount" and "Value." I have looked up the error, but don't see anything awry. Any ideas, please? Thanks again!

Greg_Deckler · ‎08-02-2021

@Anonymous Oi! You need to use DAX formatter so that I can decipher what is going on in the formula: https://www.daxformatter.com/

At first blush, you seem to be missing an ADDCOLUMNS in your RankTable calculation. The underlined columns can't be used until the ADDCOLUMNS is finalized, which is the reason for the double ADDCOLUMNS logic in RankTable VAR.

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

Anonymous · ‎08-02-2021

Hi Greg,

Thanks for letting me know about the DAX Formatter, I had no idea! I added the missing ADDCOLUMNS, but still seeing the same error. I am including the DAX formatter. Please let me know. Thank you!

TRIMMEAN =
VAR __Table =
ADDCOLUMNS (
'Opportunity Product',
"Rank", RANKX ( 'Opportunity Product', [Days in between] )
)
VAR __Percent = .1
VAR __Count =
COUNTROWS ( __Table )
VAR __Trim =
MROUND ( __Count * __Percent, 2 ) / 2
VAR __MaxRank =
MAXX ( __Table, [Rank] )
VAR __MinRank =
MINX ( __Table, [Rank] )
VAR __RankTable =
ADDCOLUMNS (
ADDCOLUMNS (
GROUPBY (
__Table,
[Rank],
"Count", COUNTX ( CURRENTGROUP (), [Days in between] )
),
"Value", MAXX ( CURRENTGROUP (), [Days in between] )
),
"CumulativeBottomCount", COUNTROWS ( FILTER ( __Table, [Rank] >= EARLIER ( [Rank] ) ) ),
"CumulativeTopCount", COUNTROWS ( FILTER ( __Table, [Rank] <= EARLIER ( [Rank] ) ) ),
"BottomWhile", __Trim - [CumulativeBottomCount],
"TopWhile", __Trim - [CumulativeTopCount]
)
VAR __MinBottom =
MAXX ( FILTER ( __RankTable, [BottomWhile] <= 0 ), [BottomWhile] )
VAR __MinTop =
MAXX ( FILTER ( __RankTable, [TopWhile] <= 0 ), [TopWhile] )
VAR __FinalBottomRankTable =
ADDCOLUMNS (
FILTER ( __RankTable, [BottomWhile] >= __MinBottom ),
"Product",
IF (
[BottomWhile] >= 0,
[Count] * [Value],
( [Count] + [BottomWhile] ) * [Value]
)
)
VAR __FinalTopRankTable =
ADDCOLUMNS (
FILTER ( __RankTable, [TopWhile] >= __MinTop ),
"Product",
IF ( [TopWhile] >= 0, [Count] * [Value], ( [Count] + [TopWhile] ) * [Value] )
)
VAR __Bottom =
SUMX ( __FinalBottomRankTable, [Product] )
VAR __Top =
SUMX ( __FinalTopRankTable, [Product] )
RETURN
DIVIDE ( SUMX ( __Table, [Value] ) - __Bottom - __Top, __Count - 2 * __Trim )

Greg_Deckler · ‎08-02-2021

@Anonymous I think you want this:

TRIMMEAN =
VAR __Table =
    ADDCOLUMNS (
        'Opportunity Product',
        "Rank", RANKX ( 'Opportunity Product', [Days in between] )
    )
VAR __Percent = .1
VAR __Count =
    COUNTROWS ( __Table )
VAR __Trim =
    MROUND ( __Count * __Percent, 2 ) / 2
VAR __MaxRank =
    MAXX ( __Table, [Rank] )
VAR __MinRank =
    MINX ( __Table, [Rank] )
VAR __RankTable =
    ADDCOLUMNS (
        ADDCOLUMNS (
            GROUPBY (
                __Table,
                [Rank],
                "Count", COUNTX ( CURRENTGROUP (), [Days in between] ),
                "Value", MAXX ( CURRENTGROUP (), [Days in between] )
            ), //end groupby
            "CumulativeBottomCount", COUNTROWS ( FILTER ( __Table, [Rank] >= EARLIER ( [Rank] ) ) ),
            "CumulativeTopCount", COUNTROWS ( FILTER ( __Table, [Rank] <= EARLIER ( [Rank] ) ) )
        ),//end second ADDCOLUMNS
        "BottomWhile", __Trim - [CumulativeBottomCount],
        "TopWhile", __Trim - [CumulativeTopCount]
    ) //end first ADDCOLUMNS
VAR __MinBottom =
    MAXX ( FILTER ( __RankTable, [BottomWhile] <= 0 ), [BottomWhile] )
VAR __MinTop =
    MAXX ( FILTER ( __RankTable, [TopWhile] <= 0 ), [TopWhile] )
VAR __FinalBottomRankTable =
    ADDCOLUMNS (
        FILTER ( __RankTable, [BottomWhile] >= __MinBottom ),
        "Product",
            IF (
                [BottomWhile] >= 0,
                [Count] * [Value],
                ( [Count] + [BottomWhile] ) * [Value]
            )
    )
VAR __FinalTopRankTable =
    ADDCOLUMNS (
        FILTER ( __RankTable, [TopWhile] >= __MinTop ),
        "Product",
            IF ( [TopWhile] >= 0, [Count] * [Value], ( [Count] + [TopWhile] ) * [Value] )
    )
VAR __Bottom =
    SUMX ( __FinalBottomRankTable, [Product] )
VAR __Top =
    SUMX ( __FinalTopRankTable, [Product] )
RETURN
    DIVIDE ( SUMX ( __Table, [Value] ) - __Bottom - __Top, __Count - 2 * __Trim )

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

Anonymous · ‎08-02-2021

Hi Greg, It looks to be working! No errors and the data looks in alignment! Thank you so very much!!! I have requested your book 🙂 Best, Petra

Greg_Deckler · ‎08-02-2021

@Anonymous Awesome! Glad we could get it working, this was a particulary vexing problem to figure out and I nearly gave up on it like 2 or 3 times!!

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

Anonymous · ‎08-02-2021

I'm glad that you didn't give up! We're grateful to have you!!

cosminc · ‎06-08-2021

Hi Greg,

thanks a lot for your effort, i found this which almost solved a hard request.

my situation is a table like yours but also has one more dimension and i need to put a column in the table with trimmin each row but only for the corespondent item from this dimension. Can you help me please with this?

below is put the new table syntax:

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlHSUcpMMVSK1YlWMkVimyGxzcFsIzDbCIltjMQ2QWKbIrENwWxjJL2GSHqB4rEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1", "Value"}})
in
#"Renamed Columns"

thanks in advance

Cosmin

ruisilva · ‎05-12-2021

Thank you for your hard work. I was searching for a way to do this and luckily found your post.