Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
learning_dax
Helper II
Helper II

Box & Whisker Plot (Basic Statistics & Mean Calculation)

Hi all, 

 

I work with Power BI & I am calculating average days to accomplish a task with a Box & Whisker plot. I noticed that it, by default, only captures unique values. However, some individuals of course take the same amount of days to complete a task (see attached picture). I am trying to do outlier & mean analysis for many individuals and how long they take to complete a task on average.

 

This now has me wondering is mean based on unique values only for true average hence why box & whisker plots don't include duplicate values? In this case, the calculation would only include unique values, but then does that change the "n" you divide by to all values in the sample or just unique ones?

 

Or should the duplicates still be accounted for in calculating the average days it takes to accomplish a task. I am not very well versed in statistics so anything would help in calculating average/mean.

 

The main concern here is when I put the "Days to Accomplish Task" average in tabular format it is giving different values than the mean in the Box & Whisker plot and I believe it is because it is not accounting for duplicate values when calculating the mean. Any suggestions/thoughts? 


excel_ex.png

1 ACCEPTED SOLUTION
TomMartens
Super User
Super User

Hey @learning_dax ,

 

the mean/average is a beast, not just in Power BI or DAX, but basically everywhere 🙂

 

Nevertheless, you stumbled on something that you (meaning we) have to be aware of, whenever we throw data to data visualizations. There is a default behavior that we can not change, this behavior is the grouping of rows. Assume we have this simple dataset:

image.png

With a given aggregation function of SUM for the column amount:

image.png

When we drag the columns to a table visual everything is behaving as expected:

image.png

 

 

 

We see the value 4 on the category this.

When we switch the aggregation to mean we see a value of two, the next screenshot shows this (I dragged the numeric column to the visual a 2nd time), it's important to understand that the mean is calculated by SUM( ... ) / NumberOfRows.
Next to that the screenshot also shows the "Box and Whisker by MAQ Software" visual, and now we see the different means:

image.png

Basically, this issue can not be resolved due to the default behavior of Power BI: grouping of rows, the number of rows is not available to the visuals.

Most of the time, this is exactly what we want, but from now and then this behavior creates an issue.

You can change this either by changing the default aggregation of the numeric column, from SUM to AVERAGE (of course this is not an option if we are using an explicit measure), this will lead to the same MEAN in the Box and Whisker visual and the table visual, or add a rowidentifier to the dataset:

image.png

Using this in combination with the Box and Whisker plot yields the same MEAN as the table visual:

image.png

 

Hopefully, this helps to tackle your challenge and provides some insights on what's going on.

 

Regards,

Tom



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

View solution in original post

3 REPLIES 3
TomMartens
Super User
Super User

Hey @learning_dax ,

 

the mean/average is a beast, not just in Power BI or DAX, but basically everywhere 🙂

 

Nevertheless, you stumbled on something that you (meaning we) have to be aware of, whenever we throw data to data visualizations. There is a default behavior that we can not change, this behavior is the grouping of rows. Assume we have this simple dataset:

image.png

With a given aggregation function of SUM for the column amount:

image.png

When we drag the columns to a table visual everything is behaving as expected:

image.png

 

 

 

We see the value 4 on the category this.

When we switch the aggregation to mean we see a value of two, the next screenshot shows this (I dragged the numeric column to the visual a 2nd time), it's important to understand that the mean is calculated by SUM( ... ) / NumberOfRows.
Next to that the screenshot also shows the "Box and Whisker by MAQ Software" visual, and now we see the different means:

image.png

Basically, this issue can not be resolved due to the default behavior of Power BI: grouping of rows, the number of rows is not available to the visuals.

Most of the time, this is exactly what we want, but from now and then this behavior creates an issue.

You can change this either by changing the default aggregation of the numeric column, from SUM to AVERAGE (of course this is not an option if we are using an explicit measure), this will lead to the same MEAN in the Box and Whisker visual and the table visual, or add a rowidentifier to the dataset:

image.png

Using this in combination with the Box and Whisker plot yields the same MEAN as the table visual:

image.png

 

Hopefully, this helps to tackle your challenge and provides some insights on what's going on.

 

Regards,

Tom



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

@TomMartens this explanation is exactly what I was inquiring about and it makes sense as to why it behaves that way. The "row identifier" solution will work, but do you mind sharing the formula for the measure or calculated column that may work? 

I have tried a couple different methods but again duplicate values pose another challenge when I index columns when we need all unique row identifiers. Thanks for all of your help. 

Hey @learning_dax ,

 

there is not DAX solution, as the number of rows that will be passed to the visuals will not change. What I was trying to say was this, if you pass a measure to the visual, then you have to use a row identifier. If you use a numeric column then you can maybe overcome the issue by changing the default aggregation function to AVERAGE.

Regards,

Tom



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.