Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
wsspglobal
Helper I
Helper I

How to plot box and whisker?

Hi 

 

I am trying to plot the min, max, median, 1st and 2nd quartile in Power BI. I thought a box plot would be the best, so I imported the box and whisker chart by Jan Pieter and the and the violin plot. I am confused about how to use these custom plots. Mainly at the Value setting, as usual for numerical fields I can choose to summarize it as max, min, median, mean etc, but I don't understand why this is the case. I thought the whole purpose of the chart is to generate those at once in the chart. I created a table with measures where I calculated the max, min, median and so on for reference. But the numbers don't match. Please see the snapshots where I used either Sum or Average to plot the charts. The numbers won' match my calculations. Why? How can I fix it?

 

 Box and Whisker:

'Sum' applied.

Sum.jpg

Average applied:

 

Average.jpg

 

Violin Chart:

Sum applied:

sum- ioa.jpg

Average applied:

Average vio.jpg

 

2 ACCEPTED SOLUTIONS
dax
Community Support
Community Support

Hi wsspglobal

According to your description, it seems that you want to understand the logic of above chart, right?

For whisker, you could find that it group on category, then calculate  the max of sampling

91.png

For plot, you could find that it also  group on “Category”, and calculate max, min based on “sampling”

92.png

So you could modify field in “Category ” and “sample”  based on your requirement.

 

Best Regards,

Zoe Zhi

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Hi @wsspglobal - I just came across this post while browsing - I'm the author of the violin plot visual and may be able to help clarify its usage, if you're still experiencing some challenges.

This is documented here, but I'll walk through using your data.

You are correct that the visual should create all statistical measures, based on the data you feed into it. In its simplest terms, you need to be able to provide data at its lowest possible grain for this to be worked out correctly. If any of this data is pre-aggregated then it will not work.

From your example, you are plotting Session as a Category and Amount as the Measure Data. If we only pass these in the the visual, then Power BI will aggregate Amount by Session - as you've already indicated in your screenshot - but to illustrate how this looks to the visual:

image.png

You will notice that the Amount is aggregated by the Session. In order for the visual to know more about the distribution of your data, we need to give it every row we can and this is where the Sampling field comes in.

You need to create a unique value for this field so that the visual does not aggregate the measure at a higher level. We'll look at your data and try and find a suitable candidate field.

An easy way to check this is by adding a measure that provides a count for the one you want to check. Let's have a look at Subset:

image.png

Note that there is a 2 in each column. This indicates that Subset is too high-level to provide all values of Amount to the visual, and we can see the resulting violin plot:

image.png

The tooltip says there's 5 samples, because there are 5 distinct values for Subset, and we actually need 10 because that;s what's in your raw data.

An ideal way to do this is to use something like a primary key in your table to do the sampling on. The puts the data at its lowest level of grain. You can do this in Power Query, e.g.:

  1. Add Column
  2. Index Column
  3. From 1

image.png

This creates a unique value for every row in your table, e.g.:

image.png

Close & Apply the query and this will get added to your table. Let's repeat the above 'Count Test' using Index:

image.png

We can now see that every row has a 1 in it, which means that for every row, the Amount is going to be as per the table and won't pre-aggregate, e.g.:

image.png

Now, we should add Index to the violin plot so replicate this behaviour, e.g.:

image.png

If we check the tooltip for A, we get the stats calculated, e.g.:

image.png

This technique will also work for the Box & Whisker Chart, e.g.:

image.png

Hopefully this helps clarify the usage and behaviour of both of these visuals, but feel free to ask anything else and I'll do my best to answer.

Good luck!

Daniel





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!


My course: Introduction to Developing Power BI Visuals


On how to ask a technical question, if you really want an answer (courtesy of SQLBI)




View solution in original post

5 REPLIES 5
benmcmahon
Helper II
Helper II

Hi @dm-p,

 

I have stumbled accross this post and i found it extremely helpful for using the box and whisker plot shown above.

 

However, I am having a problem where the mean and median calculated with the built in agregation functions are not the same as the mean and median in the box and whisker chart.Screenshot 2022-11-02 111208.png

 Here we can see that the aggregations I have applied to "EPGcalculation" do not equal the same as those in the box and whisker chart.

Can you tell why this is happening?

 

Here is the link to mybpix file. all the data is from "bireportdata PhenRQCPlusData" and the visuals are on "Page 1".

https://techiongrouplimited-my.sharepoint.com/:u:/g/personal/ben_techiongroup_co_nz/EQw8qurSCo9Oo0SR...

Thanks in advance

wsspglobal
Helper I
Helper I

Here's my data:

 

SessionSubsetYearAmount
Aa201181.82676
Ab201138.07561
Ac201157.11986
Ad201134.58256
Ae201158.59327
Ba201117.83215
Bb201176.71251
Bc201123.25766
Bd20115.124324
Be201155.95021
Ca201112.54243
Cb201127.4061
Cc201172.05263
Cd201165.40645
Ce201199.42319
Da201188.69064
Db201197.62212
Dc20114.266029
Dd201143.47741
De201167.51566
Ea201126.86167
Eb20114.488669
Ec201125.73854
Ed201176.41257
Ee201120.14493
Fa201140.40302
Fb201185.39068
Fc201140.93762
Fd201142.4243
Fe201132.9475
Aa201266.79593
Ab201293.1857
Ac201258.68907
Ad201242.2477
Ae201277.47958
Ba201275.50514
Bb201270.82896
Bc201256.12918
Bd201210.55485
Be201276.93667
Ca201218.18667
Cb201263.1883
Cc201293.8165
Cd201281.05533
Ce201275.86873
Da201274.05272
Db201242.40564
Dc201230.16113
Dd201257.68567
De201240.86473
Ea201242.71101
Eb201265.83886
Ec20128.514872
Ed201219.7125
Ee201224.29333
Fa201271.20736
Fb201280.1227
Fc201286.54642
Fd20129.776906
Fe201277.82144
dax
Community Support
Community Support

Hi wsspglobal

According to your description, it seems that you want to understand the logic of above chart, right?

For whisker, you could find that it group on category, then calculate  the max of sampling

91.png

For plot, you could find that it also  group on “Category”, and calculate max, min based on “sampling”

92.png

So you could modify field in “Category ” and “sample”  based on your requirement.

 

Best Regards,

Zoe Zhi

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi @wsspglobal - I just came across this post while browsing - I'm the author of the violin plot visual and may be able to help clarify its usage, if you're still experiencing some challenges.

This is documented here, but I'll walk through using your data.

You are correct that the visual should create all statistical measures, based on the data you feed into it. In its simplest terms, you need to be able to provide data at its lowest possible grain for this to be worked out correctly. If any of this data is pre-aggregated then it will not work.

From your example, you are plotting Session as a Category and Amount as the Measure Data. If we only pass these in the the visual, then Power BI will aggregate Amount by Session - as you've already indicated in your screenshot - but to illustrate how this looks to the visual:

image.png

You will notice that the Amount is aggregated by the Session. In order for the visual to know more about the distribution of your data, we need to give it every row we can and this is where the Sampling field comes in.

You need to create a unique value for this field so that the visual does not aggregate the measure at a higher level. We'll look at your data and try and find a suitable candidate field.

An easy way to check this is by adding a measure that provides a count for the one you want to check. Let's have a look at Subset:

image.png

Note that there is a 2 in each column. This indicates that Subset is too high-level to provide all values of Amount to the visual, and we can see the resulting violin plot:

image.png

The tooltip says there's 5 samples, because there are 5 distinct values for Subset, and we actually need 10 because that;s what's in your raw data.

An ideal way to do this is to use something like a primary key in your table to do the sampling on. The puts the data at its lowest level of grain. You can do this in Power Query, e.g.:

  1. Add Column
  2. Index Column
  3. From 1

image.png

This creates a unique value for every row in your table, e.g.:

image.png

Close & Apply the query and this will get added to your table. Let's repeat the above 'Count Test' using Index:

image.png

We can now see that every row has a 1 in it, which means that for every row, the Amount is going to be as per the table and won't pre-aggregate, e.g.:

image.png

Now, we should add Index to the violin plot so replicate this behaviour, e.g.:

image.png

If we check the tooltip for A, we get the stats calculated, e.g.:

image.png

This technique will also work for the Box & Whisker Chart, e.g.:

image.png

Hopefully this helps clarify the usage and behaviour of both of these visuals, but feel free to ask anything else and I'll do my best to answer.

Good luck!

Daniel





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!


My course: Introduction to Developing Power BI Visuals


On how to ask a technical question, if you really want an answer (courtesy of SQLBI)




bBI
Frequent Visitor

This technique will also work for the Box & Whisker Chart, e.g.:

image.png

 

Adding an index in the sample field helped a bit.

 

I say a bit because out of my 5 years worth of data, for 3 years the box & whiskers match my manually calculated reference table but the other 2 are off.

 

One year completely vanishes while the other shows slight discrepancies.

 

Apparently this is due to there being too many index values and I'm hitting a sampling limit.

 

Filtering my category (years) to be only the two erroneous years fixes them (the box plot values match my reference table) however that's not really a good solution (as I'm trying to visualize a trend over time so having only 2 or 3 years to work with gives for a very limited historical overview)

 

 

Does anybody know how to bypass the index limit?

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.