Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
JimmyCruyp
Helper II
Helper II

Double box plot

Hello, I am trying to find a graph or a solution to have a double box plot. See picture. Anyone?

Thanks a lot,

JimCapture d’écran 2023-02-16 172923.png

1 ACCEPTED SOLUTION
dm-p
Super User
Super User

Hi @JimmyCruyp, and thanks for such an interesting question! I'm not aware of anything, so I have had a go at building using Deneb with Vega-Lite and the iris dataset, which I hope proves the concept:

 

dmp_0-1677015649384.png

While Vega-Lite has a boxplot mark for standard box plots, I needed to calculate the statistics and draw this from primitive marks, so the specification is quite long. I've attached a workbook showing this in action, and I'll list the JSON specification lower down.

Note that because Power BI aggregates measures, you will need to use an index row to ensure that you get all rows in your dataset as intended. In the workbook, this is the Row Number field, and is a sequential field.

If you're not familiar with this concept, I have a blog post here about it, which may help.

Note that this solution is provided as-is, as it took me a few hours to work out, so don't have any time available for further support. If you want to tweak this design, then it's best to have a look at the Vega-Lite documentation to see how the language works. Because Deneb visuals are certified you get many of the advantages you don't get with R and Python visuals, such as full compatibility with publish to web, and PowerPoint & PDF exports.

Cheers,

Daniel

 

JSON Specification (as per workbook):

 

{
  "data": {"name": "dataset"},
  "transform": [
    {
      "joinaggregate": [
        {
          "op": "count",
          "field": "Row Number",
          "as": "count"
        },
        {
          "op": "q1",
          "field": "Sepal Length",
          "as": "x_q1"
        },
        {
          "op": "q3",
          "field": "Sepal Length",
          "as": "x_q3"
        },
        {
          "op": "median",
          "field": "Sepal Length",
          "as": "x_median"
        },
        {
          "op": "min",
          "field": "Sepal Length",
          "as": "x_min"
        },
        {
          "op": "max",
          "field": "Sepal Length",
          "as": "x_max"
        },
        {
          "op": "q1",
          "field": "Sepal Width",
          "as": "y_q1"
        },
        {
          "op": "q3",
          "field": "Sepal Width",
          "as": "y_q3"
        },
        {
          "op": "median",
          "field": "Sepal Width",
          "as": "y_median"
        },
        {
          "op": "min",
          "field": "Sepal Width",
          "as": "y_min"
        },
        {
          "op": "max",
          "field": "Sepal Width",
          "as": "y_max"
        }
      ],
      "groupby": ["Species"]
    },
    {
      "calculate": "datum['x_q3'] - datum['x_q1']",
      "as": "x_iqr"
    },
    {
      "calculate": "datum['y_q3'] - datum['y_q1']",
      "as": "y_iqr"
    },
    {
      "calculate": "datum['x_q1'] - 1.5 * datum['x_iqr']",
      "as": "x_lower"
    },
    {
      "calculate": "datum['x_q3'] + 1.5 * datum['x_iqr']",
      "as": "x_upper"
    },
    {
      "calculate": "datum['y_q1'] - 1.5 * datum['y_iqr']",
      "as": "y_lower"
    },
    {
      "calculate": "datum['y_q3'] + 1.5 * datum['y_iqr']",
      "as": "y_upper"
    },
    {
      "calculate": "datum['Sepal Length'] < datum['x_lower'] || datum['Sepal Length'] > datum['x_upper']",
      "as": "x_outlier"
    },
    {
      "calculate": "datum['Sepal Width'] < datum['y_lower'] || datum['Sepal Width'] > datum['y_upper']",
      "as": "y_outlier"
    },
    {
      "calculate": "'x=' + pbiFormat(datum['x_min'], '#0.0') + ', y=' + pbiFormat(datum['y_min'], '#0.0')",
      "as": "tooltip_min"
    },
    {
      "calculate": "'x=' + pbiFormat(datum['x_q1'], '#0.0') + ', y=' + pbiFormat(datum['y_q1'], '#0.0')",
      "as": "tooltip_q1"
    },
    {
      "calculate": "'x=' + pbiFormat(datum['x_median'], '#0.0') + ', y=' + pbiFormat(datum['y_median'], '#0.0')",
      "as": "tooltip_median"
    },
    {
      "calculate": "'x=' + pbiFormat(datum['x_q3'], '#0.0') + ', y=' + pbiFormat(datum['y_q3'], '#0.0')",
      "as": "tooltip_q3"
    },
    {
      "calculate": "'x=' + pbiFormat(datum['x_max'], '#0.0') + ', y=' + pbiFormat(datum['y_max'], '#0.0')",
      "as": "tooltip_max"
    }
  ],
  "layer": [
    {
      "description": "Box plot portion; aggregate our data so that we only get the right number of marks per category.",
      "transform": [
        {
          "aggregate": [],
          "groupby": [
            "Species",
            "count",
            "x_min",
            "x_lower",
            "x_q1",
            "x_median",
            "x_q3",
            "x_upper",
            "x_max",
            "y_min",
            "y_lower",
            "y_q1",
            "y_median",
            "y_q3",
            "y_upper",
            "y_max",
            "tooltip_min",
            "tooltip_q1",
            "tooltip_median",
            "tooltip_q3",
            "tooltip_max"
          ]
        }
      ],
      "layer": [
        {
          "description": "Box (drawn with a rectangle and the x/y quartiles.",
          "mark": {
            "type": "rect",
            "tooltip": true,
            "fillOpacity": 0.5
          },
          "encoding": {
            "x": {"field": "x_q1"},
            "x2": {"field": "x_q3"},
            "y": {"field": "y_q1"},
            "y2": {"field": "y_q3"},
            "stroke": {
              "field": "Species",
              "legend": null
            }
          }
        },
        {
          "description": "x whisker (runs from x lower to x upper and positioned vertically by y median).",
          "mark": {"type": "rule"},
          "encoding": {
            "x": {"field": "x_lower"},
            "x2": {"field": "x_upper"},
            "y": {"field": "y_median"}
          }
        },
        {
          "description": "x lower and upper rules",
          "encoding": {
            "y": {"field": "y_q3"},
            "y2": {"field": "y_q1"}
          },
          "layer": [
            {
              "description": "x lower value",
              "mark": {"type": "rule"},
              "encoding": {
                "x": {
                  "field": "x_lower"
                }
              }
            },
            {
              "description": "x upper value",
              "mark": {"type": "rule"},
              "encoding": {
                "x": {
                  "field": "x_upper"
                }
              }
            }
          ]
        },
        {
          "description": "y whisker (runs from y lower to y upper and positioned vertically by x median).",
          "mark": {"type": "rule"},
          "encoding": {
            "y": {"field": "y_lower"},
            "y2": {"field": "y_upper"},
            "x": {"field": "x_median"}
          }
        },
        {
          "description": "y lower and upper rules",
          "encoding": {
            "x": {"field": "x_q3"},
            "x2": {"field": "x_q1"}
          },
          "layer": [
            {
              "description": "y lower value",
              "mark": {"type": "rule"},
              "encoding": {
                "y": {
                  "field": "y_lower"
                }
              }
            },
            {
              "description": "y upper value",
              "mark": {"type": "rule"},
              "encoding": {
                "y": {
                  "field": "y_upper"
                }
              }
            }
          ]
        }
      ]
    },
    {
      "description": "outlier values.",
      "encoding": {
        "tooltip": [
          {"field": "Species"},
          {"field": "Sepal Width"},
          {"field": "Sepal Length"}
        ]
      },
      "layer": [
        {
          "description": "x outlier values.",
          "transform": [
            {
              "filter": "datum['x_outlier']"
            }
          ],
          "mark": {
            "type": "point",
            "tooltip": true,
            "filled": false
          },
          "encoding": {
            "x": {
              "field": "Sepal Length"
            },
            "y": {"field": "y_median"}
          }
        },
        {
          "description": "y outlier values.",
          "transform": [
            {
              "filter": "datum['y_outlier']"
            }
          ],
          "mark": {
            "type": "point",
            "filled": false,
            "tooltip": true
          },
          "encoding": {
            "y": {
              "field": "Sepal Width"
            },
            "x": {"field": "x_median"}
          }
        }
      ]
    }
  ],
  "encoding": {
    "x": {
      "type": "quantitative",
      "scale": {"zero": false},
      "axis": {"title": "Sepal Length"}
    },
    "y": {
      "type": "quantitative",
      "scale": {"zero": false},
      "axis": {"title": "Sepal Width"}
    },
    "color": {"field": "Species"},
    "tooltip": [
      {"field": "Species"},
      {
        "field": "tooltip_min",
        "title": "Minimum Value"
      },
      {
        "field": "tooltip_q1",
        "title": "Lower Quartile"
      },
      {
        "field": "tooltip_median",
        "title": "Median"
      },
      {
        "field": "tooltip_q3",
        "title": "Upper Quartile"
      },
      {
        "field": "tooltip_max",
        "title": "Maximum Value"
      },
      {
        "field": "count",
        "title": "# Samples"
      }
    ]
  }
}

 

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!


My course: Introduction to Developing Power BI Visuals


On how to ask a technical question, if you really want an answer (courtesy of SQLBI)




View solution in original post

6 REPLIES 6
JimmyCruyp
Helper II
Helper II

TOP! Merci beaucoup!!

dm-p
Super User
Super User

Hi @JimmyCruyp, and thanks for such an interesting question! I'm not aware of anything, so I have had a go at building using Deneb with Vega-Lite and the iris dataset, which I hope proves the concept:

 

dmp_0-1677015649384.png

While Vega-Lite has a boxplot mark for standard box plots, I needed to calculate the statistics and draw this from primitive marks, so the specification is quite long. I've attached a workbook showing this in action, and I'll list the JSON specification lower down.

Note that because Power BI aggregates measures, you will need to use an index row to ensure that you get all rows in your dataset as intended. In the workbook, this is the Row Number field, and is a sequential field.

If you're not familiar with this concept, I have a blog post here about it, which may help.

Note that this solution is provided as-is, as it took me a few hours to work out, so don't have any time available for further support. If you want to tweak this design, then it's best to have a look at the Vega-Lite documentation to see how the language works. Because Deneb visuals are certified you get many of the advantages you don't get with R and Python visuals, such as full compatibility with publish to web, and PowerPoint & PDF exports.

Cheers,

Daniel

 

JSON Specification (as per workbook):

 

{
  "data": {"name": "dataset"},
  "transform": [
    {
      "joinaggregate": [
        {
          "op": "count",
          "field": "Row Number",
          "as": "count"
        },
        {
          "op": "q1",
          "field": "Sepal Length",
          "as": "x_q1"
        },
        {
          "op": "q3",
          "field": "Sepal Length",
          "as": "x_q3"
        },
        {
          "op": "median",
          "field": "Sepal Length",
          "as": "x_median"
        },
        {
          "op": "min",
          "field": "Sepal Length",
          "as": "x_min"
        },
        {
          "op": "max",
          "field": "Sepal Length",
          "as": "x_max"
        },
        {
          "op": "q1",
          "field": "Sepal Width",
          "as": "y_q1"
        },
        {
          "op": "q3",
          "field": "Sepal Width",
          "as": "y_q3"
        },
        {
          "op": "median",
          "field": "Sepal Width",
          "as": "y_median"
        },
        {
          "op": "min",
          "field": "Sepal Width",
          "as": "y_min"
        },
        {
          "op": "max",
          "field": "Sepal Width",
          "as": "y_max"
        }
      ],
      "groupby": ["Species"]
    },
    {
      "calculate": "datum['x_q3'] - datum['x_q1']",
      "as": "x_iqr"
    },
    {
      "calculate": "datum['y_q3'] - datum['y_q1']",
      "as": "y_iqr"
    },
    {
      "calculate": "datum['x_q1'] - 1.5 * datum['x_iqr']",
      "as": "x_lower"
    },
    {
      "calculate": "datum['x_q3'] + 1.5 * datum['x_iqr']",
      "as": "x_upper"
    },
    {
      "calculate": "datum['y_q1'] - 1.5 * datum['y_iqr']",
      "as": "y_lower"
    },
    {
      "calculate": "datum['y_q3'] + 1.5 * datum['y_iqr']",
      "as": "y_upper"
    },
    {
      "calculate": "datum['Sepal Length'] < datum['x_lower'] || datum['Sepal Length'] > datum['x_upper']",
      "as": "x_outlier"
    },
    {
      "calculate": "datum['Sepal Width'] < datum['y_lower'] || datum['Sepal Width'] > datum['y_upper']",
      "as": "y_outlier"
    },
    {
      "calculate": "'x=' + pbiFormat(datum['x_min'], '#0.0') + ', y=' + pbiFormat(datum['y_min'], '#0.0')",
      "as": "tooltip_min"
    },
    {
      "calculate": "'x=' + pbiFormat(datum['x_q1'], '#0.0') + ', y=' + pbiFormat(datum['y_q1'], '#0.0')",
      "as": "tooltip_q1"
    },
    {
      "calculate": "'x=' + pbiFormat(datum['x_median'], '#0.0') + ', y=' + pbiFormat(datum['y_median'], '#0.0')",
      "as": "tooltip_median"
    },
    {
      "calculate": "'x=' + pbiFormat(datum['x_q3'], '#0.0') + ', y=' + pbiFormat(datum['y_q3'], '#0.0')",
      "as": "tooltip_q3"
    },
    {
      "calculate": "'x=' + pbiFormat(datum['x_max'], '#0.0') + ', y=' + pbiFormat(datum['y_max'], '#0.0')",
      "as": "tooltip_max"
    }
  ],
  "layer": [
    {
      "description": "Box plot portion; aggregate our data so that we only get the right number of marks per category.",
      "transform": [
        {
          "aggregate": [],
          "groupby": [
            "Species",
            "count",
            "x_min",
            "x_lower",
            "x_q1",
            "x_median",
            "x_q3",
            "x_upper",
            "x_max",
            "y_min",
            "y_lower",
            "y_q1",
            "y_median",
            "y_q3",
            "y_upper",
            "y_max",
            "tooltip_min",
            "tooltip_q1",
            "tooltip_median",
            "tooltip_q3",
            "tooltip_max"
          ]
        }
      ],
      "layer": [
        {
          "description": "Box (drawn with a rectangle and the x/y quartiles.",
          "mark": {
            "type": "rect",
            "tooltip": true,
            "fillOpacity": 0.5
          },
          "encoding": {
            "x": {"field": "x_q1"},
            "x2": {"field": "x_q3"},
            "y": {"field": "y_q1"},
            "y2": {"field": "y_q3"},
            "stroke": {
              "field": "Species",
              "legend": null
            }
          }
        },
        {
          "description": "x whisker (runs from x lower to x upper and positioned vertically by y median).",
          "mark": {"type": "rule"},
          "encoding": {
            "x": {"field": "x_lower"},
            "x2": {"field": "x_upper"},
            "y": {"field": "y_median"}
          }
        },
        {
          "description": "x lower and upper rules",
          "encoding": {
            "y": {"field": "y_q3"},
            "y2": {"field": "y_q1"}
          },
          "layer": [
            {
              "description": "x lower value",
              "mark": {"type": "rule"},
              "encoding": {
                "x": {
                  "field": "x_lower"
                }
              }
            },
            {
              "description": "x upper value",
              "mark": {"type": "rule"},
              "encoding": {
                "x": {
                  "field": "x_upper"
                }
              }
            }
          ]
        },
        {
          "description": "y whisker (runs from y lower to y upper and positioned vertically by x median).",
          "mark": {"type": "rule"},
          "encoding": {
            "y": {"field": "y_lower"},
            "y2": {"field": "y_upper"},
            "x": {"field": "x_median"}
          }
        },
        {
          "description": "y lower and upper rules",
          "encoding": {
            "x": {"field": "x_q3"},
            "x2": {"field": "x_q1"}
          },
          "layer": [
            {
              "description": "y lower value",
              "mark": {"type": "rule"},
              "encoding": {
                "y": {
                  "field": "y_lower"
                }
              }
            },
            {
              "description": "y upper value",
              "mark": {"type": "rule"},
              "encoding": {
                "y": {
                  "field": "y_upper"
                }
              }
            }
          ]
        }
      ]
    },
    {
      "description": "outlier values.",
      "encoding": {
        "tooltip": [
          {"field": "Species"},
          {"field": "Sepal Width"},
          {"field": "Sepal Length"}
        ]
      },
      "layer": [
        {
          "description": "x outlier values.",
          "transform": [
            {
              "filter": "datum['x_outlier']"
            }
          ],
          "mark": {
            "type": "point",
            "tooltip": true,
            "filled": false
          },
          "encoding": {
            "x": {
              "field": "Sepal Length"
            },
            "y": {"field": "y_median"}
          }
        },
        {
          "description": "y outlier values.",
          "transform": [
            {
              "filter": "datum['y_outlier']"
            }
          ],
          "mark": {
            "type": "point",
            "filled": false,
            "tooltip": true
          },
          "encoding": {
            "y": {
              "field": "Sepal Width"
            },
            "x": {"field": "x_median"}
          }
        }
      ]
    }
  ],
  "encoding": {
    "x": {
      "type": "quantitative",
      "scale": {"zero": false},
      "axis": {"title": "Sepal Length"}
    },
    "y": {
      "type": "quantitative",
      "scale": {"zero": false},
      "axis": {"title": "Sepal Width"}
    },
    "color": {"field": "Species"},
    "tooltip": [
      {"field": "Species"},
      {
        "field": "tooltip_min",
        "title": "Minimum Value"
      },
      {
        "field": "tooltip_q1",
        "title": "Lower Quartile"
      },
      {
        "field": "tooltip_median",
        "title": "Median"
      },
      {
        "field": "tooltip_q3",
        "title": "Upper Quartile"
      },
      {
        "field": "tooltip_max",
        "title": "Maximum Value"
      },
      {
        "field": "count",
        "title": "# Samples"
      }
    ]
  }
}

 

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!


My course: Introduction to Developing Power BI Visuals


On how to ask a technical question, if you really want an answer (courtesy of SQLBI)




 @giammariam Hello to you. I had made a request to have this kind of visual. The graph seems to correspond completely to what I asked for. but when I go to see in the code, there is much too specific information that cannot answer a standard request according to what I understand. 1. I have different tables that must go in this graph. Here I see that it is indicated: "dataset". Is that a problem? 2. I see "datum" : is that a constraint? 3. I also see that we are talking about Q1, Q3. I dont understand if I need this type of data. To be more clear, is it possible to have a much more generic code. Standard? I am not sure to understand how i can use it with my data in fact. this is not clear for me... and also if I have twenty box plots, will the size change? And if the size doesn't change, is it possible to add the size adjustment bar that you created for the other graphs? Thank you very much. This one is super important to me.

Hey Jim. While I certainly appreciate the tag, I'm a bit confused. Did you mean to tag @dm-p in this one? Seeing as he was the author of the solution (not to mention the creator of Deneb), he is far more fit to help you with this than I am. If I'm missing something and you still need support, let me know.



Madison Giammaria
Proud to be a Super User 😄
LinkedIn

Do you frequently use Deneb to provide insights to your stakeholders? Have you considered sponsoring this free and open source custom visual? More info here!

Hello Giammaria, sorry for appealing to you... I know that you are not the author of the graph, but the creator of the graph specified that he did not have more time to grant this. I thought that this new visualization might interest you and that you could possibly bring it to you. Especially because I can't use it. It may be more standard yielding but also adding a possibility to enlarge or reduce the box plot. Can you help me on this? This visual is very important to me... I thanked in advance,

No problem in requesting my help, Jim, happy to help! I just didn't know if it was intentional or not, until you mentioned that Daniel no longer has bandwidth to assist with this. Now it all makes sense. I definitely would like to try to dive into this. Unfortunately, I have some other things on my list currently that take priority. Maybe consider creating a new post (and tagging me so I can support eventually). This way more people are likely to see it and may be able to help before I can get to it since that post will not already be marked as "solved."



Madison Giammaria
Proud to be a Super User 😄
LinkedIn

Do you frequently use Deneb to provide insights to your stakeholders? Have you considered sponsoring this free and open source custom visual? More info here!

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors