Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI DataViz World Championships are on! With four chances to enter, you could win a spot in the LIVE Grand Finale in Las Vegas. Show off your skills.

Reply
Bart_Berg
Frequent Visitor

Group by and count in scatter plot

Hi guys,
 
I really hope someone is able to shed some light on this challenge I am tasked with.
 
I am trying to build a scatter plot to visualize some data.
 
My data looks as follows:
Sessions:
date
user_id
content_id
id
Date
user_id
content_id
id
2021-10-14
5
1
1
2021-10-14
8
1
2
2021-10-14
8
1
3
2021-10-14
8
6
4
 
Now, I need to build a scatter plot that visualizes the different amount of content that is consumed in a month by users. So, in the table above we can see that user 5 consumes 1 unique content item this month and user 8 consumes 2 unique content items.
x-axis - count of id (so that is the session count).
y-axis - the count of unique content_ids per user
size - distinct count of user_id.
 
I have absolutely no idea how I can group the data on the y-axis to visualize this.. In this example I would expect two bubbles in my chart. 1 bubble with an y-axis value of 1, and one bubble with an y-axis value of 2.
I hope anyone can point me in the right direction :)!
 
1 ACCEPTED SOLUTION

You'd need to create a table that contains all possible sessions per user.

 

Something like this:

X-Axis = 
VAR MaxSessions = MAXX ( ALL ( Sessions[user_id] ), CALCULATE ( COUNT ( Sessions[id] ) ) )
RETURN
    SELECTCOLUMNS ( GENERATESERIES ( 0, MaxSessions + 1 ), "Sessions", [Value] )

 

The measures have to be a bit more complicated.

AvgUniqueContent = 
VAR CurrSessions = SELECTEDVALUE ( 'X-Axis'[Sessions] )
VAR Summary =
    SUMMARIZE (
        Sessions,
        Sessions[user_id],
        "@UniqueContent", DISTINCTCOUNT ( Sessions[content_id] ),
        "@Sessions", COUNT ( Sessions[id] )
    )
RETURN
    AVERAGEX ( FILTER ( Summary, [@Sessions] = CurrSessions ), [@UniqueContent] )

 

The user count can be defined similarly (just use COUNTROWS instead of AVERAGEX in the last line).

 

View solution in original post

6 REPLIES 6
mahoneypat
Microsoft Employee
Microsoft Employee

This should be possible without a calculated table.  Please clarify what each bubble represents.  Is each one a User?  a month?  You gave the expect Y axis values.   What are the expected values for the X axis and bubble size?  I thought each bubble was a user until I saw that size was distinct count of users.

 

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


X-axis is the session count for a user. Size is the number of users with this session count. If you have three users and one has 3 sessions and the other two have 5 sessions each, then you get a bubble with size one at x=3 and a bubble with size two at x=5.

 

I'd be super interested if you can find a way to do this dynamically without setting up a table to use for the x-axis.

AlexisOlson
Super User
Super User

This is easier if you create a calculated summary table like this:

AlexisOlson_0-1634242826375.png

 

From there, it's drag-and-drop to put the fields into a scatterplot.

AlexisOlson_2-1634243085845.png

 

If it needs to be more dynamic (where a calculated table doesn't work), you'll still need to create an extra table to use for the x-axis.

Thanks a lot for your detailed answer, it's very much appreciated! This does work indeed, but unfortunately I need to be able to filter on month even and even on a category which is in another dimension linked by content_id. What would my table structure need to look like to make that work?

 

Thanks :)!

You'd need to create a table that contains all possible sessions per user.

 

Something like this:

X-Axis = 
VAR MaxSessions = MAXX ( ALL ( Sessions[user_id] ), CALCULATE ( COUNT ( Sessions[id] ) ) )
RETURN
    SELECTCOLUMNS ( GENERATESERIES ( 0, MaxSessions + 1 ), "Sessions", [Value] )

 

The measures have to be a bit more complicated.

AvgUniqueContent = 
VAR CurrSessions = SELECTEDVALUE ( 'X-Axis'[Sessions] )
VAR Summary =
    SUMMARIZE (
        Sessions,
        Sessions[user_id],
        "@UniqueContent", DISTINCTCOUNT ( Sessions[content_id] ),
        "@Sessions", COUNT ( Sessions[id] )
    )
RETURN
    AVERAGEX ( FILTER ( Summary, [@Sessions] = CurrSessions ), [@UniqueContent] )

 

The user count can be defined similarly (just use COUNTROWS instead of AVERAGEX in the last line).

 

That's awesome, thanks again for the detailed explanation. This works great!!

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

FebPBI_Carousel

Power BI Monthly Update - February 2025

Check out the February 2025 Power BI update to learn about new features.

Feb2025 NL Carousel

Fabric Community Update - February 2025

Find out what's new and trending in the Fabric community.