Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
Bart_Berg
Frequent Visitor

Group by and count in scatter plot

Hi guys,
 
I really hope someone is able to shed some light on this challenge I am tasked with.
 
I am trying to build a scatter plot to visualize some data.
 
My data looks as follows:
Sessions:
date
user_id
content_id
id
Date
user_id
content_id
id
2021-10-14
5
1
1
2021-10-14
8
1
2
2021-10-14
8
1
3
2021-10-14
8
6
4
 
Now, I need to build a scatter plot that visualizes the different amount of content that is consumed in a month by users. So, in the table above we can see that user 5 consumes 1 unique content item this month and user 8 consumes 2 unique content items.
x-axis - count of id (so that is the session count).
y-axis - the count of unique content_ids per user
size - distinct count of user_id.
 
I have absolutely no idea how I can group the data on the y-axis to visualize this.. In this example I would expect two bubbles in my chart. 1 bubble with an y-axis value of 1, and one bubble with an y-axis value of 2.
I hope anyone can point me in the right direction :)!
 
1 ACCEPTED SOLUTION

You'd need to create a table that contains all possible sessions per user.

 

Something like this:

X-Axis = 
VAR MaxSessions = MAXX ( ALL ( Sessions[user_id] ), CALCULATE ( COUNT ( Sessions[id] ) ) )
RETURN
    SELECTCOLUMNS ( GENERATESERIES ( 0, MaxSessions + 1 ), "Sessions", [Value] )

 

The measures have to be a bit more complicated.

AvgUniqueContent = 
VAR CurrSessions = SELECTEDVALUE ( 'X-Axis'[Sessions] )
VAR Summary =
    SUMMARIZE (
        Sessions,
        Sessions[user_id],
        "@UniqueContent", DISTINCTCOUNT ( Sessions[content_id] ),
        "@Sessions", COUNT ( Sessions[id] )
    )
RETURN
    AVERAGEX ( FILTER ( Summary, [@Sessions] = CurrSessions ), [@UniqueContent] )

 

The user count can be defined similarly (just use COUNTROWS instead of AVERAGEX in the last line).

 

View solution in original post

6 REPLIES 6
mahoneypat
Employee
Employee

This should be possible without a calculated table.  Please clarify what each bubble represents.  Is each one a User?  a month?  You gave the expect Y axis values.   What are the expected values for the X axis and bubble size?  I thought each bubble was a user until I saw that size was distinct count of users.

 

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


X-axis is the session count for a user. Size is the number of users with this session count. If you have three users and one has 3 sessions and the other two have 5 sessions each, then you get a bubble with size one at x=3 and a bubble with size two at x=5.

 

I'd be super interested if you can find a way to do this dynamically without setting up a table to use for the x-axis.

AlexisOlson
Super User
Super User

This is easier if you create a calculated summary table like this:

AlexisOlson_0-1634242826375.png

 

From there, it's drag-and-drop to put the fields into a scatterplot.

AlexisOlson_2-1634243085845.png

 

If it needs to be more dynamic (where a calculated table doesn't work), you'll still need to create an extra table to use for the x-axis.

Thanks a lot for your detailed answer, it's very much appreciated! This does work indeed, but unfortunately I need to be able to filter on month even and even on a category which is in another dimension linked by content_id. What would my table structure need to look like to make that work?

 

Thanks :)!

You'd need to create a table that contains all possible sessions per user.

 

Something like this:

X-Axis = 
VAR MaxSessions = MAXX ( ALL ( Sessions[user_id] ), CALCULATE ( COUNT ( Sessions[id] ) ) )
RETURN
    SELECTCOLUMNS ( GENERATESERIES ( 0, MaxSessions + 1 ), "Sessions", [Value] )

 

The measures have to be a bit more complicated.

AvgUniqueContent = 
VAR CurrSessions = SELECTEDVALUE ( 'X-Axis'[Sessions] )
VAR Summary =
    SUMMARIZE (
        Sessions,
        Sessions[user_id],
        "@UniqueContent", DISTINCTCOUNT ( Sessions[content_id] ),
        "@Sessions", COUNT ( Sessions[id] )
    )
RETURN
    AVERAGEX ( FILTER ( Summary, [@Sessions] = CurrSessions ), [@UniqueContent] )

 

The user count can be defined similarly (just use COUNTROWS instead of AVERAGEX in the last line).

 

That's awesome, thanks again for the detailed explanation. This works great!!

Helpful resources

Announcements
Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

Power BI Carousel June 2024

Power BI Monthly Update - June 2024

Check out the June 2024 Power BI update to learn about new features.

RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.