Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
ms_steve
New Member

Calculating sum and average of a column based on distinct values in other columns

I have a table which has a unique row for every combination of Day, Person and Instance (see below). Each person has a score, which is fixed per person on a given day, but can vary from one day to the other. There are other category columns too, such as Age, Gender, linked to Dimension tables in a star schema. The nature of Instance here is arbitrary, apart from that it's what makes the rows unique (when combined with Day and Person).

 

Day

PersonScoreInstance IdOther columns ...
MondayFred21...
MondayFred22...
MondayBarney31...
MondayBarney33...
MondayBarney32...
MondayWilma22...
TuesdayFred31...
TuesdayBarney13...
TuesdayBarney12...

 

I want to be able to calculate the total and average Score per person, allowing for all the other columns including Day. I only want to disregard the Instance column and the duplication it introduces.

 

I initially tried creating a "deduped" table, which would appear like this in the example:

 

DayPersonScoreOther columns (not Instance)
MondayFred2...
MondayBarney3...
MondayWilma2...
TuesdayFred3...
TuesdayBarney1...

 

This would give me a total of 7 and an average of 2.33 for Monday, and a total of 4 and an average of 2 for Tuesday. It would also allow slicing this on the other category columns. This interim table was also linked to all the dimension tables.

 

This seemed to do what I wanted until I want to be able to filter on the Instance, e.g. do the same thing but only for Persons with an Instance Id of 1 (or indirectly based on some other property of an Instance), as my deduped/summarized table is not linked to the dimension table for Instance.

 

After a bit more research, I think that using a summarized table may not be the best/right approach and that something should/can be done in DAX, but I'm not really sure where to go next.

1 ACCEPTED SOLUTION

Ok, then try something like below for the average, and replace with SUMX for the sum:

Avg Score = AVERAGEX(SUMMARIZE('Table','Table'[Day],'Table'[Person])
    ,CALCULATE(MIN('Table'[Score])))

It should work fine with any filter or slicer.

 

View solution in original post

7 REPLIES 7
sjoerdvn
Super User
Super User

So If i understand correcty, if multiple instances are selected you want to deduplicate on instance id. How does that work? Take only the score of the last/highest instance id?

It shouldn't matter, as the Score is the same for each Person on a given Day, i.e the same Score for each Person over multiple Instances in the same Day.

So, where I have 2 and 3 selected, there are three unique Persons with those Instances on Monday, with Scores 3, 2 and 2. On Tuesday, there is only one Person with Instance 2 and/or 3 (Barney) who has a Score of 1.

But this also needs to work across all other categories (not shown). Basically, what I want to happen is to filter the table down in the normal way based on the various slicers, and then to be able to get the average Score per Day, but only counting each Person at most once per Day (depending on the slicers). So, after filtering on any/all columns (including Instance), I want to dedupe the table ignoring Instance (as per opening post) and then get the total/average Score.

Ok, then try something like below for the average, and replace with SUMX for the sum:

Avg Score = AVERAGEX(SUMMARIZE('Table','Table'[Day],'Table'[Person])
    ,CALCULATE(MIN('Table'[Score])))

It should work fine with any filter or slicer.

 

This works, thank you. And it seems fine with other filters and columns too. Happy to Accept this as the solution.

Would you mind explaining the need for the `MIN` in the `CALCULATE` function. Is this just because a column on its own doesn't constitute an expression?

You are right, it needs a scalar value but within the context the column has multiple values. I used MIN, but MAX or FIRSTNONBLANK would also work.

ms_steve
New Member

Hi @v-jialongy-msft 

 

Thank you for the reply. This seems to work when I have exactly one Instance selected, but not if I have no Instances selected, or multiple Instances selected. I'm sorry if I didn't explain it properly.

 

With no Instances selected, I would expect Monday to show the average of 3, 2, 2 = 2.33 and Tuesday to show the average of 1, 3 = 2. It shows nothing:

ms_steve_2-1712822054528.png

With Instances 2 and 3 selected, I would expect Monday to show the average of 3, 2, 2 = 2.33 (as all three Persons have either Instance 2 or 3) and Tuesday to show the average of 1 = 1 (only Barney has Instance 2 or 3 on Tuesday). It shows nothing:

ms_steve_3-1712822212778.png

Can this approach be extended to account for no selections and multiple selections? In my real data, there will be hundreds of variants of Instance, and they will be variable, so it's not something that can be hardcoded in a DAX query.

 

Also, will this approach account for filtering on the other columns, e.g. Age, Gender? Again, there are a lot of these, so not something that would be ideally hardcoded in the DAX.

 

 

 

 

v-jialongy-msft
Community Support
Community Support

Hi @ms_steve 

 

You want to calculate the average score of each person under different instances, right?

Please try the following dax

Measure = VAR _instance = SELECTEDVALUE('Table'[Instance])
RETURN
DIVIDE(CALCULATE(SUM('Table'[Score]),FILTER('Table',_instance='Table'[Instance])),DISTINCTCOUNT('Table'[Person]))

 

This is the result:

vjialongymsft_0-1712799788859.png

If I misunderstand what you mean, please provide the results you want in the form of screenshots or tables.

 

 

 

 

 

Best Regards,

Jayleny

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.