The Challenge
A fairly common data model design pattern can be described such that:
For example, a restaurant rating system might uses a ranking criteria of satisfactory from 0-100. These questions could be grouped into questions about "Service", "Atmosphere", "Quality", "Value" and "Cleanliness" where each group's score is simply an average of the individual question scores. These five groups are then grouped into "Restaurant" and "Food" where the score for each of these super groups is the minimum score of related sub-groups.
The Data Model
The data model to implement this design is relatively straight-forward.
[SuperGroups] 1-* [Groups2SuperGroups] 1-1 [Groups] 1-* [Attributes2Groups] 1-* [AttributeScores]
We will use the following columns and data to build a sample data model via "Enter Data" queries:
AttributeScores
AttributeScores contains two columns, "Attribute" and "Score".
Attribute | Score |
Attribute 1 | 100 |
Attribute 2 | 75 |
Attribute 3 | 80 |
Attribute 4 | 100 |
Attribute 5 | 60 |
Attribute 6 | 30 |
Attribute 7 | 40 |
Attribute 8 | 35 |
Attribute 9 | 75 |
Attribute 10 | 90 |
Groups
Groups contains a single column "Group"
Group |
Group1 |
Group2 |
Group3 |
Group4 |
SuperGroups
SuperGroups contains a single column "SuperGroup":
SuperGroup |
SuperGroup1 |
SuperGroup2 |
Attributes2Groups
Attributes2Groups contains 2 columns "Group" and "Attribute":
Group | Attribute |
Group1 | Attribute1 |
Group1 | Attribute2 |
Group1 | Attribute3 |
Group2 | Attribute4 |
Group2 | Attribute5 |
Group3 | Attribute6 |
Group3 | Attribute7 |
Group3 | Attribute8 |
Group4 | Attribute9 |
Group4 | Attribute10 |
Groups2SuperGroups
Groups2SuperGroups contains two columns, "Group" and "SuperGroup":
SuperGroup | Group |
SuperGroup1 | Group1 |
SuperGroup1 | Group2 |
SuperGroup2 | Group3 |
SuperGroup2 | Group4 |
A Simple Solution
Create an AverageScore column in Groups:
AverageScore = CALCULATE(AVERAGE(AttributeScores[Score]),RELATEDTABLE(AttributeScores))
Groups data is now:
Groups
Group | AverageScore |
Group1 | 85 |
Group2 | 80 |
Group3 | 35 |
Group4 | 82.5 |
Create a MinScore column in SuperGroups:
MinScore = CALCULATE(MIN(Groups[AverageScore]),RELATEDTABLE(Groups))
SuperGroups data is now:
SuperGroups
SuperGroup | MinScore |
SuperGroup1 | 80 |
SuperGroup2 | 35 |
Adding Complexity
What happens when we change our core AttributeScores table to include an additional field so that we can store multiple "things" in the same table:
Attribute | Score | Thing |
Attribute1 | 100 | Thing1 |
Attribute2 | 75 | Thing1 |
Attribute3 | 80 | Thing1 |
Attribute4 | 100 | Thing1 |
Attribute5 | 60 | Thing1 |
Attribute6 | 30 | Thing1 |
Attribute7 | 40 | Thing1 |
Attribute8 | 35 | Thing1 |
Attribute9 | 75 | Thing1 |
Attribute10 | 90 | Thing1 |
Attribute1 | 50 | Thing2 |
Attribute2 | 25 | Thing2 |
Attribute3 | 30 | Thing2 |
Attribute4 | 50 | Thing2 |
Attribute5 | 10 | Thing2 |
Attribute6 | 50 | Thing2 |
Attribute7 | 50 | Thing2 |
Attribute8 | 50 | Thing2 |
Attribute9 | 90 | Thing2 |
Attribute10 | 90 | Thing2 |
Note that the first 10 entries have remained the same other than the addition of "Thing1" in the "Thing" column. Now we have a representation of the same survey or scoring conducted on multiple "things".
Houston, We Have a Problem
By simply adding this additional field, we might think that we have not changed the data model to such a degree that our simple solution will suffice. No such luck.
If we simply put our Group and AverageScore in a table visual and add a slicer for "Thing", we get the same numbers for each Thing.
Group | AverageScore |
Group1 | 60 |
Group2 | 55 |
Group3 | 42.5 |
Group4 | 86.25 |
When we should get:
Thing1
Group | AverageScore |
Group1 | 85 |
Group2 | 80 |
Group3 | 35 |
Group4 | 82.5 |
Thing2
Group | AverageScore |
Group1 | 35 |
Group2 | 30 |
Group3 | 50 |
Group4 | 90 |
Similarly, if we add a simple table visualization of SuperGroups to show "SuperGroup" and "MinScore", regardless of slicer selection, these remain:
SuperGroup | MinScore |
SuperGroup1 | 55 |
SuperGroup2 | 42.5 |
If we try using row level security instead of slicers, the results are the same. So what is going on? Basically, calculated columns are essentially static in nature, calculated upon data refresh and are not updated contextually. Note that I realize that if you use the original "Score" value from AttributeScores and choose the dynamic "Average" aggregation that it works at the group level, but this only holds for very simplistic aggregations.
A Slightly More Complex Solution
In order to solve this problem, we switch from calculated columns to measures.
Back to the Groups table (or anywhere), we can create a measure defined such that:
AverageScoreMeasure = CALCULATE(AVERAGE(AttributeScores[Score]),RELATEDTABLE(AttributeScores))
This is EXACTLY the same DAX calculation as we had for our calculated column. However, the calculation for MinScore is dramatically different. For the MinScore as a measure, we are trying to find the minimum score of a measure and thus we must return the measure AverageScoreMeasure to our measure calculation within the correct context. Thus, to do this we define MinScoreMeasure such that:
MinScoreMeasure = MINX ( SUMMARIZE ( Groups, Groups[Group] , "AVG",[AverageScoreMeasure] ), [AVG])
What is going on here is that we are creating a summarized table using the SUMMARIZE function. We are returning two columns, the first is a grouping by our "Groups" column called "Groups" and the second is our AverageScoreMeasure calculation in a column called "AVG". We are then taking the MIN of column AVG.
Now, when we create table visualizations using our "Group" and "AverageScoreMeasure" in one table and our "SuperGroup" and "MinScoreMeasure" in anther table, we receive the correct results for "Thing1" and "Thing2" whether using slicers or RLS.
Thing1
Groups
Group | AverageScoreMeasure |
Group1 | 85 |
Group2 | 80 |
Group3 | 35 |
Group4 | 82.5 |
SuperGroups
SuperGroup | MinScoreMeasure |
SuperGroup1 | 80 |
SuperGroup2 | 35 |
Thing2
Groups
Group | AverageScoreMeasure |
Group1 | 35 |
Group2 | 30 |
Group3 | 50 |
Group4 | 90 |
SuperGroups
SuperGroup | MinScoreMeasure |
SuperGroup1 | 30 |
SuperGroup2 | 50 |
Conclusion
The designer pattern described here has many applications to just about any scenario in which scores or rankings need to be aggregated and then aggregated again. The solutions described within this article handle both simple scenarios as well as complex scenarios involving multiple base scoring sets of data. This scenario demonstrates the limitations of custom columns and the corresponding superiority of measures for handling aggregation roll-ups.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.