Check your eligibility for this 50% exam voucher offer and join us for free live learning sessions to get prepared for Exam DP-700.
Get StartedDon't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.
Hi everyone. In theory I know how to do this, but the volume of data is giving me issues so need some help.
Essentially, and this is simplified for the sake of this example, I have 2 tables.
Based on the Groups listed out in Table A, I need to be able to provide a total score for each type of Item.
Initially working with a subset of Test data, I merged the 2 tables and then summed up the scores. This worked for my Test data, but for the actual real data I am running into issues - due to the fact that I have 20m Groups each expanding out into 10 Items = 200m rows - PBI is taking way to long to do this and times out (keeping in mind in real life the data is a bit more complex and the tables wider)
Does anyone have any suggestions of now I might handle this more efficiently? Can't help thinking there's something obvious I'm overlooking here.
Thanks!!
Solved! Go to Solution.
There's a typo in your expected result - g is appearing 10 times not 9 times. So the total is 911, not 902.
You may want to try TREATAS as an alternative, it will likely perform better with larger data sets
I probably didn't explain well enough I think 😛
So in your SUMMARIZECOLUMNS you used GroupIDs[Group] - this is the distinct list of Groups - ie A, B , C, D each listed once.
I don't have the equivilent of GroupIDs[Group] - I have to use my equvilent of Groups[Group] which has each Group appearing multiple times.
In terms of efficiency, I should do it your way - ie have one table with each unique value. If I put Groups[Group] into your formula, obviously it doesn't work as the counts and sums are all wrong.
So what's the best way to get this unique list from an efficiency point of view?
If I create it in PQ by duplicating the table and removing duplicates, it's not efficient as I have 211m rows.
Would writing it in DAX be better as a calculated table? Would this impact me using it in the formula?
Essentially - how do I create a table with just distinct values of Group so I can use it in the measure? - in the most efficient way 🙂
I see. A DAX calculated table (using VALUES for example) is more efficient but you may run into circular reference errors. Try it out.
OK I now have a really good understanding of how this all works. Many thanks I wouldn't have got there without you.
I do have one issue - I would like to know if I can create a table of distinct values within the SUMMARIZECOLUMNS formula. Due to how my real data is, I don't really have a GROUP table equivilant with a distinct listing of groups. Due to the fact creating this in PQ would have a lot of overhead due to the volume of data, I was hoping I could do something like:
SUMMARIZECOLUMNS(DISTINCT('Group Item Values'[Group]),'Group Item Values'[Item],
Doesn't work though. If I just do the DISTINCT bit, I can see it indeed creates a table with distinct Group values, but I can't work out how to use this within the SUMMARIZECOLUMNS formula. I tried assinging it to a variable but that didn't work either. Thanks!!
SUMMARIZE and SUMMARIZECOLUMNS already do the grouping for you. There is no need for DISTINCT.
You can assign tables to variables but your measure eventually needs to return a scalar value.
Please provide sample data that fully covers your issue.
Please show the expected outcome based on the sample data you provided.
I appreciate you taking the time to write this up. I had actually been looking into Summarizecolumns and Addcolumns to further my understanding, but this aides me further by clearly spelling it out. Many thanks!
I've been trying to understand how exactly "Item Count" and "Value Sum" work. Could you maybe in english just explain the logic behind them? I'm new to DAX so can't quite work it out at present! Thanks in advance.
First you need to consider the data model. I chose this
but that is my personal choice. Depending on your business scenario you might need a different data model, for example without linking the tables.
To better understand the DAX you can use DAXFormatter.com
Item count =
VAR a =
ADDCOLUMNS (
SUMMARIZECOLUMNS ( 'Group IDs'[Group], 'Group Item Values'[Item] ),
"v", CALCULATE ( SUM ( 'Group Item Values'[Value] ) ),
"ct", CALCULATE ( COUNTROWS ( Groups ) )
)
RETURN
SUMX ( FILTER ( a, NOT ISBLANK ( [v] ) ), [ct] )
First we materialize all combinations of Group and Item.
Then we compute (separately) the item values and the group count
The item values are required to get the right granularity. But due to the cross join in SUMMARIZECOLUMNS we need to then exclude the blanks. For the rest we add up the counts.
Value Sum works exactly the same, but we don't need to worry about the filter as here blanks won't contribute to the sum. You could add it if you wanted.
Value Sum =
VAR a =
ADDCOLUMNS (
SUMMARIZECOLUMNS ( 'Group IDs'[Group], 'Group Item Values'[Item] ),
"v", CALCULATE ( SUM ( 'Group Item Values'[Value] ) ),
"ct", CALCULATE ( COUNTROWS ( Groups ) )
)
RETURN
SUMX ( a, [v] * [ct] )
Thanks @lbendlin - I really appreciate your help! I will now try to apply this to my real world data and hopefully it works 🙂 Looks easy when you see it!!
Hi - OK so if this is my sample data
Table A - lists the Groups
Group |
B |
A |
B |
A |
B |
C |
B |
A |
D |
A |
B |
D |
D |
Table B - lists the Items that make up each Group and their value
Group | Item | Value |
A | a | 10 |
A | b | 15 |
A | d | 22 |
A | e | 18 |
A | g | 9 |
B | a | 10 |
B | b | 15 |
B | d | 22 |
B | g | 9 |
B | h | 3 |
C | b | 15 |
C | c | 25 |
C | d | 22 |
C | f | 6 |
C | g | 9 |
D | a | 10 |
D | c | 25 |
D | d | 22 |
D | e | 18 |
D | f | 6 |
If I were to expand out the Groups into their component Items and summarise the result I'd get this:
Item | Count | Value Sum |
a | 12 | 120 |
b | 10 | 150 |
c | 4 | 100 |
d | 13 | 286 |
e | 7 | 126 |
f | 4 | 24 |
g | 9 | 81 |
h | 5 | 15 |
Grand Total | 64 | 902 |
I want to then put it in a scatter to look a bit like this:
'
I can't work out how to get it to sum up all the items for each Group on the X axis - it just sums up what's in Table B when I do it, doesn't take table A into account to account for multiple Groups.
ie - in Table B, which is like the Dimension table, Item a appears 3 times. PBI is giving me 3 lots of 10 equals 30 as the sum, rather than 12 lot of 10 equals 120 - Item a appears 12 times if all the Groups from Table A were to be expanded out.
Thanks in advance for any help,
Just on your point actually - do you mean use TREATAS to merge the 2 data sources? I did try this, but try as I might I couldn't get the syntax to work so I abandoned that path! Could you help with what that code might look like? Cheers.
Thanks - the issue is I need to drag the Sum of each Item onto the X axis of a scatter chart. ( ie the User selects the Item, and there is a scatter chart showing on X the sum of the scores and on Y the number of times that Item appears. Not quite sure how to do this... Item and Score are both on Table B so if I drag them onto the scatter, it's not going to know I'm trying to expand out the Groups in Table A. Is there some kind of measure I can write to drag onto the axes?
Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).
Do not include sensitive information. Do not include anything that is unrelated to the issue or question.
Please show the expected outcome based on the sample data you provided.
Need help uploading data? https://community.fabric.microsoft.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-...
Want faster answers? https://community.fabric.microsoft.com/t5/Desktop/How-to-Get-Your-Question-Answered-Quickly/m-p/1447...
Don't merge them. Let the data model do the work for you. It will be substantially faster.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Prices go up Feb. 11th.
Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.
User | Count |
---|---|
14 | |
13 | |
12 | |
12 | |
12 |