Solved: Re: Merging tables with large numbers of rows

naninamu · ‎01-08-2025

Hi everyone. In theory I know how to do this, but the volume of data is giving me issues so need some help.

Essentially, and this is simplified for the sake of this example, I have 2 tables.

One is a Fact type table giving a list of Groups, let's call them Group A, Group B etc. In reality this table is some 20m rows.
The other is a dimension type table that lists out the Items in each Group, let's call them Item a, Item b etc. Each Group has a mix of different Items, and in reality each Group has around 10 or so Items.
Each item has a Value assigned to it.

Based on the Groups listed out in Table A, I need to be able to provide a total score for each type of Item.

Initially working with a subset of Test data, I merged the 2 tables and then summed up the scores. This worked for my Test data, but for the actual real data I am running into issues - due to the fact that I have 20m Groups each expanding out into 10 Items = 200m rows - PBI is taking way to long to do this and times out (keeping in mind in real life the data is a bit more complex and the tables wider)

Does anyone have any suggestions of now I might handle this more efficiently? Can't help thinking there's something obvious I'm overlooking here.

Thanks!!

lbendlin · ‎01-09-2025

There's a typo in your expected result - g is appearing 10 times not 9 times. So the total is 911, not 902.

You may want to try TREATAS as an alternative, it will likely perform better with larger data sets

View solution in original post

naninamu · ‎01-10-2025

I probably didn't explain well enough I think 😛

So in your SUMMARIZECOLUMNS you used GroupIDs[Group] - this is the distinct list of Groups - ie A, B , C, D each listed once.

I don't have the equivilent of GroupIDs[Group] - I have to use my equvilent of Groups[Group] which has each Group appearing multiple times.

In terms of efficiency, I should do it your way - ie have one table with each unique value. If I put Groups[Group] into your formula, obviously it doesn't work as the counts and sums are all wrong.

So what's the best way to get this unique list from an efficiency point of view?

If I create it in PQ by duplicating the table and removing duplicates, it's not efficient as I have 211m rows.

Would writing it in DAX be better as a calculated table? Would this impact me using it in the formula?

Essentially - how do I create a table with just distinct values of Group so I can use it in the measure? - in the most efficient way 🙂

lbendlin · ‎01-11-2025

I see. A DAX calculated table (using VALUES for example) is more efficient but you may run into circular reference errors. Try it out.

naninamu · ‎01-10-2025

OK I now have a really good understanding of how this all works. Many thanks I wouldn't have got there without you.

I do have one issue - I would like to know if I can create a table of distinct values within the SUMMARIZECOLUMNS formula. Due to how my real data is, I don't really have a GROUP table equivilant with a distinct listing of groups. Due to the fact creating this in PQ would have a lot of overhead due to the volume of data, I was hoping I could do something like:

SUMMARIZECOLUMNS(DISTINCT('Group Item Values'[Group]),'Group Item Values'[Item],

Doesn't work though. If I just do the DISTINCT bit, I can see it indeed creates a table with distinct Group values, but I can't work out how to use this within the SUMMARIZECOLUMNS formula. I tried assinging it to a variable but that didn't work either. Thanks!!

lbendlin · ‎01-10-2025

SUMMARIZE and SUMMARIZECOLUMNS already do the grouping for you. There is no need for DISTINCT.

You can assign tables to variables but your measure eventually needs to return a scalar value.

Please provide sample data that fully covers your issue.
Please show the expected outcome based on the sample data you provided.

naninamu · ‎01-10-2025

I appreciate you taking the time to write this up. I had actually been looking into Summarizecolumns and Addcolumns to further my understanding, but this aides me further by clearly spelling it out. Many thanks!

naninamu · ‎01-09-2025

I've been trying to understand how exactly "Item Count" and "Value Sum" work. Could you maybe in english just explain the logic behind them? I'm new to DAX so can't quite work it out at present! Thanks in advance.

lbendlin · ‎01-10-2025

First you need to consider the data model. I chose this

but that is my personal choice. Depending on your business scenario you might need a different data model, for example without linking the tables.

To better understand the DAX you can use DAXFormatter.com

Item count =
VAR a =
    ADDCOLUMNS (
        SUMMARIZECOLUMNS ( 'Group IDs'[Group], 'Group Item Values'[Item] ),
        "v", CALCULATE ( SUM ( 'Group Item Values'[Value] ) ),
        "ct", CALCULATE ( COUNTROWS ( Groups ) )
    )
RETURN
    SUMX ( FILTER ( a, NOT ISBLANK ( [v] ) ), [ct] )

First we materialize all combinations of Group and Item.

Then we compute (separately) the item values and the group count

The item values are required to get the right granularity. But due to the cross join in SUMMARIZECOLUMNS we need to then exclude the blanks. For the rest we add up the counts.

Value Sum works exactly the same, but we don't need to worry about the filter as here blanks won't contribute to the sum. You could add it if you wanted.

Value Sum =
VAR a =
    ADDCOLUMNS (
        SUMMARIZECOLUMNS ( 'Group IDs'[Group], 'Group Item Values'[Item] ),
        "v", CALCULATE ( SUM ( 'Group Item Values'[Value] ) ),
        "ct", CALCULATE ( COUNTROWS ( Groups ) )
    )
RETURN
    SUMX ( a, [v] * [ct] )

naninamu · ‎01-09-2025

Thanks @lbendlin - I really appreciate your help! I will now try to apply this to my real world data and hopefully it works 🙂 Looks easy when you see it!!

naninamu · ‎01-08-2025

Hi - OK so if this is my sample data

Table A - lists the Groups

Group

B

A

B

A

B

C

B

A

D

A

B

D

Table B - lists the Items that make up each Group and their value

Group	Item	Value
A	a	10
A	b	15
A	d	22
A	e	18
A	g	9
B	a	10
B	b	15
B	d	22
B	g	9
B	h	3
C	b	15
C	c	25
C	d	22
C	f	6
C	g	9
D	a	10
D	c	25
D	d	22
D	e	18
D	f	6

If I were to expand out the Groups into their component Items and summarise the result I'd get this:

Item	Count	Value Sum
a	12	120
b	10	150
c	4	100
d	13	286
e	7	126
f	4	24
g	9	81
h	5	15
Grand Total	64	902

I want to then put it in a scatter to look a bit like this:

'

I can't work out how to get it to sum up all the items for each Group on the X axis - it just sums up what's in Table B when I do it, doesn't take table A into account to account for multiple Groups.

ie - in Table B, which is like the Dimension table, Item a appears 3 times. PBI is giving me 3 lots of 10 equals 30 as the sum, rather than 12 lot of 10 equals 120 - Item a appears 12 times if all the Groups from Table A were to be expanded out.

Thanks in advance for any help,

lbendlin · ‎01-09-2025

There's a typo in your expected result - g is appearing 10 times not 9 times. So the total is 911, not 902.

You may want to try TREATAS as an alternative, it will likely perform better with larger data sets

naninamu · ‎01-09-2025

Just on your point actually - do you mean use TREATAS to merge the 2 data sources? I did try this, but try as I might I couldn't get the syntax to work so I abandoned that path! Could you help with what that code might look like? Cheers.

naninamu · ‎01-08-2025

Thanks - the issue is I need to drag the Sum of each Item onto the X axis of a scatter chart. ( ie the User selects the Item, and there is a scatter chart showing on X the sum of the scores and on Y the number of times that Item appears. Not quite sure how to do this... Item and Score are both on Table B so if I drag them onto the scatter, it's not going to know I'm trying to expand out the Groups in Table A. Is there some kind of measure I can write to drag onto the axes?

lbendlin · ‎01-08-2025

Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).
Do not include sensitive information. Do not include anything that is unrelated to the issue or question.
Please show the expected outcome based on the sample data you provided.

Need help uploading data? https://community.fabric.microsoft.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-...
Want faster answers? https://community.fabric.microsoft.com/t5/Desktop/How-to-Get-Your-Question-Answered-Quickly/m-p/1447...

lbendlin · ‎01-08-2025

Don't merge them. Let the data model do the work for you. It will be substantially faster.

Merging tables with large numbers of rows

Helpful resources

Power BI Monthly Update - November 2025

Fabric Data Days

FabCon Atlanta 2026

FabCon is coming to Atlanta

Merging tables with large numbers of rows

Helpful resources

Power BI Monthly Update - November 2025

Fabric Data Days

FabCon Atlanta 2026