Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply
naninamu
Helper III
Helper III

Merging tables with large numbers of rows

Hi everyone. In theory I know how to do this, but the volume of data is giving me issues so need some help.

 

Essentially, and this is simplified for the sake of this example, I have 2 tables. 

 

  • One is a Fact type table giving a list of Groups, let's call them Group A, Group B etc. In reality this table is some 20m rows.
  • The other is a dimension type table that lists out the Items in each Group, let's call them Item a, Item b etc. Each Group has a mix of different Items, and in reality each Group has around 10 or so Items.
  • Each item has a Value assigned to it.

naninamu_0-1736372817388.png

 

Based on the Groups listed out in Table A, I need to be able to provide a total score for each type of Item.

 

Initially working with a subset of Test data, I merged the 2 tables and then summed up the scores. This worked for my Test data, but for the actual real data I am running into issues - due to the fact that I have 20m Groups each expanding out into 10 Items = 200m rows - PBI is taking way to long to do this and times out (keeping in mind in real life the data is a bit more complex and the tables wider)

 

Does anyone have any suggestions of now I might handle this more efficiently? Can't help thinking there's something obvious I'm overlooking here.

 

Thanks!!

1 ACCEPTED SOLUTION

There's a typo in your expected result - g is appearing 10 times not 9 times.  So the total is 911, not 902.

 

lbendlin_0-1736451646518.png

You may want to try TREATAS as an alternative, it will likely perform better with larger data sets

 

View solution in original post

14 REPLIES 14
naninamu
Helper III
Helper III

I probably didn't explain well enough I think 😛

 

So in your SUMMARIZECOLUMNS you used GroupIDs[Group] - this is the distinct list of Groups - ie A, B , C, D each listed once. 

 

I don't have the equivilent of GroupIDs[Group] - I have to use my equvilent of Groups[Group] which has each Group appearing multiple times.

 

In terms of efficiency, I should do it your way - ie have one table with each unique value. If I put Groups[Group] into your formula, obviously it doesn't work as the counts and sums are all wrong.

 

So what's the best way to get this unique list from an efficiency point of view?

 

If I create it in PQ by duplicating the table and removing duplicates, it's not efficient as I have 211m rows.

Would writing it in DAX be better as a calculated table? Would this impact me using it in the formula?

 

Essentially  - how do I create a table with just distinct values of Group so I can use it in the measure?  - in the most efficient way 🙂

I see. A DAX calculated table (using VALUES for example) is more efficient but you may run into circular reference errors. Try it out. 

naninamu
Helper III
Helper III

OK I now have a really good understanding of how this all works. Many thanks I wouldn't have got there without you.

I do have one issue - I would like to know if I can create a table of distinct values within the SUMMARIZECOLUMNS formula. Due to how my real data is, I don't really have a GROUP table equivilant with a distinct listing of groups. Due to the fact creating this in PQ would have a lot of overhead due to the volume of data, I was hoping I could do something like:

 

SUMMARIZECOLUMNS(DISTINCT('Group Item Values'[Group]),'Group Item Values'[Item],

 

Doesn't work though. If I just do the DISTINCT bit, I can see it indeed creates a table with distinct Group values, but I can't work out how to use this within the SUMMARIZECOLUMNS formula. I tried assinging it to a variable but that didn't work either.  Thanks!!

SUMMARIZE and SUMMARIZECOLUMNS already do the grouping for you. There is no need for DISTINCT.

 

You can assign tables to variables but your measure eventually needs to return a scalar value.

 

Please provide sample data that fully covers your issue.
Please show the expected outcome based on the sample data you provided.

naninamu
Helper III
Helper III

I appreciate you taking the time to write this up. I had actually been looking into Summarizecolumns and Addcolumns to further my understanding, but this aides me further by clearly spelling it out. Many thanks!

naninamu
Helper III
Helper III

I've been trying to understand how exactly "Item Count" and "Value Sum" work. Could you maybe in english just explain the logic behind them? I'm new to DAX so can't quite work it out at present! Thanks in advance.

First you need to consider the data model. I chose this

lbendlin_0-1736519612306.png

but that is my personal choice. Depending on your business scenario you might need a different data model, for example without linking the tables.

 

To better understand the DAX you can use DAXFormatter.com

 

Item count =
VAR a =
    ADDCOLUMNS (
        SUMMARIZECOLUMNS ( 'Group IDs'[Group], 'Group Item Values'[Item] ),
        "v", CALCULATE ( SUM ( 'Group Item Values'[Value] ) ),
        "ct", CALCULATE ( COUNTROWS ( Groups ) )
    )
RETURN
    SUMX ( FILTER ( a, NOT ISBLANK ( [v] ) ), [ct] )

First we materialize all combinations of Group and Item.

Then we compute (separately) the item values and the group count

The item values are required to get the right granularity. But due to the cross join in SUMMARIZECOLUMNS we need to then exclude the blanks.  For the rest we add up the counts.

 

Value Sum works exactly the same, but we don't need to worry about the filter as here blanks won't contribute to the sum. You could add it if you wanted.

 

Value Sum =
VAR a =
    ADDCOLUMNS (
        SUMMARIZECOLUMNS ( 'Group IDs'[Group], 'Group Item Values'[Item] ),
        "v", CALCULATE ( SUM ( 'Group Item Values'[Value] ) ),
        "ct", CALCULATE ( COUNTROWS ( Groups ) )
    )
RETURN
    SUMX ( a, [v] * [ct] )

 

 

 

naninamu
Helper III
Helper III

Thanks @lbendlin  - I really appreciate your help! I will now try to apply this to my real world data and hopefully it works 🙂 Looks easy when you see it!!

naninamu
Helper III
Helper III

Hi - OK so if this is my sample data

 

Table A - lists the Groups

Group
B
A
B
A
B
C
B
A
D
A
B
D
D

 

Table B - lists the Items that make up each Group and their value

GroupItemValue
Aa10
Ab15
Ad22
Ae18
Ag9
Ba10
Bb15
Bd22
Bg9
Bh3
Cb15
Cc25
Cd22
Cf6
Cg9
Da10
Dc25
Dd22
De18
Df6

 

If I were to expand out the Groups into their component Items and summarise the result I'd get this:

 

ItemCountValue Sum
a12120
b10150
c4100
d13286
e7126
f424
g981
h515
Grand Total64902

 

I want to then put it in a scatter to look a bit like this:

'

naninamu_0-1736396403725.png

 

 

I can't work out how to get it to sum up all the items for each Group on the X axis - it just sums up what's in Table B when I do it, doesn't take table A into account to account for multiple Groups.

ie - in Table B, which is like the Dimension table, Item a appears 3 times. PBI is giving me 3 lots of 10 equals 30 as the sum, rather than 12 lot of 10 equals 120 - Item a appears 12 times if all the Groups from Table A were to be expanded out.

 

Thanks in advance for any help,

There's a typo in your expected result - g is appearing 10 times not 9 times.  So the total is 911, not 902.

 

lbendlin_0-1736451646518.png

You may want to try TREATAS as an alternative, it will likely perform better with larger data sets

 

Just on your point actually - do you mean use TREATAS to merge the 2 data sources? I did try this, but try as I might I couldn't get the syntax to work so I abandoned that path! Could you help with what that code might look like? Cheers.

naninamu
Helper III
Helper III

Thanks - the issue is I need to drag the Sum of each Item onto the X axis of a scatter chart. ( ie the User selects the Item, and there is a scatter chart showing on X the sum of the scores and on Y the  number of times that Item appears. Not quite sure how to do this... Item and Score are both on Table B so if I drag them onto the scatter, it's not going to know I'm trying to expand out the Groups in Table A. Is there some kind of measure I can write to drag onto the axes?

Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).
Do not include sensitive information. Do not include anything that is unrelated to the issue or question.
Please show the expected outcome based on the sample data you provided.

Need help uploading data? https://community.fabric.microsoft.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-...
Want faster answers? https://community.fabric.microsoft.com/t5/Desktop/How-to-Get-Your-Question-Answered-Quickly/m-p/1447...

lbendlin
Super User
Super User

Don't merge them. Let the data model do the work for you. It will be substantially faster.

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Prices go up Feb. 11th.

Jan25PBI_Carousel

Power BI Monthly Update - January 2025

Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.

Jan NL Carousel

Fabric Community Update - January 2025

Find out what's new and trending in the Fabric community.