Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Score big with last-minute savings on the final tickets to FabCon Vienna. Secure your discount

Reply
VivG
Frequent Visitor

Dates between merge join / Expand columns performance issue

Hi all,

 

I have a first very simple table A which contains only three columns:

- a unique ID

- a client ID

- a date

This table has 30,000 rows.

 

I have a second table B with these important columns:

- a client ID

- a start date

- an end date

- a category

This table is bigger with 300,000 rows.

 

What I want to do is to have a new table A which includes the category from table B by merging on the client ID between table A and B and where the date of table A is between the start date and the end date of table B.

 

What I did is the following:

- merge the two tables based on the client ID

#"Merge" = Table.NestedJoin(#"Table A", {"client ID"}, #"Table B", {"client ID"}, "new_column", JoinKind.LeftOuter)

- add a custom column to do the row selection based on the date range

#"Date range selection" = Table.AddColumn(#"Merge", "custom", each let mydate= [date] in Table.SelectRows([new_column],each [start date] <= mydate and [end date] >= mydate))

- then expand my new custom column

#"Result" = Table.ExpandTableColumn(#"Date range selection", "custom", {"start date", "end date", "category"}, {"start date", "end date", "category"})

 

The issue I have is that this query is really too long. I have the feeling that for each row of table A, Power Query is re-loading table B (which is quite large). Even after 2 hours the query is still loading data.

 

I'm quite new to Power Query / Power BI so maybe my query is not good and there is a better way to do this. Do you have any idea of what can be done to improve this?

 

Thanks for your help!

Vivien

 

 

 

 

 

2 ACCEPTED SOLUTIONS
edhans
Super User
Super User

This is your problem:

#"Date range selection" = Table.AddColumn(#"Merge", "custom", each let mydate= [date] in Table.SelectRows([new_column],each [start date] <= mydate and [end date] >= mydate))

 It is going to get deep into the weeds, but @ImkeF article here I think will help you.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

VivG
Frequent Visitor

Yes I read the article you linked and I tried to apply this to my situation but I was quite confused on which grouping dimension I should use on my different tables, I really have to improve my understanding of the M language...

Somebody suggested me to just invert step 2 and step 3 (so to do the expand column before doing the row selection) and it immediately solved my problem as table B is now loading only one time.

Thank you for your help @edhans and for the link!

View solution in original post

5 REPLIES 5
edhans
Super User
Super User

This is your problem:

#"Date range selection" = Table.AddColumn(#"Merge", "custom", each let mydate= [date] in Table.SelectRows([new_column],each [start date] <= mydate and [end date] >= mydate))

 It is going to get deep into the weeds, but @ImkeF article here I think will help you.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
VivG
Frequent Visitor

Hello @edhans ,

 

And thank you for your answer!

 

Indeed, I don't think that this second step of my query is the issue. 

 

If I just do step 1:

- merge the two tables based on the client ID

#"Merge" = Table.NestedJoin(#"Table A", {"client ID"}, #"Table B", {"client ID"}, "new_column", JoinKind.LeftOuter)

 

and step 2:

- add a custom column to do the row selection based on the date range

#"Date range selection" = Table.AddColumn(#"Merge", "custom", each let mydate= [date] in Table.SelectRows([new_column],each [start date] <= mydate and [end date] >= mydate))

 

my query gives a result in less than 2 minutes.

 

This is really when I add step 3:

- then expand my new custom column

#"Result" = Table.ExpandTableColumn(#"Date range selection", "custom", {"start date", "end date", "category"}, {"start date", "end date", "category"})

 

that the query never ends, even after two hours of waiting. 

 

I did a new test this morning and reduced table A to 5 rows instead of 30,000 rows and it gave me a result in 5 minutes. But that's what I feared, Power Query is reloading table B for each line of table A on step 3. Table B is 38 Mo and I saw Power Query loading 38 x 5 = 190 Mo of data. So if I want to do this with my complete table A of 30,000 rows, I would need 30,000 x 38 = 1,140 Go loaded and 500 hours! Is there any way to say to Power Query to keep table B in the RAM when it is loaded instead of re-loading it for each line of table A?

 

Thanks!

Vivien 

 

 

It is the expansion of that table you are crossjoining. It is the step I referenced that is the source of the issue. Did you read the article I linked to?



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
VivG
Frequent Visitor

Yes I read the article you linked and I tried to apply this to my situation but I was quite confused on which grouping dimension I should use on my different tables, I really have to improve my understanding of the M language...

Somebody suggested me to just invert step 2 and step 3 (so to do the expand column before doing the row selection) and it immediately solved my problem as table B is now loading only one time.

Thank you for your help @edhans and for the link!

Yes @VivG . The details Imke gets into in that article are pretty deep, but can be very helpful in cases like yours.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Helpful resources

Announcements
August Power BI Update Carousel

Power BI Monthly Update - August 2025

Check out the August 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors