Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Power BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.

Reply
Anonymous
Not applicable

Merge Query against a range

Hi All,

 

I am trying to find an efficient design pattern to do a merge join in Power Query where the join is not a striaght forward equality join.

 

I have a table containing sales data and for each sale record want to merge in the product cost from another file. The Sales data comes from a csv file and the product data comes from an Excel spreadsheet. The product costs can change over time so the join needs to join not only on product id, but also compare the sales date against the start and end dates in the product table. 

 

An example of the two tables and the required output are below:

 

MergeQuery.PNG

 

I saw a great video frrm Pragmatic works https://www.youtube.com/watch?v=ckB93I3GpcQ&t=1s&ab_channel=PragmaticWorks which shows one solution of doing this where by he joins over on the product id and then creates a custom column on this table checking the range. This works fine but the performance is terrible. These files are not particularly big but power query is loading gigs of data. I tried using Table.Buffer which decreased the size of the loading but it takes forever to do the processing. 

 

MergeExample.png

 

Another good post by RADACAD https://radacad.com/dates-between-merge-join-in-power-query whereby you generate a list of dates for each range and then do a normal equality join. This works but again is not a solution that is going to work with any number of lookup records / large date ranges. 

 

This seems like a pattern that must come up again and again in the wild so am hoping someone can provide some advice on how they implemented a decent solution. 

 

Any help is much appreicated.

 

Thanks,

 

Alex

 

 

6 REPLIES 6
Anonymous
Not applicable

here a simulation with dataset sales and lookup of 400k rows lasts few minuts.

Anonymous
Not applicable

One way to speed up execution is to leverage the fact that the dates are sorted in the lookup list and in the sale date lists of the various products.
In this way, the lists are "traversed" only once.
This in M could be done with the List.Generate function.
What size are your tables.
How many products, how many sales and how many changes in cost?

Anonymous
Not applicable

have a look at this ...

 

for the "dirty work" I used list.accumulate which is perhaps more efficient than list.generate.
One can still tighten something, if you find it a proposal "close" to your needs.

Vijay_A_Verma
Super User
Super User

I have made 2 queries - One based on list approach and one based on Table.SelectRows approach. 

This is a mini-mockup which you can fit in your code.

These will definitely improve performance. Let me know whether the improvement is acceptable or not/

The Excel mockup is uploaded to https://1drv.ms/x/s!Akd5y6ruJhvhuT0ZjSMjlq22TTQZ?e=AkZ7gr 

 

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Product SKU", Int64.Type}, {"SalesDate", type date}}),
    StartDateList = List.Buffer(ProductCostLookup[Start Date]),
    EndDateList = List.Buffer(ProductCostLookup[End Date]),
    CostList = List.Buffer(ProductCostLookup[Cost]),
    SKUList = List.Buffer(ProductCostLookup[Product SKU]),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Cost", each CostList{List.PositionOf(List.Transform(List.Positions(StartDateList),(i)=>[SalesDate]>=StartDateList{i} and [SalesDate]<=EndDateList{i} and [Product SKU]=SKUList{i}),true)})
in
    #"Added Custom"

 

Table.SelectRows approach

 

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Product SKU", Int64.Type}, {"SalesDate", type date}}),
    BufferedTable = Table.Buffer(ProductCostLookup),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Cost", each Table.SelectRows(BufferedTable, (x)=> x[Start Date]<=[SalesDate] and x[End Date]>=[SalesDate] and x[Product SKU]=[Product SKU]){0}[Cost])
in
    #"Added Custom"

 

 

Anonymous
Not applicable

Thanks a lot for the code. I have implemented it but it is still very slow. 

abatahir
Helper I
Helper I

Thanks for this idea, but still it's taking too much time. 

I am running almost 10 queries, 7 are based on one in which all these functions ran. 
what if I do DAX work where possible can increase the speed?

Helpful resources

Announcements
June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors