Hi All,
I am trying to find an efficient design pattern for a merge join in Power Query where the join is not a straightforward equality join.
I have a table containing sales data, and for each sale record I want to merge in the product cost from another file. The sales data comes from a CSV file and the product data comes from an Excel spreadsheet. The product costs can change over time, so the merge needs to match not only on product ID but also compare the sale date against the start and end dates in the product table.
An example of the two tables and the required output are below:
I saw a great video from Pragmatic Works https://www.youtube.com/watch?v=ckB93I3GpcQ&t=1s&ab_channel=PragmaticWorks which shows one solution: join on the product ID alone, then create a custom column on the joined table that checks the date range. This works, but the performance is terrible. These files are not particularly big, yet Power Query is loading gigabytes of data. I tried using Table.Buffer, which decreased the amount of loading, but the processing still takes forever.
There is another good post by RADACAD https://radacad.com/dates-between-merge-join-in-power-query in which you generate a list of dates for each range and then do a normal equality join. This works, but again it is not a solution that will scale to a large number of lookup records or large date ranges.
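For anyone unfamiliar with the RADACAD expand-the-ranges trick, here is a minimal sketch of the idea in Python rather than M, with made-up data (the SKUs, dates, and costs below are purely illustrative): every cost range is expanded into one key per day, so the range lookup becomes a plain equality lookup. The sketch also makes the scaling problem obvious, since memory grows with the total number of days covered.

```python
from datetime import date, timedelta

# Made-up sample data, not the poster's actual files.
product_costs = [
    # (sku, start, end, cost)
    (1, date(2020, 1, 1), date(2020, 1, 31), 10.0),
    (1, date(2020, 2, 1), date(2020, 2, 29), 12.0),
]
sales = [(1, date(2020, 1, 15)), (1, date(2020, 2, 10))]

# Expand every cost range into one key per day, then do a plain
# equality lookup -- the same idea as the dates-between merge.
lookup = {}
for sku, start, end, cost in product_costs:
    d = start
    while d <= end:
        lookup[(sku, d)] = cost
        d += timedelta(days=1)

result = [(sku, d, lookup.get((sku, d))) for sku, d in sales]
print(result)  # costs 10.0 and 12.0
```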
This seems like a pattern that must come up again and again in the wild so am hoping someone can provide some advice on how they implemented a decent solution.
Any help is much appreciated.
Thanks,
Alex
One way to speed up execution is to leverage the fact that the dates are sorted in the lookup list and in the sale date lists of the various products.
In this way, the lists are "traversed" only once.
This in M could be done with the List.Generate function.
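To make the single-pass idea concrete, here is a sketch in Python rather than M (an actual Power Query implementation would use List.Generate, as mentioned above; the data is made up). Because both the cost ranges and the sales are sorted by SKU and date, one pointer walks the cost list exactly once, giving O(n + m) work instead of scanning the whole lookup table for every sale.

```python
from datetime import date

# Cost ranges sorted by (sku, start date); sales sorted by (sku, date).
# Made-up illustrative data.
costs = [
    (1, date(2020, 1, 1), date(2020, 1, 31), 10.0),
    (1, date(2020, 2, 1), date(2020, 2, 29), 12.0),
]
sales = [(1, date(2020, 1, 15)), (1, date(2020, 2, 10))]

out, i = [], 0
for sku, d in sales:
    # Advance the cost pointer past ranges that can no longer match:
    # earlier SKUs, or same SKU with an end date before the sale date.
    while i < len(costs) and (costs[i][0] < sku or
                              (costs[i][0] == sku and costs[i][2] < d)):
        i += 1
    if i < len(costs) and costs[i][0] == sku and costs[i][1] <= d <= costs[i][2]:
        out.append((sku, d, costs[i][3]))
    else:
        out.append((sku, d, None))  # no cost range covers this sale
```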
What size are your tables? How many products, how many sales, and how many changes in cost?
I have made two queries: one based on a list approach and one based on a Table.SelectRows approach.
This is a mini-mockup which you can fit in your code.
These should improve performance. Let me know whether the improvement is acceptable or not.
The Excel mockup is uploaded to https://1drv.ms/x/s!Akd5y6ruJhvhuT0ZjSMjlq22TTQZ?e=AkZ7gr
List approach
let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Product SKU", Int64.Type}, {"SalesDate", type date}}),
    // Buffer the lookup columns once so they are not re-evaluated per row.
    StartDateList = List.Buffer(ProductCostLookup[Start Date]),
    EndDateList = List.Buffer(ProductCostLookup[End Date]),
    CostList = List.Buffer(ProductCostLookup[Cost]),
    SKUList = List.Buffer(ProductCostLookup[Product SKU]),
    // For each sale, find the position where the SKU matches and the
    // sale date falls within [Start Date, End Date], then return the
    // cost at that position.
    #"Added Custom" = Table.AddColumn(
        #"Changed Type",
        "Cost",
        each CostList{
            List.PositionOf(
                List.Transform(
                    List.Positions(StartDateList),
                    (i) => [SalesDate] >= StartDateList{i}
                        and [SalesDate] <= EndDateList{i}
                        and [Product SKU] = SKUList{i}),
                true)})
in
    #"Added Custom"
Table.SelectRows approach
let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Product SKU", Int64.Type}, {"SalesDate", type date}}),
    // Buffer the whole lookup table once so it is not re-read per row.
    BufferedTable = Table.Buffer(ProductCostLookup),
    // For each sale, keep the lookup rows whose SKU matches and whose
    // date range contains the sale date, then take the first row's cost.
    #"Added Custom" = Table.AddColumn(
        #"Changed Type",
        "Cost",
        each Table.SelectRows(
            BufferedTable,
            (x) => x[Start Date] <= [SalesDate]
                and x[End Date] >= [SalesDate]
                and x[Product SKU] = [Product SKU]){0}[Cost])
in
    #"Added Custom"
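Both M snippets above still scan the lookup rows linearly for every sale. If the data ever moves outside Power Query, a per-SKU binary search makes each lookup O(log n). Here is a sketch of that idea in Python with made-up data (`product_costs`, `by_sku`, and `lookup_cost` are all hypothetical names for illustration), assuming the cost ranges for a given SKU do not overlap.

```python
import bisect
from collections import defaultdict
from datetime import date

# Made-up sample data: per-SKU cost ranges, assumed non-overlapping.
product_costs = [
    (1, date(2020, 1, 1), date(2020, 1, 31), 10.0),
    (1, date(2020, 2, 1), date(2020, 2, 29), 12.0),
]

# Index: sku -> (sorted start dates, parallel list of (end, cost)).
by_sku = defaultdict(lambda: ([], []))
for sku, start, end, cost in sorted(product_costs):
    by_sku[sku][0].append(start)
    by_sku[sku][1].append((end, cost))

def lookup_cost(sku, sale_date):
    starts, rows = by_sku[sku]
    # Last range whose start date is <= the sale date.
    i = bisect.bisect_right(starts, sale_date) - 1
    if i >= 0 and sale_date <= rows[i][0]:
        return rows[i][1]
    return None  # no cost range covers this sale

print(lookup_cost(1, date(2020, 2, 10)))  # 12.0
```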
Thanks a lot for the code. I have implemented it, but it is still very slow.
Thanks for this idea, but it is still taking too much time.
I am running almost 10 queries, 7 of which are based on the one in which all these functions run.
Would doing the work in DAX where possible increase the speed?