Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.

Reply
JamesRobson
Resolver II
Resolver II

Advanced(?) Merge Based on Latest Date after another Field Date

I need to join some data (in a method I don't entirely agree with but I don't see any other option based on my dataset) and although I have a solution it takes an age to load due to 'many to many' relationships and large datasets so I want to find a better/more efficient method and it seems to be the merge that causes the issue so that seems a logical place to start with the optimisation....

 

Scenario:

Invoice Table and Job Table, standard practice is to join the 2 via Job number so we can compare the Job Date and Invoice Date. However if there is no Job Number then we want to join by Item/Product then find the last Job Date <= Invoice Date and use that for comparison.

 

Invoice Table Sample

JamesRobson_0-1674669247757.png

Job Table Sample

JamesRobson_1-1674669280210.png

 

Desired Result

JamesRobson_2-1674669495571.png

Achieved using the basic/slow/memory heavy method of:

let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Invoice Number", Int64.Type}, {"InvoiceDate", type date}, {"Item", type text}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Item"}, Job, {"Prod"}, "Job", JoinKind.LeftOuter),
#"Expanded Job" = Table.ExpandTableColumn(#"Merged Queries", "Job", {"JobDate"}, {"JobDate"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded Job", each [JobDate] <= [InvoiceDate]),
#"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"JobDate", Order.Descending}}),
#"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"Invoice Number"})
in
#"Removed Duplicates"

 

So I found a way to merge only the Job Date <= Invoice Date using the following...

= Table.AddColumn(#"Changed Type", "Custom", (I)=> Table.SelectRows(Job,(J)=> J[Prod] = I[Item] and J[JobDate] <= I[InvoiceDate]))

 

and once the above is expanded I can return only a single line per Invoice with the latest Job Date

= Table.SelectRows(#"Changed Type1", each [JobDate] = List.Max(let latest = [Item] in Table.SelectRows(#"Changed Type1", each [JobDate] <= [InvoiceDate] and [Item] = latest)[JobDate]))

 

Full Advanced Code

let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Invoice Number", Int64.Type}, {"InvoiceDate", type date}, {"Item", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", (I)=> Table.SelectRows(Job,(J)=> J[Prod] = I[Item] and J[JobDate] <= I[InvoiceDate])),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"JobDate"}, {"JobDate"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Expanded Custom",{{"JobDate", type date}}),
#"Added Custom1" = Table.SelectRows(#"Changed Type1", each [JobDate] = List.Max(let latest = [Item] in Table.SelectRows(#"Changed Type1", each [JobDate] <= [InvoiceDate] and [Item] = latest)[JobDate]))
in
#"Added Custom1"

 

However what I can't do is get both these steps into one and stop the additional lines being created during the merge (ok yes its reduced slightly by only pulling those line with an Job Date <= Invoice Date but still we are talking millions of additional matches).

 

Any advice on this would be appreciated as I'm all out of answers and can't find the same use case else where.

 

Thanks,

1 ACCEPTED SOLUTION
wdx223_Daniel
Super User
Super User

let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],

JobGroup=Table.Buffer(Table.Group(Job,"Prod",{"n",each List.Sort([JobDate],1)})),

Custom1=Table.AddColumn(Source,"JobDate",each List.Skip(JobGroup{[Prod=[Item]]}?[n]? ??{},(x)=>x>[InvoiceDate]){0}?)

in Custom1

View solution in original post

2 REPLIES 2
wdx223_Daniel
Super User
Super User

let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],

JobGroup=Table.Buffer(Table.Group(Job,"Prod",{"n",each List.Sort([JobDate],1)})),

Custom1=Table.AddColumn(Source,"JobDate",each List.Skip(JobGroup{[Prod=[Item]]}?[n]? ??{},(x)=>x>[InvoiceDate]){0}?)

in Custom1

Absolute hero! Thank you very much now I just need to work out what its doing 😁

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.