Re: Multi Step Merge

lasersharks · ‎12-13-2021

Hi all,

I have a tricky one that I'd appreciate some advice on. I'm trying to perform a multi step merge between two tables. I have come up with two solutions which are described below with sample data but the performance of each is terrible when I'm trying to join tables which have 900k rows (I can only get method 1 to work with production data. Method 2 only works with sample data and it just spins forever if I try giving it more than a few hundred rows). I'm joining between a CSV file and a SQL table.

Problem:

Table 1 is a lookup table and table 2 has the data I want to augment. The problem is table 2 may have a valid value for column A, B, C or all three and I want to join to the lookup table in that order of precedence. i.e. lookup table 1 and if you find a match with column A, then use it, else lookup on column B and if you find a match on column B then use it else lookup on column C and if you find a match then use it otherwise null.

Table 1:

A	B	C	Ref
A1	B1	C1	1000
A2	B2	C2	2000
A3	B3	C3	3000

Table 2:

A	B	C
a1	nonsense	nonsense
nonsense	B2	nonsense
nonsense	nonsense	c3
nonsense	nonsense	nonsense
c3	nonsense	nonsense
nonsense	c2	nonsense
nonsense	nonsense	C1
nonsense	b1	c3

Result:

A	B	C	ID
a1	nonsense	nonsense	1000
nonsense	B2	asdd	2000
nonsense	nonsense	c3	3000
nonsense	nonsense	nonsense	null
c3	nonsense	nonsense	null
nonsense	c2	nonsense	null
nonsense	nonsense	C1	1000
nonsense	b1	c3	1000

Note the last row has valid matches for B and C but B is pulled back according to precedence that I require.

Method 1:

Method 1 just does three table merges and then coalesces the 3 found columns to fulfill my precedence requirement. It works in production but it is very slow.

let
    Source = #"Table 2",
    #"Uppercased Text" = Table.TransformColumns(Source,{{"A", Text.Upper, type text}, {"B", Text.Upper, type text}, {"C", Text.Upper, type text}}),
    #"Merged Queries" = Table.NestedJoin(#"Uppercased Text", {"A"}, #"Table 1", {"A"}, "A Matches", JoinKind.LeftOuter),
    #"Expanded A Matches" = Table.ExpandTableColumn(#"Merged Queries", "A Matches", {"Ref"}, {"A Matches.Ref"}),
    #"Merged Queries1" = Table.NestedJoin(#"Expanded A Matches", {"B"}, #"Table 1", {"B"}, "B Matches", JoinKind.LeftOuter),
    #"Expanded B Matches" = Table.ExpandTableColumn(#"Merged Queries1", "B Matches", {"Ref"}, {"B Matches.Ref"}),
    #"Merged Queries2" = Table.NestedJoin(#"Expanded B Matches", {"C"}, #"Table 1", {"C"}, "C Matches", JoinKind.LeftOuter),
    #"Expanded C Matches" = Table.ExpandTableColumn(#"Merged Queries2", "C Matches", {"Ref"}, {"C Matches.Ref"}),
    #"Added Custom" = Table.AddColumn(#"Expanded C Matches", "Ref", each [A Matches.Ref] ?? [B Matches.Ref] ?? [C Matches.Ref]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"A Matches.Ref", "B Matches.Ref", "C Matches.Ref"})
in
    #"Removed Columns"

Method 2:

Method 2 uses a custom function which uses Table.SelectRows to return the value I need. I thought this would be faster but I can't even get it to work on more than a 300 rows. I have tried Table.Buffer() on both table1 and/or table2 and it still won't complete. I have also tried Table.AddKey() which didn't do much either.

let
    Source = #"Table 2",
    lookup = (a as text, b as text, c as text) as list => 
        let 
            
            w = Table.SelectRows(#"Table 1", each Text.Lower([A])=Text.Lower(a)),
            x = Table.SelectRows(#"Table 1", each Text.Lower([B])=Text.Lower(b)),
            y = Table.SelectRows(#"Table 1", each Text.Lower([C])=Text.Lower(c)),
            z = if Table.RowCount(w) > 0 then w[Ref] else if Table.RowCount(x) > 0 then x[Ref] else if Table.RowCount(y) > 0 then y[Ref] else {}

        in  
           z,

    #"Added Custom" = Table.AddColumn(Source, "ID", each lookup([A], [B], [C])),
    #"Expanded ID" = Table.ExpandListColumn(#"Added Custom", "ID")

in
    #"Expanded ID"

Does anyone have any thoughts on anything else I can try or something obvious I may have missed. FYI, my real use case has 6 merges but I think the principle should still apply.

Anonymous · ‎12-15-2021

Why not break the lookup table into three separate one column tables, then do the joins?

--Nate

lasersharks · ‎12-16-2021

I do that in my real world case using Table.SelectColumns() but didn't show it for the example.

lasersharks · ‎12-13-2021

Thanks @smpa01 I'll give it a go tomorrow and see if there is any improvement.

smpa01 · ‎12-13-2021

I don't know if this going to help you but give this a try.

@lasersharks

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

My custom visualization projects

Plotting Live Sound: Viz1

Beautiful News:Viz1, Viz2, Viz3

Visual Capitalist: Working Hrs

Others:Easing Graph, Animated Calendar

lasersharks · ‎12-13-2021

Thanks @AlexisOlson I have tried with normal if then else statements and it doesn't improve things much. Method 1 works and I even have a setup where I filter subsequent merges if a match has already happened but it isn't fast by any measure. I was wondering if there was some other technique out there I hadn't considered.

AlexisOlson · ‎12-13-2021

You could also try unpivoting both tables before merging.

I don't know if this will be any faster but you can try something like this:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSjRU0lFSyMvPK04FIgU4QBZVitWJVoJzdJScjHDKYtEN4iQbo5uhgJUNUgRUi08aSSIZtzvQjHA2RJdNMoQ6KxYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [A = _t, B = _t, C = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"A", type text}, {"B", type text}, {"C", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Column", "Value"),
    #"Uppercased Text" = Table.TransformColumns(#"Unpivoted Columns",{{"Value", Text.Upper, type text}}),
    #"Merged Queries" = Table.NestedJoin(#"Uppercased Text", {"Column", "Value"}, Table1, {"Column", "Value"}, "Table", JoinKind.LeftOuter),
    #"Expanded Table" = Table.ExpandTableColumn(#"Merged Queries", "Table", {"Ref"}, {"Ref"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Table", each ([Ref] <> null)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Index"}, {{"Ref", each List.First([Ref]), type text}}),
    #"Merged Queries1" = Table.NestedJoin(#"Added Index", {"Index"}, #"Grouped Rows", {"Index"}, "Grouped Rows", JoinKind.LeftOuter),
    #"Expanded Grouped Rows" = Table.ExpandTableColumn(#"Merged Queries1", "Grouped Rows", {"Ref"}, {"Ref"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Grouped Rows",{"Index"})
in
    #"Removed Columns"

lasersharks · ‎12-13-2021

Thanks @AlexisOlson . I will give it a try tomorrow and see if the performance improves.

AlexisOlson · ‎12-13-2021

I'd expect your Method 1 to work OK, although I think your coalesce ("??") is probably breaking the query folding. I'd suggest editing it to use if/then rather than ?? to see if that makes any difference (since PQ knows how to turn if/then into a SQL case statement).

smpa01 · ‎12-13-2021

@lasersharks which one is the SQL table ? Table 1/Table2?

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

My custom visualization projects

Plotting Live Sound: Viz1

Beautiful News:Viz1, Viz2, Viz3

Visual Capitalist: Working Hrs

Others:Easing Graph, Animated Calendar

lasersharks · ‎12-13-2021

@smpa01 it is Table 1. What are you thinking?

Multi Step Merge

Helpful resources

Fabric Data Days

Power BI Monthly Update - October 2025

FabCon Atlanta 2026

Fabric Data Days starts November 4th!

Multi Step Merge

Helpful resources

Fabric Data Days

Power BI Monthly Update - October 2025

FabCon Atlanta 2026