Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM. Register now.

Reply
lasersharks
Kudo Collector
Kudo Collector

Multi Step Merge

Hi all,

 

I have a tricky one that I'd appreciate some advice on. I'm trying to perform a multi step merge between two tables. I have come up with two solutions which are described below with sample data but the performance of each is terrible when I'm trying to join tables which have 900k rows (I can only get method 1 to work with production data. Method 2 only works with sample data and it just spins forever if I try giving it more than a few hundred rows). I'm joining between a CSV file and a SQL table.

 

Problem:

Table 1 is a lookup table and table 2 has the data I want to augment. The problem is table 2 may have a valid value for column A, B, C or all three and I want to join to the lookup table in that order of precedence. i.e. lookup table 1 and if you find a match with column A, then use it, else lookup on column B and if you find a match on column B then use it else lookup on column C and if you find a match then use it otherwise null.

 

Table 1:

ABCRef
A1B1C11000
A2B2C22000
A3B3C33000

 

Table 2:

ABC
a1 nonsense           nonsense
nonsenseB2 nonsense
nonsense        nonsense c3
nonsense nonsense nonsense
c3 nonsense nonsense
nonsensec2 nonsense
nonsense nonsenseC1
nonsenseb1c3

 

 

Result:

 

ABCID
a1nonsense       nonsense1000
nonsense        B2asdd2000
nonsensenonsensec33000
nonsensenonsensenonsense        null
c3nonsensenonsensenull
nonsensec2nonsensenull
nonsensenonsenseC11000
nonsenseb1c31000

 

Note the last row has valid matches for B and C but B is pulled back according to precedence that I require.

 

Method 1:

Method 1 just does three table merges and then coalesces the 3 found columns to fulfill my precedence requirement. It works in production but it is very slow.

 

 

 

let
    Source = #"Table 2",
    #"Uppercased Text" = Table.TransformColumns(Source,{{"A", Text.Upper, type text}, {"B", Text.Upper, type text}, {"C", Text.Upper, type text}}),
    #"Merged Queries" = Table.NestedJoin(#"Uppercased Text", {"A"}, #"Table 1", {"A"}, "A Matches", JoinKind.LeftOuter),
    #"Expanded A Matches" = Table.ExpandTableColumn(#"Merged Queries", "A Matches", {"Ref"}, {"A Matches.Ref"}),
    #"Merged Queries1" = Table.NestedJoin(#"Expanded A Matches", {"B"}, #"Table 1", {"B"}, "B Matches", JoinKind.LeftOuter),
    #"Expanded B Matches" = Table.ExpandTableColumn(#"Merged Queries1", "B Matches", {"Ref"}, {"B Matches.Ref"}),
    #"Merged Queries2" = Table.NestedJoin(#"Expanded B Matches", {"C"}, #"Table 1", {"C"}, "C Matches", JoinKind.LeftOuter),
    #"Expanded C Matches" = Table.ExpandTableColumn(#"Merged Queries2", "C Matches", {"Ref"}, {"C Matches.Ref"}),
    #"Added Custom" = Table.AddColumn(#"Expanded C Matches", "Ref", each [A Matches.Ref] ?? [B Matches.Ref] ?? [C Matches.Ref]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"A Matches.Ref", "B Matches.Ref", "C Matches.Ref"})
in
    #"Removed Columns"

 

 

 

 

Method 2:

Method 2 uses a custom function which uses Table.SelectRows to return the value I need. I thought this would be faster but I can't even get it to work on more than a 300 rows. I have tried Table.Buffer() on both table1 and/or table2 and it still won't complete.  I have also tried Table.AddKey() which didn't do much either.

 

 

 

let
    Source = #"Table 2",
    lookup = (a as text, b as text, c as text) as list => 
        let 
            
            w = Table.SelectRows(#"Table 1", each Text.Lower([A])=Text.Lower(a)),
            x = Table.SelectRows(#"Table 1", each Text.Lower([B])=Text.Lower(b)),
            y = Table.SelectRows(#"Table 1", each Text.Lower([C])=Text.Lower(c)),
            z = if Table.RowCount(w) > 0 then w[Ref] else if Table.RowCount(x) > 0 then x[Ref] else if Table.RowCount(y) > 0 then y[Ref] else {}

        in  
           z,

    #"Added Custom" = Table.AddColumn(Source, "ID", each lookup([A], [B], [C])),
    #"Expanded ID" = Table.ExpandListColumn(#"Added Custom", "ID")

in
    #"Expanded ID"

 

 

 

 

Does anyone have any thoughts on anything else I can try or something obvious I may have missed. FYI, my real use case has 6 merges but I think the principle should still apply.

 

 

10 REPLIES 10
Anonymous
Not applicable

Why not break the lookup table into three separate one column tables, then do the joins?

 

--Nate

I do that in my real world case using Table.SelectColumns() but didn't show it for the example.

lasersharks
Kudo Collector
Kudo Collector

Thanks @smpa01  I'll give it a go tomorrow and see if there is any improvement.

smpa01
Super User
Super User

  I don't know if this going to help you but give this a try.  

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
lasersharks
Kudo Collector
Kudo Collector

Thanks @AlexisOlson  I have tried with normal if then else statements and it doesn't improve things much. Method 1 works and I even have a setup where I filter subsequent merges if a match has already happened but it isn't fast by any measure. I was wondering if there was some other technique out there I hadn't considered.

You could also try unpivoting both tables before merging.

 

I don't know if this will be any faster but you can try something like this:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSjRU0lFSyMvPK04FIgU4QBZVitWJVoJzdJScjHDKYtEN4iQbo5uhgJUNUgRUi08aSSIZtzvQjHA2RJdNMoQ6KxYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [A = _t, B = _t, C = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"A", type text}, {"B", type text}, {"C", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Column", "Value"),
    #"Uppercased Text" = Table.TransformColumns(#"Unpivoted Columns",{{"Value", Text.Upper, type text}}),
    #"Merged Queries" = Table.NestedJoin(#"Uppercased Text", {"Column", "Value"}, Table1, {"Column", "Value"}, "Table", JoinKind.LeftOuter),
    #"Expanded Table" = Table.ExpandTableColumn(#"Merged Queries", "Table", {"Ref"}, {"Ref"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Table", each ([Ref] <> null)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Index"}, {{"Ref", each List.First([Ref]), type text}}),
    #"Merged Queries1" = Table.NestedJoin(#"Added Index", {"Index"}, #"Grouped Rows", {"Index"}, "Grouped Rows", JoinKind.LeftOuter),
    #"Expanded Grouped Rows" = Table.ExpandTableColumn(#"Merged Queries1", "Grouped Rows", {"Ref"}, {"Ref"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Grouped Rows",{"Index"})
in
    #"Removed Columns"

Thanks @AlexisOlson . I will give it a try tomorrow and see if the performance improves.

AlexisOlson
Super User
Super User

I'd expect your Method 1 to work OK, although I think your coalesce ("??") is probably breaking the query folding. I'd suggest editing it to use if/then rather than ?? to see if that makes any difference (since PQ knows how to turn if/then into a SQL case statement).

 

 

 

smpa01
Super User
Super User

@lasersharks  which one is the SQL table ? Table 1/Table2?

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

@smpa01   it is Table 1. What are you thinking?

Helpful resources

Announcements
Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Kudoed Authors