Solved: Dynamic Merge Query with Ability to Iteratively Re...

Anonymous · ‎12-14-2023

I have a fun challenge for anyone interested...

I have a long list of queries that are dependent upon each other. It starts with 2 independent queries that are simply each importing a table. Then, those two queries are joined together via Merge as New and "expanding" the nested table column which the 2nd query resided in.

The first challenge is that Power BI hard-codes the column names, so I needed to make that merge dynamic, since changes are often made to the 2 queries being joined and need to be reflected in the merge column. 2 things should be noted - 1, the primary and foreign key have different names (which is also for the most part true among all the queries following) and 2, there are duplicate columns in query 1 and query 2. Therefore, in this statement, I included a clause to mark any duplicate columns in query 2 that exist in query 1 with ".1" appended to its name:

let
Source = Table.NestedJoin(query1, {"PrimaryKey"}, query2, {"ForeignKey"}, "query2", JoinKind.LeftOuter),
Expanded = Table.ExpandTableColumn(Source, "query2", Table.ColumnNames(query2), List.Transform(Table.ColumnNames(query2) as list, each if List.Contains(Table.ColumnNames(Source) as list, _) then _ & ".1" else _))
in
Expanded

Now this worked great, until I got further downstream to other merge queries. More independent tables are brought in, merged to the merge, and merges are even merged to merges. The problem I ultimately ran into is that "column.1" already exists, which broke the subsequent queries.

Basically, what I'm looking for is a method to iteratively rename duplicate columns within my dynamic merge statement. I want to replicate how clicking "expand" on a nested table column handles renaming duplicate columns with .1, .2, .3, .4, etc... I'm thinking maybe a recursive function, but I can't get that to work.

Ultimately, the code would need to check for any duplicate columns amongst the 2 queries being merged. If a duplicate is identified but hasn't yet been marked with .#, .1 should be appended to the column name of the 2nd query column. Similarly, if a duplicate is identified and it ends with .1, it should then be changed to .2. And so on and so forth. I know there is some way to do this, probably with some sort of iterative loop, but it is beyond my expertise.

Is there any way to accomplish this? Bonus points if you can write me working code. Thank you very much in advance!

spinfuzer · ‎12-15-2023

The below is a recrusive approach which is slower, but will increment it properly I believe. I'm pretty sure you're going to need to create a custom function for this one, or at least paste the steps separately into a query. I would just make a blank query with the custom function.

If the table that is being expanded already has duplicates, it will change the duplicate number for those duplicates.

Left Table Example: ABC, D.X

Right Table: ABC, ABC.1, ABC.2, E, D.X

Left Merged with Right Table Twice.

BLANK QUERY name: recursive_rename (you do not need to edit this one at all).

(left_table as table, right_table as table) =>
let
    left_names = Table.ColumnNames(left_table),
    right_names = Table.ColumnNames(right_table),
    check_exists = (name as text, name_list as list) =>
        let
            rename = if not List.Contains(name_list, name, Comparer.OrdinalIgnoreCase) then name
            else if try (Number.From(Text.AfterDelimiter(name, ".", {0, RelativePosition.FromEnd}) ?? 0) > 0) otherwise false
            then Text.BeforeDelimiter(name, ".", {0, RelativePosition.FromEnd}) & "." & Text.From(  (Number.From(Text.AfterDelimiter(name, ".", {0, RelativePosition.FromEnd})) ?? 0)  + 1)
            else name & "." & Text.From(try (Number.From(Text.AfterDelimiter(name, ".", {0, RelativePosition.FromEnd})) ?? 0) + 1 otherwise 1),
            recheck = if not List.Contains(name_list,rename) then rename else @ check_exists(rename,name_list)
        in
            recheck,
    transform_names =
        List.Generate( () => 
            [ 
                n = 0, 
                current_name = check_exists(right_names{n},left_names), 
                build_list = left_names & {current_name} 
            ],
            each [n] < List.Count(right_names),
            each 
                [ 
                    n = [n] + 1 , 
                    current_name = check_exists(right_names{n},[build_list]), 
                    build_list = [build_list] & {current_name}  
                ],
            each [current_name]
        )
in
    transform_names

example using the custom function above

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQx0jFW0lFSitWJVjKCMSAisQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ABC = _t, D.X = _t]),
    Table2 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUoEYiiK1YlWMgIyklCFjIGMZCShWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ABC = _t, E = _t, D.X = _t, ABC.1 = _t, ABC.2 = _t]),
    #"Merged Queries" = Table.NestedJoin(Source, {"ABC"}, Table2, {"ABC"}, "ChangedType", JoinKind.LeftOuter),
    #"Expanded ChangedType" = 
        Table.ExpandTableColumn(
            #"Merged Queries", 
            "ChangedType", 
            Table.ColumnNames(Table2), 
            recursive_rename(#"Merged Queries",Table2)
        ),
    #"Merged Queries1" = Table.NestedJoin(#"Expanded ChangedType", {"ABC"}, Table2, {"ABC"}, "Table2", JoinKind.LeftOuter),
    expand2 =     
        Table.ExpandTableColumn(
            #"Merged Queries1", 
            "Table2",
            Table.ColumnNames(Table2), 
            recursive_rename(#"Merged Queries1",Table2) 
        )
in
    expand2

After 1st Expand

After 2nd Expand

View solution in original post

spinfuzer · ‎12-14-2023

Maybe something like below. Be careful if you have added a prefix to column names after any merge as this will not check for your prefixes.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQx0jFWitWJVjICk0B2LAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    Table2 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUpUitWJVjICspLALGMgK1kpNhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
    #"Merged Queries" = Table.NestedJoin(Source, {"Column1"}, Table2, {"Column1"}, "ChangedType", JoinKind.LeftOuter),
    rename_duplicates = (left_table as table, right_table as table) =>
        List.Transform(
            Table.ColumnNames(right_table), 
            each _
            & 
            ("." 
                & Text.From(
                    List.Sum(
                        List.Repeat({1},
                            List.Count(
                                List.Select(
                                    Table.ColumnNames(left_table), 
                                    (x) => _ = Text.BeforeDelimiter(x,".", {0,RelativePosition.FromEnd}) or _ = x
                                )
                            ) 
                        )
                    )
                ) ?? ""
            )
        ),
    #"Expanded ChangedType" = 
        Table.ExpandTableColumn(
            #"Merged Queries", 
            "ChangedType", 
            Table.ColumnNames(Table2), 
            rename_duplicates(#"Merged Queries",Table2) 
        ),
    #"Merged Queries1" = Table.NestedJoin(#"Expanded ChangedType", {"Column1"}, Table2, {"Column1"}, "Table2", JoinKind.LeftOuter),
    expand2 =     
        Table.ExpandTableColumn(
            #"Merged Queries1", 
            "Table2",
            Table.ColumnNames(Table2), 
            rename_duplicates(#"Merged Queries1",Table2) 
        )
in
    expand2

Anonymous · ‎12-14-2023

Thank you so much for your reply, I really appreciate it! How would I go about testing this on my own, as in what should I replace in your proposed query with my actual query / table names? Lastly, what's up with "i45WMlTSUUpUitWJVjICspLALGMgK1kpNhYA" in your code? Is this a call to some sort of library, or were you testing your own data?

spinfuzer · ‎12-14-2023

Copy and paste the entire code into a blank query to see a full example.

Source and Table2 are two separate tables.

The main thing is the rename duplicates function / step.

You can put that in the same query if it is only used in a single query or you can create a blank query and make it a custom function accessible to any query.

In the fourth argument you use rename_duplicates function instead of your table names so it can rename the duplicates for you.

Anonymous · ‎12-14-2023

Do you think the following code is feasible, if not how would you revise it? Really grasping at straws here but interested in your input. Hesitant to try your code without knowing the ins and outs for now.

= Table.ExpandTableColumn(
Source,
"query2",
List.Transform(
Table.ColumnNames(query2),
each _ & (
"." & Text.From(
List.Sum(
List.Repeat(
{1},
List.Count(
List.Select(
Table.ColumnNames(Source),
(x) => _ = Text.BeforeDelimiter(x, ".", {0, RelativePosition.FromEnd}) or _ = x
)
)
)
) ?? 0
)
)
)
)

---------------------------------------------------------------------------------

For reference, Source = Table.NestedJoin(query1, {"PrimaryKey"}, query2, {"ForeignKey"}, "query2", JoinKind.LeftOuter),

spinfuzer · ‎12-14-2023

Looks like you copied the rename duplicates function and put it directly in the Table.ExpandTableColumn function. That should work, except you would have to type that all over every time.

The function saves the copy and pasting over and over and lets you write rename_duplicates(Source,query2) instead.

Anonymous · ‎12-15-2023

The code is working perfectly, but I'm running into errors in cases like the following:

Query 1- A prior join table that has column, column.1, and column.2
Query 2- An import table that has column

When using the new merge code I get the error: "column.1 already exists" - I think this is because the code sees column = column, promotes the column in Query 2 to column.1, and then an error is thrown because now there are 2 column.1's in the merge.

Is there any way to modify the code to get around this? Thank you again so much for all of your help!

spinfuzer · ‎12-15-2023

If it is only duplicates in one table it should already work. Are you sure you are using the table with duplicates names and comparing it against another table with no duplicates?

or are you saying the below

Table1: Column, Column.1, Column.2

Table2: Column, Column.1, Column.2

So both tables have duplicates of the same column?

What the original code is doing is that it takes out the suffixes of Table1 and counts the number of columns with the same name and then adds one to that count. It does not count the number of duplicates in table2 because it assumes no duplicates in that table. This would have to be changed if you are saying there are duplicates in table2 as well as table1.

Anonymous · ‎12-15-2023

Both tables could have duplicates. In the example above, Table 1 had Column, Column.1, and Column.2 while Table 2 had Column, causing an error when Column was promoted to Column.1, since Column.1 already exists in Table 1.

My apologies for not making this more clear in my initial prompt, it's been a doozy of a structure to wrap my head around!

Anonymous · ‎12-15-2023

To add to that, in future joins there could be a scenario like you described. It could be something like trying to merge Table 1: Column, Column.1, Column.2 with Table 2: Column, Column.1 which would throw 2 errors when Column is promoted to Column.1, and another when Column.1 is promoted to Column.2

spinfuzer · ‎12-15-2023

How about the below? It breaks the sequencing, but I think it ensures no duplicates.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQx0jFW0lECoVidaCUjJDZcPBYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Custom.X = _t, Custom.Y = _t]),
    Table2 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUoEYiiK1YlWMgIyklCFjIGMZCShWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Custom.X = _t, Column1.1 = _t, Column1.2 = _t]),
    #"Merged Queries" = Table.NestedJoin(Source, {"Column1"}, Table2, {"Column1"}, "ChangedType", JoinKind.LeftOuter),
    rename_duplicates = (left_table as table, right_table as table) =>
        List.Transform(
            Table.ColumnNames(right_table), 
            each _
            & 
            ("." 
                & Text.From(        
                    List.Max(
                        List.Transform(
                            List.Select(
                                List.Distinct(Table.ColumnNames(left_table) & Table.ColumnNames(right_table)), 
                                (x) => 
                                    (
                                        _ = Text.BeforeDelimiter(x,".", {0,RelativePosition.FromEnd})
                                        and (Number.From(Text.AfterDelimiter(x,".", {0,RelativePosition.FromEnd}) ?? 0) > 0
                                    ) 
                                    or _ = x
                                    )
                                        
                            ),
                            (y) =>  try Number.From(Text.AfterDelimiter(y,".", {0,RelativePosition.FromEnd})) otherwise 0
                        ) 
                    )+1
                ) ?? ""
            )
        ),
    #"Expanded ChangedType" = 
        Table.ExpandTableColumn(
            #"Merged Queries", 
            "ChangedType", 
            Table.ColumnNames(Table2), 
            rename_duplicates(#"Merged Queries",Table2) 
        ),
    #"Merged Queries1" = Table.NestedJoin(#"Expanded ChangedType", {"Column1"}, Table2, {"Column1"}, "Table2", JoinKind.LeftOuter),
    expand2 =     
        Table.ExpandTableColumn(
            #"Merged Queries1", 
            "Table2",
            Table.ColumnNames(Table2), 
            rename_duplicates(#"Merged Queries1",Table2) 
        )
in
    expand2

Anonymous · ‎12-15-2023

I really appreciate your reply so much, and I'd love to test this out. Is there any way to incorporate these new changes into the current statement below that I've been using? I'm fine with copying and pasting a bunch and know it's not the ideal way, but I just need to get this to work:

=Table.ExpandTableColumn(Source, "query2", Table.ColumnNames(#"query2"),

List.Transform(Table.ColumnNames(#"query2"), each if List.Contains(Table.ColumnNames(Source),

_, Comparer.OrdinalIgnoreCase) and List.Contains(Table.ColumnNames(#"query2"), _, Comparer.OrdinalIgnoreCase)

then _ & ("." & Text.From(List.Sum(List.Repeat({1}, List.Count(List.Select(Table.ColumnNames(#"query2"),

(x) => _ = Text.BeforeDelimiter(x, ".", {0, RelativePosition.FromEnd}) or _ = x)))) ?? 0)) else _))

Where Source = Table.NestedJoin(#"query1", {"PrimaryKey"}, #"query2", {"ForeignKey"}, "query2", JoinKind.LeftOuter)

*Note: Sometimes the join kind is right, right/left anti, full outer, inner

spinfuzer · ‎12-15-2023

stick the below into your 4th argument of Table.ExpandTableColumn. I added the case insensitive check into the below already so you can remove that ignore case part of your code. Replace right table and left table portions.

This happens to suffix everything with a .1 and increments it regardless of it was a duplicate or not. If you do not want that we would have to filter out values that do not already exist in Left Table when we combine Table names from Left Table and Right Table (around the List.Distinct part).

        List.Transform(
            Table.ColumnNames(right_table), 
            each _
            & 
            ("." 
                & Text.From(        
                    List.Max(
                        List.Transform(
                            List.Select(
                                List.Distinct(Table.ColumnNames(left_table) & Table.ColumnNames(right_table)), 
                                (x) => 
                                    (
                                        Comparer.OrdinalIgnoreCase(_,Text.BeforeDelimiter(x,".", {0,RelativePosition.FromEnd})) = 0
                                        and (Number.From(Text.AfterDelimiter(x,".", {0,RelativePosition.FromEnd}) ?? 0) > 0
                                    ) 
                                    or Comparer.OrdinalIgnoreCase(_,x) = 0
                                    )
                                        
                            ),
                            (y) =>  try Number.From(Text.AfterDelimiter(y,".", {0,RelativePosition.FromEnd})) otherwise 0
                        ) 
                    )+1 
                ) ?? ""
            )
        )

Although I really think the easiest thing to do is just to keep a single column, update that column to "old value(s), new value" and then remove the duplicate column. When all is said and done you'll have value1, value2, value3, ..., value n in a single clean column. From there you can split columns by delimiter easily if you want to AND the values will already be sorted next to each other. Let me know what you think of the alternative.

Anonymous · ‎12-15-2023

It looks like the code works, except it's appending .# to an already existing .# rather than replacing it with the updated .# -- For example, in my result set I'm seeing "Column.1.2," "Column.2.3," "Column.3.4".

To your alternative, admittedly I'm having a hard time wrapping my head around what that entails.

spinfuzer · ‎12-15-2023