Re: Nested functions - unclear on scope impact

lbendlin · ‎12-29-2021

I have to do a complex fuzzy join between two tables involving multiple columns and weird conditions. As you can imagine performance is not great, and I am trying to find ways to optimize the step. Adding keys and indexes won't be of any help (I think) and Table.Buffer is already used on the lookup table. Now I am experimenting with grabbing the current scope at different points in the nesting.

Version 1:

    A = Table.Buffer(Table.SelectColumns(B,{"U","L","P","S"})),
    #"Added Custom" = Table.AddColumn(#"Previous Step", "Match", 
        (k) => Table.SelectRows(A,
            each ([L]="*" or k[L]=[L])
             and ([P]="*" or k[P]=[P])
             and ([S]="*" or k[S]=[S])
                )
            ),

Version 2:

    A = Table.Buffer(Table.SelectColumns(B,{"U","L","P","S"})),
    #"Added Custom" = Table.AddColumn(#"Previous Step", "Match", 
        each Table.SelectRows(A,
            (k) => ([L]="*" or k[L]=[L])
             and ([P]="*" or k[P]=[P])
             and ([S]="*" or k[S]=[S])
                )
            ),

I had thought version 2 would perform better but it actually is two times slower. I don't understand why. I'll check the query diagnostics but this will take a lot of time (version 1 takes 45 minutes with about 500K rows on the left table and 5K rows on the right table). So maybe someone can explain the difference in impact?

@ImkeF G'sunds Neues!

wdx223_Daniel · ‎12-29-2021

i guess:

1 function of or, when the left get a true then it will not calculate the right part.

2 function of and, when left part get a false then it will not calculate the right part.

so, the difference is how many times the formula to retrieve values from A.

ImkeF · ‎12-29-2021

Hi Lutz,
have you checked that the results are actually identical?
I would have guessed that Version 2 must look like this to return the same results:

    A = Table.Buffer(Table.SelectColumns(B,{"U","L","P","S"})),
    #"Added Custom" = Table.AddColumn(#"Previous Step", "Match", 
        each Table.SelectRows(A,
            (k) => (k[L]="*" or k[L]=[L])
             and (k[P]="*" or k[P]=[P])
             and (k[S]="*" or k[S]=[S])
                )
            ),

(... have added a "k" in front of the column selectors that compare against the "*".)

Frohes Neues ebenso 🙂

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

lbendlin · ‎12-29-2021

Imke,

no, haven't actually checked the results (beyond comparing the run times). So it is well possible that option 2 has a totally different result because the viewpoint has changed from the left table to the right table. I guess I'll have to cut down my sample set somewhat to make the tests manageable.

In parallel I am backporting most of the logic into SQL Server (I hadn't even mentioned that "#Previous Step" was the result of an earlier merge between a SQL Server source and a Sharepoint based CSV source ...) . It's not lightning fast either but already orders of magnitude faster than Power Query for this particular problem.

Why can't Power Query be like Vertipaq and do everything in memory? 🙂

ImkeF · ‎12-29-2021

Yes, you have to use special techniques to ensure that combinations with non-SQL-sources actually fold:
SQL-query folding bug still alive and sucking in PowerBI and PowerQuery in Excel – The BIccountant

Another option to speed it up could be to rearange the logic so that you can work with a merge on multiple columns instead. But this of course depends on the actual filter requirement.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎12-29-2021

I bet it's all of those "Or" statements slowing you down; it's gotta look for those strings for each column, each iteration. Instead of the "or"s, you could use List.Contains, and buffer the lists:

each Table.SelectRows(A, each List.Contains(List.Buffer({"*", [L]}}, [L])) and List.Contains(List.Buffer({"*", [P]}, [P])) and List.Contains(List.Buffer({"*", [S]}, [S])))

--Nate

lbendlin · ‎12-29-2021

The or statements are selected specifically to encourage lazy streaming. If the field is a wildcard (represented by the "*" value) then evaluation should ideally stop for that comparison. Not sure if List.Contains is optimized in a similar way, but I'll give it a try.

My actual question was about a different aspect.

Table.Addcolumn(LeftTable, (scope)=> Table.SelectRows(RightTable, each complex condition involving scope))

or

Table.Addcolumn(LeftTable, each Table.SelectRows(RightTable, (scope)=> complex condition involving scope))

As I understand it buffers only make sense when you re-use the data. That's not applicable for this join.

Anonymous · ‎12-31-2021

Oh—if your question is “why is the second query slower than the first?”, then I’d bet that in the second query, the (context) => is calculated for every iteration of “each Table.SelectRows(RightTable, complex filter statement)”

As opposed to the first query, where (context) has to be calculated before it can be part of “each complex filter statement”.

Regarding List.Contains, using a buffered list as the first parameter will yield an underlying SQL statement of IN (ABC, 123, Xyz, etc). Filtering SQL using a buffered list in List.Contains is a wonderful thing to behold, friend.

Finally, and please forgive me if I’ve forgotten, but can’t you just join these tables and be done with it?!

--Nate

lbendlin · ‎12-31-2021

Nate,

neither source is foldable so the change in SQL statement caused by List.Buffer doesn't matter in this scenario (but I'll keep that in mind for other scenarios).

None of the standard Table.NestedJoin etc functions can be used for multiple join columns and wildcard joins - Table.AddColumn is the only option as far as I know.

The actual business logic is even crazier with partially ragged hierarchies and recursions.

I ended up implementing the logic in SQL Server directly after grudgingly importing my second source into SQL Server (it is originally a CSV file in a sharepoint). Same idea as in Power Query but performance is acceptable (low tens of minutes for 2.2M resulting rows, with continuous spooling of results) - even applying your List.Contains() approach a little bit 🙂

Select [Key], COALESCE([uid],'Unassigned') [U]
from  k
left join  a on a.[L] in ('*',k.L) and a.[P] in ('*',k.[P]) and a.[S] in ('*',k.[S])
where not [Key]='*|*|*'

Nested functions - unclear on scope impact

Helpful resources

Power BI Monthly Update - August 2025

Fabric Community Update - August 2025

Join us at FabCon Vienna from September 15-18, 2025

Nested functions - unclear on scope impact

Helpful resources

Power BI Monthly Update - August 2025

Fabric Community Update - August 2025