Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
lbendlin
Super User
Super User

Nested functions - unclear on scope impact

I have to do a complex fuzzy join between two tables involving multiple columns and weird conditions. As you can imagine performance is not great, and I am trying to find ways to optimize the step.  Adding keys and indexes won't be of any help (I think) and Table.Buffer is already used on the lookup table.  Now I am experimenting with grabbing the current scope at different points in the nesting.

Version 1:

 

    A = Table.Buffer(Table.SelectColumns(B,{"U","L","P","S"})),
    #"Added Custom" = Table.AddColumn(#"Previous Step", "Match", 
        (k) => Table.SelectRows(A,
            each ([L]="*" or k[L]=[L])
             and ([P]="*" or k[P]=[P])
             and ([S]="*" or k[S]=[S])
                )
            ),

 

 

Version 2:

 

    A = Table.Buffer(Table.SelectColumns(B,{"U","L","P","S"})),
    #"Added Custom" = Table.AddColumn(#"Previous Step", "Match", 
        each Table.SelectRows(A,
            (k) => ([L]="*" or k[L]=[L])
             and ([P]="*" or k[P]=[P])
             and ([S]="*" or k[S]=[S])
                )
            ),

 

 

I had thought version 2 would perform better but it actually is two times slower. I don't understand why.  I'll check the query diagnostics but this will take a lot of time (version 1 takes 45 minutes with about 500K rows on the left table and 5K rows on the right table).  So maybe someone can explain the difference in impact?  

 

@ImkeF G'sunds Neues!

8 REPLIES 8
wdx223_Daniel
Super User
Super User

i guess:

1 function of or, when the left get a true then it will not calculate the right part.

2 function of and, when left part get a false then it will not calculate the right part.

so, the difference is how many times the formula to retrieve values from A.

ImkeF
Community Champion
Community Champion

Hi Lutz,
have you checked that the results are actually identical?
I would have guessed that Version 2 must look like this to return the same results:

 

    A = Table.Buffer(Table.SelectColumns(B,{"U","L","P","S"})),
    #"Added Custom" = Table.AddColumn(#"Previous Step", "Match", 
        each Table.SelectRows(A,
            (k) => (k[L]="*" or k[L]=[L])
             and (k[P]="*" or k[P]=[P])
             and (k[S]="*" or k[S]=[S])
                )
            ),

 

(... have added a "k" in front of the column selectors that compare against the "*".)

 

Frohes Neues ebenso 🙂

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Imke,

 

no, haven't actually checked the results (beyond comparing the run times).  So it is well possible that option 2 has a totally different result because the viewpoint has changed from the left table to the right table.  I guess I'll have to cut down my sample set somewhat to make the tests manageable.

 

In parallel I am backporting most of the logic into SQL Server  (I hadn't even mentioned that "#Previous Step"  was the result of an earlier merge between a SQL Server source and a Sharepoint based CSV source ...) . It's not lightning fast either but already orders of magnitude faster than Power Query for this particular problem.

 

Why can't Power Query be like Vertipaq and do everything in memory? 🙂

ImkeF
Community Champion
Community Champion

Yes, you have to use special techniques to ensure that combinations with non-SQL-sources actually fold: 
SQL-query folding bug still alive and sucking in PowerBI and PowerQuery in Excel – The BIccountant

 

Another option to speed it up could be to rearange the logic so that you can work with a merge on multiple columns instead. But this of course depends on the actual filter requirement.

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

I bet it's all of those "Or" statements slowing you down; it's gotta look for those strings for each column, each iteration. Instead of the "or"s, you could use List.Contains, and buffer the lists:

 

 each Table.SelectRows(A, each List.Contains(List.Buffer({"*", [L]}}, [L])) and List.Contains(List.Buffer({"*", [P]}, [P])) and List.Contains(List.Buffer({"*", [S]}, [S])))

 

--Nate

The or statements are selected specifically to encourage lazy streaming.  If the field is a wildcard (represented by the "*"  value) then evaluation should ideally stop for that comparison.  Not sure if List.Contains is optimized in a similar way, but I'll give it a try.

 

My actual question was about a different aspect.

 

Table.Addcolumn(LeftTable, (scope)=> Table.SelectRows(RightTable, each complex condition involving scope))

 

or

 

Table.Addcolumn(LeftTable, each Table.SelectRows(RightTable, (scope)=> complex condition involving scope))

 

As I understand it buffers only make sense when you re-use the data. That's not applicable for this join.

 

Anonymous
Not applicable

Oh—if your question is “why is the second query slower than the first?”, then I’d bet that in the second query, the (context) => is calculated for every iteration of “each Table.SelectRows(RightTable, complex filter statement)”
 
As opposed to the first query, where (context) has to be calculated before it can be part of “each complex filter statement”.
 
Regarding List.Contains, using a buffered list as the first parameter will yield an underlying SQL statement of IN (ABC, 123, Xyz, etc). Filtering SQL using a buffered list in List.Contains is a wonderful thing to behold, friend.
 
Finally, and please forgive me if I’ve forgotten, but can’t you just join these tables and be done with it?!
 
--Nate

Nate,

neither source is foldable so the change in SQL statement caused by List.Buffer doesn't matter in this scenario (but I'll keep that in mind for other scenarios).

None of the standard Table.NestedJoin etc functions can be used for multiple join columns and wildcard joins - Table.AddColumn is the only option as far as I know.

The actual business logic is even crazier with partially ragged hierarchies and recursions.

 

I ended up implementing the logic in SQL Server directly after grudgingly importing my second source into SQL Server (it is originally a CSV file in a sharepoint). Same idea as in Power Query but performance is acceptable (low tens of minutes for 2.2M resulting rows, with continuous spooling of results) - even applying your List.Contains() approach a little bit 🙂

 

 

Select [Key], COALESCE([uid],'Unassigned') [U]
from  k
left join  a on a.[L] in ('*',k.L) and a.[P] in ('*',k.[P]) and a.[S] in ('*',k.[S])
where not [Key]='*|*|*'

 

 

 

Helpful resources

Announcements
July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.

Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors