Power BI is turning 10! Tune in for a special live episode on July 24 with behind-the-scenes stories, product evolution highlights, and a sneak peek at what’s in store for the future.
Save the dateEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
I have to do a complex fuzzy join between two tables involving multiple columns and weird conditions. As you can imagine performance is not great, and I am trying to find ways to optimize the step. Adding keys and indexes won't be of any help (I think) and Table.Buffer is already used on the lookup table. Now I am experimenting with grabbing the current scope at different points in the nesting.
Version 1:
A = Table.Buffer(Table.SelectColumns(B,{"U","L","P","S"})),
#"Added Custom" = Table.AddColumn(#"Previous Step", "Match",
(k) => Table.SelectRows(A,
each ([L]="*" or k[L]=[L])
and ([P]="*" or k[P]=[P])
and ([S]="*" or k[S]=[S])
)
),
Version 2:
A = Table.Buffer(Table.SelectColumns(B,{"U","L","P","S"})),
#"Added Custom" = Table.AddColumn(#"Previous Step", "Match",
each Table.SelectRows(A,
(k) => ([L]="*" or k[L]=[L])
and ([P]="*" or k[P]=[P])
and ([S]="*" or k[S]=[S])
)
),
I had thought version 2 would perform better but it actually is two times slower. I don't understand why. I'll check the query diagnostics but this will take a lot of time (version 1 takes 45 minutes with about 500K rows on the left table and 5K rows on the right table). So maybe someone can explain the difference in impact?
@ImkeF G'sunds Neues!
i guess:
1 function of or, when the left get a true then it will not calculate the right part.
2 function of and, when left part get a false then it will not calculate the right part.
so, the difference is how many times the formula to retrieve values from A.
Hi Lutz,
have you checked that the results are actually identical?
I would have guessed that Version 2 must look like this to return the same results:
A = Table.Buffer(Table.SelectColumns(B,{"U","L","P","S"})),
#"Added Custom" = Table.AddColumn(#"Previous Step", "Match",
each Table.SelectRows(A,
(k) => (k[L]="*" or k[L]=[L])
and (k[P]="*" or k[P]=[P])
and (k[S]="*" or k[S]=[S])
)
),
(... have added a "k" in front of the column selectors that compare against the "*".)
Frohes Neues ebenso 🙂
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Imke,
no, haven't actually checked the results (beyond comparing the run times). So it is well possible that option 2 has a totally different result because the viewpoint has changed from the left table to the right table. I guess I'll have to cut down my sample set somewhat to make the tests manageable.
In parallel I am backporting most of the logic into SQL Server (I hadn't even mentioned that "#Previous Step" was the result of an earlier merge between a SQL Server source and a Sharepoint based CSV source ...) . It's not lightning fast either but already orders of magnitude faster than Power Query for this particular problem.
Why can't Power Query be like Vertipaq and do everything in memory? 🙂
Yes, you have to use special techniques to ensure that combinations with non-SQL-sources actually fold:
SQL-query folding bug still alive and sucking in PowerBI and PowerQuery in Excel – The BIccountant
Another option to speed it up could be to rearange the logic so that you can work with a merge on multiple columns instead. But this of course depends on the actual filter requirement.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
I bet it's all of those "Or" statements slowing you down; it's gotta look for those strings for each column, each iteration. Instead of the "or"s, you could use List.Contains, and buffer the lists:
each Table.SelectRows(A, each List.Contains(List.Buffer({"*", [L]}}, [L])) and List.Contains(List.Buffer({"*", [P]}, [P])) and List.Contains(List.Buffer({"*", [S]}, [S])))
--Nate
The or statements are selected specifically to encourage lazy streaming. If the field is a wildcard (represented by the "*" value) then evaluation should ideally stop for that comparison. Not sure if List.Contains is optimized in a similar way, but I'll give it a try.
My actual question was about a different aspect.
Table.Addcolumn(LeftTable, (scope)=> Table.SelectRows(RightTable, each complex condition involving scope))
or
Table.Addcolumn(LeftTable, each Table.SelectRows(RightTable, (scope)=> complex condition involving scope))
As I understand it buffers only make sense when you re-use the data. That's not applicable for this join.
Nate,
neither source is foldable so the change in SQL statement caused by List.Buffer doesn't matter in this scenario (but I'll keep that in mind for other scenarios).
None of the standard Table.NestedJoin etc functions can be used for multiple join columns and wildcard joins - Table.AddColumn is the only option as far as I know.
The actual business logic is even crazier with partially ragged hierarchies and recursions.
I ended up implementing the logic in SQL Server directly after grudgingly importing my second source into SQL Server (it is originally a CSV file in a sharepoint). Same idea as in Power Query but performance is acceptable (low tens of minutes for 2.2M resulting rows, with continuous spooling of results) - even applying your List.Contains() approach a little bit 🙂
Select [Key], COALESCE([uid],'Unassigned') [U]
from k
left join a on a.[L] in ('*',k.L) and a.[P] in ('*',k.[P]) and a.[S] in ('*',k.[S])
where not [Key]='*|*|*'
Check out the July 2025 Power BI update to learn about new features.
This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.
User | Count |
---|---|
10 | |
7 | |
7 | |
6 | |
6 |