Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Score big with last-minute savings on the final tickets to FabCon Vienna. Secure your discount

Reply
ianbruckner
Frequent Visitor

Merging with one exact match column and one fuzzy join column

Is it possitble to do a merge based on two columns... one that requires the first column selected to be an exact match, and on a second column with fuzzy join settings applied?

 

I'm trying to merge a dataset that, when there are overlaps, one of the columns should ALWAYS be an exact match (system date)... but the second column I'm matching on is free-form text field. I want to avoid fuzzy matches from that second column matching up to a result when the dates from the first column don't match. I've tried my best to rig it, by making the date field a Long-form text version of date, so the chances of a bad match because both columns are fuzzy matched being low, but I've found it not exact, and playing with the similarity threshold hasn't solved the problem. 

1 ACCEPTED SOLUTION
ImkeF
Community Champion
Community Champion

Hi @ianbruckner ,
I'm afraid, but the merge/join-condition can only be set on the join level and not on column-level.

 

For inner joins, you can simply do them one after another, but for outer joins you have to expand the matched columns and then apply some logic afterwards (taking only those rows, where both expansions returned values).

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

4 REPLIES 4
MateoSpack
New Member

I know this is old, but I came across this while searching for a solution that would work for a similar situation, but in my case I needed to limit the fuzzy match results to just the top result. I was inspired by @LaasyaS's suggestion to group by first. However, I couldn't perform the fuzzy match first as I needed a single fuzzy match result from among the exact match options. Thus, here is the final solution I found works very cleanly:

 

1) First, group your starting table (whichever one will include ALL records generally, but this will be the table that is not restricted to the n number of fuzzy match results) by all the join columns for EXACT matching. For the aggregation, select "All Rows" (I generally name this column "Data" and will going forward). This will leave you with all exact join columns and a column composed of sub-tables.

 

2) Left-join onto the other table using exact matching. This will give you a 2nd column of tables, each a subtable of all the exact-match options for your joining.

 

3) Finally, add a custom additional column and for the formula, perform the fuzzy join with the matches limit and everything. You can ignore the exact-match columns if you want, since those will be redundant, but this will perform a fuzzy join for each subset of the data partitioned by your exact match columns.

 

4) You can delete all the columns for the source tables and then expand your newest column to get all the data joined as desired. Here's an example of that with 2 tables called "Employees" and "Positions" where several companies' position definitions are fuzzy matched to job titles for employees missing their position code, but only using their own company's options.

 

Employees (ID | Name | Title | Company ID | Position ID)

Positions (ID | Position Desc | Company ID)

 

let
Source = Table.SelectRows(Employees, each [Position ID] = null),
#"Grouped Rows" = Table.Group(Source, {"Company ID"}, {{"Data", each _, type table [ID=nullable text, Name=nullable text, Title=nullable text, Company ID=nullable text, Position ID=nullable text]}}),
#"Merged Queries" = Table.NestedJoin(#"Grouped Rows", {"Company ID"}, Positions, {"Company ID"}, "Positions", JoinKind.LeftOuter),
#"Added Custom" = Table.AddColumn(#"Merged Queries", "Company Positions", each Table.FuzzyNestedJoin([Data], {"Title"}, [Positions], {"Position Desc"}, "matches", JoinKind.LeftOuter, [SimilarityThreshold=0.8, IgnoreCase=true, IgnoreSpace=true, NumberOfMatches=1]), type table),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Company Positions"}),
#"Expanded Company Positions" = Table.ExpandTableColumn(#"Removed Other Columns", "Company Positions", {"ID", "Name", "Title", "Company ID", "matches"}, {"ID", "Name", "Title", "Company ID", "matches"}),
#"Expanded matches" = Table.ExpandTableColumn(#"Expanded Company Positions", "matches", {"ID", "Position Desc", "Company ID"}, {"matches.ID", "matches.Position ID", "matches.Company ID"})
in
#"Expanded Matches"

 

Hopefully that is helpful to someone in the future!

LaasyaS
Microsoft Employee
Microsoft Employee

Came across this thread when I had the same issue. 

 

Worked around by creating a separate intermediate mapping table. 
Group/Aggregate both the tables you are trying to compare by only 1 metric - the field that you are trying to fuzzy match. 
Then fuzzy match the 2 grouped tables to give you a conversion table. Play with the similarity etc as needed until you're satisfied with the result. 
Then join the conversion table to the first table to give you the field needed from the second table. 

 

Now you can do an exact match between the first and second table on both fields. 

ImkeF
Community Champion
Community Champion

Hi @ianbruckner ,
I'm afraid, but the merge/join-condition can only be set on the join level and not on column-level.

 

For inner joins, you can simply do them one after another, but for outer joins you have to expand the matched columns and then apply some logic afterwards (taking only those rows, where both expansions returned values).

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Thanks, @ImkeF. Your answer of possible or not was helpful... and your solution of double merging is leading me to consider my datasets more closely to see if there might be other similarties I can code against.

Helpful resources

Announcements
August Power BI Update Carousel

Power BI Monthly Update - August 2025

Check out the August 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors