Optimizing Fuzzy Merge

akg · 06-09-2020

Is there a way to make the fuzzy merge feature give higher priority to matching rows via the transformation table than to its own matching algorithm? I believe this would help me solve the issue I'm seeing, but I am open to trying different things.

I have two queries for which I have created a "bridge" to avoid a many-to-many relationship between rows. The data represents units sold and cases opened to support different account names, and unfortunately these two sources have been set up so that an account name is not guaranteed to be spelled the same way in every instance. For example:

Query 1 Sales Account Names	Query 2 Case Account Names	Desired Visualisation Account Name
East Communications	East Communications, Inc.	East Communications
East Communications, Inc	East Comm.	East Communications
East Communications Inc.	East Communications Inc	East Communications
Freeflight	Freeflight (Bennington)	Freeflight
Freeflight	Freeflight (IT)	Freeflight

I have tried fuzzy merging the information in my bridge query with one of the original queries to assign each row the desired account name, but even when using a transformation table and playing around with the threshold, it does not always link the same account name. There are rows where the match has correctly assigned the desired name, but for the same account names there will also be rows that have defaulted to the original, show an incorrect name, or even come up as null.
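The behaviour being asked for (explicit transformation-table entries win, similarity scoring is only a fallback) can be sketched outside Power Query. The Python below is only an illustration under stated assumptions: `TRANSFORMATION`, `CANONICAL`, `normalize` and `resolve` are hypothetical names, the mapping is built from the example names above, and `difflib.SequenceMatcher` stands in for whatever similarity measure the fuzzy merge actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical transformation table: source spelling -> desired account name.
TRANSFORMATION = {
    "East Communications, Inc.": "East Communications",
    "East Comm.": "East Communications",
    "Freeflight (Bennington)": "Freeflight",
    "Freeflight (IT)": "Freeflight",
}

# Hypothetical list of desired (canonical) account names.
CANONICAL = ["East Communications", "Freeflight"]


def normalize(name):
    """Lowercase, trim, and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def resolve(name, threshold=0.8):
    """Return the desired account name, or None if nothing is close enough."""
    key = normalize(name)

    # Stage 1: an explicit transformation-table entry always wins.
    for raw, desired in TRANSFORMATION.items():
        if normalize(raw) == key:
            return desired

    # Stage 2: fall back to similarity scoring against the canonical names.
    best, best_score = None, 0.0
    for candidate in CANONICAL:
        score = SequenceMatcher(None, key, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score

    # Below the threshold, report no match instead of guessing.
    return best if best_score >= threshold else None
```

With this split, a name such as "East Comm." is resolved by the table no matter how low its similarity score would be, while names missing from the table still go through the thresholded fuzzy step and come back as `None` when nothing is close enough.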

v-xuding-msft · 06-10-2020

Hi @akg ,

I have tested the sample data, and it works when using a transformation table. You can download my sample and try it.

Reference:

Fuzzy Matching in Power BI and Power Query; Match based on Similarity Threshold

If your tables still don't merge correctly, it may be because the sample data is too simple. Please share a dummy pbix file that we can test.

Best Regards,
Xue Ding
If this post helps, please consider accepting it as the solution to help other members find it more quickly.

akg · 06-10-2020

@v-xuding-msft

Here is a sample of my .pbix in OneDrive:

https://1drv.ms/u/s!As6Gm7qZSazFmSdN0X_5Z7qeNoBp?e=EnJrXJ

I've taken a few different account names I was having trouble with, along with some other random names, to try to represent the dataset, and created the bridge along with the fuzzy merge. I have removed the duplicates so that the relationships will work, but if you roll that step back you can see where the names aren't lining up properly.

In this sample every account name has a match in the Fuzzy Match transformation table I loaded, but this is not always the case. My original transformation table only really includes matches for the obvious account names that I want matched. For example, I have ATT West in the "from" column of the fuzzy match transformation table, but this wouldn't normally be included in that table at all.