Solved: Re: Identify "very-close" duplicates

mbegg · ‎07-25-2017

I have a list of 19,000 business names with some "very-close" duplicates.

E.g. XYZ Pty Ltd and XYZ Pty. Ltd. or ABCDE and ABCD

There is no consistent pattern to the differences, so I can't just find and replace every "." in "Pty. Ltd." to fix all of the duplicates.

Is there a way to identify the "very-close" duplicates? I am thinking of a function that would flag the current value if it is the same as another value in the list except for 1, 2, 3, or x characters.
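For illustration only, the "same except for x characters" idea can be expressed as an edit-distance check. The sketch below uses Python rather than Power Query/DAX, and the names `edit_distance`, `close_pairs`, and `max_edits` are hypothetical helpers, not part of any Power BI feature. It assumes that "x characters" means single-character insertions, deletions, or substitutions (Levenshtein distance).

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def close_pairs(names, max_edits=2):
    """Return (name_a, name_b, distance) for every pair of distinct names
    that differ by at most max_edits characters (hypothetical helper)."""
    unique = sorted(set(names))
    pairs = []
    for left, right in combinations(unique, 2):
        # A length gap larger than max_edits can never be closed in time.
        if abs(len(left) - len(right)) > max_edits:
            continue
        distance = edit_distance(left, right)
        if distance <= max_edits:
            pairs.append((left, right, distance))
    return pairs


names = ["XYZ Pty Ltd", "XYZ Pty. Ltd.", "ABCDE", "ABCD", "Unrelated Co"]
for left, right, distance in close_pairs(names, max_edits=2):
    print(f"{left!r} ~ {right!r} ({distance} edits)")
```

This compares every pair, so a list of 19,000 names means roughly 180 million comparisons. The length check skips many of them, but a list that size would likely need some form of pre-grouping (for example by first letter) to run in reasonable time.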

v-sihou-msft · ‎07-27-2017

@mbegg

Since there's no consistent pattern in the differences between those "very-close" duplicates, it's not possible to identify them via Power Query/DAX. I suggest you try a Text Analysis API to achieve your goal.

Regards,

Faye1901 · ‎08-05-2019

This isn't a solution! While Fuzzy match in PBI has been great, it doesn't handle fuzzy duplicates in a single column and therefore this post is not solved.
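As a rough sketch of what fuzzy duplicates in a single column could look like outside Power BI, the Python example below compares a list against itself and groups names whose similarity clears a threshold. It uses `difflib.SequenceMatcher` from the standard library. The function name `group_near_duplicates` and the 0.85 threshold are assumptions chosen for illustration, and the right threshold would need tuning against real data.

```python
from difflib import SequenceMatcher


def group_near_duplicates(names, threshold=0.85):
    """Group names from one column whose similarity ratio is at least
    threshold. Hypothetical helper; the threshold is an assumed value."""
    unique = sorted(set(names))
    parent = {name: name for name in unique}

    def find(name):
        # Follow parent links to the group's representative name.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for index, left in enumerate(unique):
        for right in unique[index + 1:]:
            ratio = SequenceMatcher(None, left.lower(), right.lower()).ratio()
            if ratio >= threshold:
                # Merge the two groups so that chains of matches end up together.
                parent[find(right)] = find(left)

    groups = {}
    for name in unique:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one member are candidate duplicates.
    return [members for members in groups.values() if len(members) > 1]


column = ["XYZ Pty Ltd", "XYZ Pty. Ltd.", "ABCDE", "ABCD", "Unrelated Co"]
for members in group_near_duplicates(column):
    print(members)
```

Because matches are merged transitively, a group can contain two names that are not directly similar to each other but are both similar to a third, so the groups are best treated as candidates for manual review rather than confirmed duplicates.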

mbegg · ‎07-27-2017

@v-sihou-msft

I now know that what I was trying to describe is called "fuzzy match" in the data analytics space. I will add this as a development idea.
