I have a list of 19,000 business names with some "very-close" duplicates.
E.g. XYZ Pty Ltd and XYZ Pty. Ltd. or ABCDE and ABCD
There is no logic to the differences so I can't just find & replace all the . from Pty. Ltd. and fix all of the duplicates.
Is there a way to identify the "very-close" duplicates? I am thinking of a function that would identify whether the current value is the same as another value in the list except for 1, 2, 3, or x characters.
Since there's no logic to the differences between those "very-close" duplicates, it's not possible to identify them via Power Query/DAX. I suggest you try a text analysis API to achieve your goal.
Regards,
This isn't a solution! While fuzzy match in PBI has been great, it doesn't handle fuzzy duplicates within a single column, so this post is not solved.
I now know that what I was trying to describe is called "fuzzy match" in the data analytics space. I will add this as a development idea.
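For anyone landing here later: the "same except for x characters" idea from the question is usually measured with an edit distance (Levenshtein distance). Below is a minimal sketch in Python, run outside Power Query/DAX, assuming the names are available as a plain list of strings. The function names `edit_distance` and `near_duplicates` and the `max_edits` parameter are my own illustrative choices, not part of any Power BI feature.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def near_duplicates(names, max_edits=2):
    """Return (name_a, name_b, distance) for every pair of distinct
    names that differ by at most max_edits characters."""
    unique = sorted(set(names))  # exact duplicates collapse here
    pairs = []
    for a, b in combinations(unique, 2):
        if abs(len(a) - len(b)) > max_edits:
            continue  # the length gap alone already exceeds the threshold
        d = edit_distance(a, b)
        if d <= max_edits:
            pairs.append((a, b, d))
    return pairs


if __name__ == "__main__":
    sample = ["XYZ Pty Ltd", "XYZ Pty. Ltd.", "ABCDE", "ABCD", "Acme"]
    for a, b, d in near_duplicates(sample, max_edits=2):
        print(f"{a!r} ~ {b!r} (distance {d})")
```

With `max_edits=2` this flags both example pairs from the question (XYZ Pty Ltd / XYZ Pty. Ltd. at distance 2, ABCDE / ABCD at distance 1). Note that it compares every pair, so a list of 19,000 names means roughly 180 million comparisons; in practice you would probably want to group names first (for example by first letter) or use a dedicated fuzzy-matching library.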