Hi all
Can Power BI's DAX language help determine how similar or different the names of thousands of customers are in a single column? The names are entered in various formats with different spacing or punctuation.
By calculating a similarity percentage, I want to identify which names are completely unique and which ones are similar to others.
This will allow me to manually investigate and correct the similar names.
I should add that this column lives in an Excel file, so maybe this could be done in Power Query instead. If someone knows another way, please share it here 🙂
Thanks in advance
Regards
Hi @Hercules78 ,
I created some data:
In DAX/Power Query, we can't compare two columns character by character in order, so ABC and ACB, for example, can't be told apart. If character order doesn't matter to you, you can try the following:
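1. Add a custom column that splits [Group2] into a list of characters:
=Text.ToList([Group2])
Expand the list into new rows in the current table.
Add a custom column that flags whether each character occurs in [Group]:
if Text.Contains([Group],[Name2 List]) then 1 else 0
Group By to total the flags as [Match], then add two more custom columns:
Text.Length([Group])
[Match]/[Custom]
Set [Custom.1] -- Change Type -- Percentage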
2. Result:
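For anyone who wants to sanity-check the numbers outside Power Query, the score from the steps above can be sketched in Python. This is only my reading of those steps, not the original query: the function name char_overlap and its parameter names are made up for illustration, and the comparison is case-sensitive like Text.Contains is by default.

```python
def char_overlap(group: str, group2: str) -> float:
    """Order-insensitive overlap: the share of characters of `group2`
    that occur somewhere in `group`, divided by the length of `group`.
    Hypothetical helper mirroring [Match]/[Custom] from the steps above."""
    if not group:
        return 0.0  # avoid dividing by zero on empty names
    # One flag per character of group2, like the 1/0 custom column
    matches = sum(1 for ch in group2 if ch in group)
    return matches / len(group)


print(char_overlap("ABC", "ACB"))  # same letters, different order
print(char_overlap("ABC", "AXY"))  # only one shared letter
```

Because order is ignored, ABC and ACB come out as a full match, and the ratio can exceed 100% when the second name is longer than the first.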
This is the related document, you can view this content:
Fuzzy Clustering in Power BI using Power Query: Finding similar values - RADACAD
Best Regards,
Liu Yang
If this post helps, please consider accepting it as the solution so that other members can find it more quickly.
Thanks a lot
I just want to add that I don't have 2 columns. I have only 1 column, and I want to check the similarity between its rows!
Thanks again
There are scripts to implement Levenshtein in SQL. Feel free to adapt these for Power Query.
You can also do a Pearson correlation of a column against itself.
There is no official SOUNDEX or Levenshtein etc. implementation in DAX. You can consider running Python or R scripts inside of Power BI for that.
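As a rough idea of what the Python route could look like for a single column, here is a minimal sketch: it normalizes each name (case, punctuation, spacing), computes the Levenshtein distance in plain Python, and reports each name's closest other name with a similarity between 0 and 1. The function names (normalize, levenshtein, similarity, best_matches) and the sample names are my own assumptions, not anything built into Power BI.

```python
import re


def normalize(name: str) -> str:
    """Lower-case, turn punctuation into spaces, collapse repeated spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute), two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 means identical after normalization, 0.0 means nothing in common."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


def best_matches(names):
    """For each name, return (name, closest other name, similarity)."""
    result = []
    for i, name in enumerate(names):
        best_name, best_score = None, 0.0
        for j, other in enumerate(names):
            if i == j:
                continue  # never compare a row with itself
            score = similarity(name, other)
            if score > best_score:
                best_name, best_score = other, score
        result.append((name, best_name, best_score))
    return result


# Made-up sample data, just to show the call
for row in best_matches(["Acme, Inc.", "ACME  Inc", "Zenith Ltd"]):
    print(row)
```

Names whose best score stays low are the unique ones, and the high scorers are the candidates for manual review. Every row is compared with every other row, so the work grows with the square of the row count; for a few thousand names that is still manageable, but it will not be instant.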