Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Compete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.

Reply
kellyylx
Helper I
Helper I

Fuzzy duplicates in one column

Hi I have a column called names with fuzzy duplicates and I am hoping to be able to group them together. I am aware of fuzzy matching in power bi but i do not have a list of correct names to merge the tables together. 

 

For example: 

 

namegroup number 
Company Waterfall 1
Company Waterfll1
Company Waterfalls1
Hello Centre0
A. R. Company2
AR Company 2

 

Ideally those similar companies can be allocated an arbitrary group number or just any form of indication that separates each group.

 

kellyylx_0-1725269972760.png

I saw this potential solution from: https://www.mrexcel.com/board/threads/identify-fuzzy-matches-in-one-column.1211554/

 

but im not sure how i can do so in power bi! Appreciate any help possible 🙂 Thank you

 

 

 

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @kellyylx ,

 

Is the question solved? Try the following steps:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs7PLUjMq1QITyxJLUpLzMlRUIrVQRfOycEiCFRbDBb2SM3JyVdwTs0rKUoFCzjqKQTpKUBVQ0SCYFyg8bEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [name = _t]),
    #"Added Custom1" = Table.AddColumn(Source, "name2", each Text.Replace(Text.Replace([name],".","")," ",""),type text),
    #"Merged Queries" = Table.FuzzyNestedJoin(#"Added Custom1", {"name2"}, #"Added Custom1", {"name2"}, "Source", JoinKind.LeftOuter, [IgnoreCase=true, IgnoreSpace=true, Threshold=0.5]),
    #"Removed Other Columns" = Table.Distinct(Table.ReplaceValue(Table.SelectColumns(#"Merged Queries",{"Source"}),each [Source], each Table.Sort([Source],{{"name", Order.Ascending}}),Replacer.ReplaceValue,{"Source"})),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "count", each Table.RowCount([Source])),
    #"Sorted Rows" = Table.Sort(#"Added Custom",{{"count", Order.Ascending}}),
    #"Added Custom2" = Table.AddColumn(#"Sorted Rows", "number", each if [count] = 1 then 0 else 1),
    #"Added Index" = Table.AddIndexColumn(#"Added Custom2", "Index", 1, 1, Int64.Type),
    #"Added Custom3" = Table.AddColumn(#"Added Index", "group number", each List.Sum(List.FirstN(#"Added Custom2"[number],[Index]))),
    #"Expanded Source" = Table.ExpandTableColumn(#"Added Custom3", "Source", {"name"}, {"name"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Source",{"number", "Index"})
in
    #"Removed Columns"

vcgaomsft_0-1725864539539.png

Best Regards,
Gao

Community Support Team

 

If there is any post helps, then please consider Accept it as the solution  to help the other members find it more quickly.
If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!

How to get your questions answered quickly --  How to provide sample data in the Power BI Forum

View solution in original post

7 REPLIES 7
kellyylx
Helper I
Helper I

hi thanks so much for your help but is it possible to do something like if the row has no other duplicates, the row is in group 0. so group 0 are the unique rows and group 1 onwards are rows with duplicates?

You can check the count of items in each group and then bunch all the single items in a "Others"  group but that would somehow defeat the purpose of the fuzzy match, no?

Hi, how can i go about counting the items in each group? I used group by index to count the number of rows for each index in another table but is unable to merge the 2 tables together. 

 

this would be my ideal output:

 

namegroup number count
Company Waterfall 13
Company Waterfll13
Company Waterfalls13
Hello Centre01
A. R. Company22
AR Company 22
Anonymous
Not applicable

Hi @kellyylx ,

 

Is the question solved? Try the following steps:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs7PLUjMq1QITyxJLUpLzMlRUIrVQRfOycEiCFRbDBb2SM3JyVdwTs0rKUoFCzjqKQTpKUBVQ0SCYFyg8bEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [name = _t]),
    #"Added Custom1" = Table.AddColumn(Source, "name2", each Text.Replace(Text.Replace([name],".","")," ",""),type text),
    #"Merged Queries" = Table.FuzzyNestedJoin(#"Added Custom1", {"name2"}, #"Added Custom1", {"name2"}, "Source", JoinKind.LeftOuter, [IgnoreCase=true, IgnoreSpace=true, Threshold=0.5]),
    #"Removed Other Columns" = Table.Distinct(Table.ReplaceValue(Table.SelectColumns(#"Merged Queries",{"Source"}),each [Source], each Table.Sort([Source],{{"name", Order.Ascending}}),Replacer.ReplaceValue,{"Source"})),
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "count", each Table.RowCount([Source])),
    #"Sorted Rows" = Table.Sort(#"Added Custom",{{"count", Order.Ascending}}),
    #"Added Custom2" = Table.AddColumn(#"Sorted Rows", "number", each if [count] = 1 then 0 else 1),
    #"Added Index" = Table.AddIndexColumn(#"Added Custom2", "Index", 1, 1, Int64.Type),
    #"Added Custom3" = Table.AddColumn(#"Added Index", "group number", each List.Sum(List.FirstN(#"Added Custom2"[number],[Index]))),
    #"Expanded Source" = Table.ExpandTableColumn(#"Added Custom3", "Source", {"name"}, {"name"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Source",{"number", "Index"})
in
    #"Removed Columns"

vcgaomsft_0-1725864539539.png

Best Regards,
Gao

Community Support Team

 

If there is any post helps, then please consider Accept it as the solution  to help the other members find it more quickly.
If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!

How to get your questions answered quickly --  How to provide sample data in the Power BI Forum

i have to filter out the potential duplicates thats why i wanted to have all the unique values in the same group. 

lbendlin
Super User
Super User

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs7PLUjMq1QITyxJLUpLzMlRUIrVQRfOycEiCFRbDBb2SM3JyVdwTs0rKUoFCzjqKQTpKUBVQ0SCYFyg8bEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [name = _t]),
    #"Merged Queries" = Table.FuzzyNestedJoin(Source, {"name"}, Source, {"name"}, "Source", JoinKind.LeftOuter, [IgnoreCase=true, IgnoreSpace=true, Threshold=0.5]),
    #"Removed Other Columns" = Table.Distinct(Table.ReplaceValue(Table.SelectColumns(#"Merged Queries",{"Source"}),each [Source], each Table.Sort([Source],{{"name", Order.Ascending}}),Replacer.ReplaceValue,{"Source"})),
    #"Added Index" = Table.AddIndexColumn(#"Removed Other Columns", "Index", 0, 1, Int64.Type),
    #"Expanded Source" = Table.ExpandTableColumn(#"Added Index", "Source", {"name"}, {"name"})
in
    #"Expanded Source"

How to use this code: Create a new Blank Query. Click on "Advanced Editor". Replace the code in the window with the code provided here. Click "Done". Once you examined the code, replace the Source step with your own source.

 

lbendlin_0-1725273208914.png

 

 

You will need to play with the threshold to get a better match.

#"Removed Other Columns" = Table.Distinct(Table.ReplaceValue(Table.SelectColumns(#"Merged Queries",{"Source"}),each [Source], each Table.Sort([Source],{{"name", Order.Ascending}}),Replacer.ReplaceValue,{"Source"})),

 

Hi can you explain to me what this line does and why we need to have it?

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.