Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Compete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.

Reply

Column to show string groupings

Hi everyone, I'm hoping someone might be able to help. I'm looking to create a column that shows groupings of similarly-named strings.

 

The column I need should show unique numbers. These groupings represent 'buckets' of similarly-named strings from a 'File Name' column.

 

A grouping should be a collection of similar file names - similar to a fuzzy matching threshold of 0.9.

 

If there is a string that doesn't have any other matches, it should have a Grouping number of 0.

 

Here's a table with some sample data, and the 'Grouping' column I need:

File PathFile NameGrouping
https://sharepointsite.com/sites/review_library/filename_content_reviewAug2021.pdffilename_content_reviewAug2021.pdf1
https://sharepointsite.com/sites/review_library/filename_content_reviewAug2022.pdffilename_content_reviewAug2022.pdf1
https://sharepointsite.com/sites/review_library/filename_content_reviewAug2023.pdffilename_content_reviewAug2023.pdf1
https://sharepointsite.com/sites/policy_library/mypolicy.docmypolicy.doc2
https://sharepointsite.com/sites/content_library/mypolicy.docmypolicy.doc2
https://sharepointsite.com/sites/backup_library/mypolicy.docmypolicy.doc2
https://sharepointsite.com/sites/backup_library/strategy_document.pdfstrategy_document.pdf0
https://sharepointsite.com/sites/policy_library/community_review_plan.xlsxcommunity_review_plan.xlsx3
https://sharepointsite.com/sites/content_library/review plan for communities.docreview plan for communities.doc3
https://sharepointsite.com/sites/review_library/detailed plan for community review.xlsxdetailed plan for community review.xlsx3
https://sharepointsite.com/sites/review_library/office-layout-guide.docoffice-layout-guide.doc0

 

Note: The groupings I've suggested above might not exactly align to a similarity threshold of 0.9, I've used these groupings just as examples.

Any help would really be appreciated! 🙂

1 ACCEPTED SOLUTION
v-echaithra
Community Support
Community Support

Hi @MichaelHutchens ,

Create a copy of the table by duplicating the table in Power query.

 

Perform Fuzzy Merge - Self-Join

 

  • Go to Files table.
  • Home > Merge Queries > Merge Queries as New
  • Primary table: Table
  • Secondary table: Duplicated Table
  • Join on File Name using Fuzzy Matching:
  • Check Use fuzzy matching
  • Click Fuzzy matching options:
  • Similarity threshold = tweak as needed
  • check ignore case
  • Max matches: blank

 

In the merged table, remove duplicates so each pair appears only once by using Table.Distinct

 

Create a group ID table using DAX to assign a group ID to every file participating in a fuzzy match. You can then join it back to your original file list, and if any file name is not present in the Grouping table, assign that group to 0.

If this post helps, please give us Kudos and consider marking it Accept as solution to assist other members in finding it more easily.

Regards,
Chaithra

View solution in original post

6 REPLIES 6
v-echaithra
Community Support
Community Support

Hi @MichaelHutchens ,

Create a copy of the table by duplicating the table in Power query.

 

Perform Fuzzy Merge - Self-Join

 

  • Go to Files table.
  • Home > Merge Queries > Merge Queries as New
  • Primary table: Table
  • Secondary table: Duplicated Table
  • Join on File Name using Fuzzy Matching:
  • Check Use fuzzy matching
  • Click Fuzzy matching options:
  • Similarity threshold = tweak as needed
  • check ignore case
  • Max matches: blank

 

In the merged table, remove duplicates so each pair appears only once by using Table.Distinct

 

Create a group ID table using DAX to assign a group ID to every file participating in a fuzzy match. You can then join it back to your original file list, and if any file name is not present in the Grouping table, assign that group to 0.

If this post helps, please give us Kudos and consider marking it Accept as solution to assist other members in finding it more easily.

Regards,
Chaithra

Thank you so much @v-echaithra , that worked perfectly 🙂 I really appreciate your time 🙂

rajendraongole1
Super User
Super User

Hi @MichaelHutchens  - DAX does not support fuzzy matching natively, and more importantly, it cannot create calculated columns based on similarity clustering across rows.

suggest, You must use Power Query (M) or external processing (e.g., Python or R script in Power BI) for fuzzy clustering

some reference links:

Create a fuzzy match (Power Query) - Microsoft Support

Fuzzy match / merging in Power BI Desktop (October 2018)

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!





Thanks for that @rajendraongole1 , very useful. I've amended my original post to remove the DAX requirement. I'll take what solutions I can get.

Hi @MichaelHutchens ,

May I ask if you have gotten this issue resolved?

 

If it is solved, please mark the helpful reply or share your solution and accept it as solution, it will be helpful for other members of the community who have similar problems as yours to solve it faster.

 

Regards,
Chaithra.

Thanks for checking in @v-echaithra, no it's not resolved yet

Helpful resources

Announcements
August Power BI Update Carousel

Power BI Monthly Update - August 2025

Check out the August 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors