The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
Hi everyone, I'm hoping someone might be able to help. I'm looking to create a column that shows groupings of similarly-named strings.
The column I need should show unique numbers. These groupings represent 'buckets' of similarly-named strings from a 'File Name' column.
A grouping should be a collection of similar file names - similar to a fuzzy matching threshold of 0.9.
If there is a string that doesn't have any other matches, it should have a Grouping number of 0.
Here's a table with some sample data, and the 'Grouping' column I need:
Note: The groupings I've suggested above might not exactly align to a similarity threshold of 0.9, I've used these groupings just as examples.
Any help would really be appreciated! 🙂
Solved! Go to Solution.
Hi @MichaelHutchens ,
Create a copy of the table by duplicating the table in Power query.
Perform Fuzzy Merge - Self-Join
In the merged table, remove duplicates so each pair appears only once by using Table.Distinct
Create a group ID table using DAX to assign a group ID to every file participating in a fuzzy match. You can then join it back to your original file list, and if any file name is not present in the Grouping table, assign that group to 0.
If this post helps, please give us Kudos and consider marking it Accept as solution to assist other members in finding it more easily.
Regards,
Chaithra
Hi @MichaelHutchens ,
Create a copy of the table by duplicating the table in Power query.
Perform Fuzzy Merge - Self-Join
In the merged table, remove duplicates so each pair appears only once by using Table.Distinct
Create a group ID table using DAX to assign a group ID to every file participating in a fuzzy match. You can then join it back to your original file list, and if any file name is not present in the Grouping table, assign that group to 0.
If this post helps, please give us Kudos and consider marking it Accept as solution to assist other members in finding it more easily.
Regards,
Chaithra
Thank you so much @v-echaithra , that worked perfectly 🙂 I really appreciate your time 🙂
Hi @MichaelHutchens - DAX does not support fuzzy matching natively, and more importantly, it cannot create calculated columns based on similarity clustering across rows.
suggest, You must use Power Query (M) or external processing (e.g., Python or R script in Power BI) for fuzzy clustering
some reference links:
Create a fuzzy match (Power Query) - Microsoft Support
Fuzzy match / merging in Power BI Desktop (October 2018)
Proud to be a Super User! | |
Thanks for that @rajendraongole1 , very useful. I've amended my original post to remove the DAX requirement. I'll take what solutions I can get.
Hi @MichaelHutchens ,
May I ask if you have gotten this issue resolved?
If it is solved, please mark the helpful reply or share your solution and accept it as solution, it will be helpful for other members of the community who have similar problems as yours to solve it faster.
Regards,
Chaithra.