Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.
Register now!Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now
Hi I have been tasked to look for duplication on our company database and come up with dashboard that will present it so we can use it for tracking and correction.
I am currently looking at companies in our database and finding what companies may have been duplicated, i have come up with an intial solution of taking a partial part of the name of the company and concenating the postcode together to form a match string and using the EARLIER and FILTER functions finding the records that return multiple rows that share the same match string. On the whole this appears to work however the results returned contain both postives and false postives too.
I want to therefore make this more intutiative in that the duplicates are scored based on their similarties so the more they match the higher they score. Depnding how high they score will determine the type of correction required.
The sample data would look something like this
| Company ID | Company Name | Postcode | Creation Date |
| 1 | ABC LTD | AB11 7AH | 01/01/2023 |
| 2 | ABC LTD | AB11 7AH | 02/01/2023 |
| 3 | John AND Smith LTD | KT6 5PT | 01/01/2023 |
| 5 | John & Smith LTD | KT6 5PT | 02/01/2023 |
| 6 | 123 LTD | BB1 8AL | 01/01/2023 |
| 7 | 123 LTD | M26 7HT | 02/01/2023 |
| 8 | XYZ and Sons | BB1 8AL | 01/01/2023 |
| 9 | XYZ & Sons | M26 7HT | 02/01/2023 |
| 10 | Joe Blogs Limited | M1 8AB | 01/01/2023 |
I want these to to have different tiers of scoring so something like this
100% = Company Name and Postcode Match exactly
75% = Postcodes match but company names somewhat match
50% = Company Name matche eaxctly but Postcodes are different
25% = Company names somewhat match but poscodes are differrent
0% = Neither matches and the record is unique
I also want to state which record is the orginal and which is the duplicate. Ive done this currently using rankx function finding records which are share the same match string orderd by the creation date so the ones created first are marked as orginal and those after are marked as duplicates.This allows us to track how many duplicates are being created during a time period.Im not sure how this could be factored into the solution
The end result should be something like this
| Company ID | Company Name | Postcode | Creation Date | Match Percentage | Record Type |
| 1 | ABC LTD | AB11 7AH | 01/01/2023 | 100 | Orginal |
| 2 | ABC LTD | AB11 7AH | 02/01/2023 | 100 | Duplicate |
| 3 | John AND Smith LTD | KT6 5PT | 01/01/2023 | 75 | Orginal |
| 5 | John & Smith LTD | KT6 5PT | 02/01/2023 | 75 | Duplicate |
| 6 | 123 LTD | BB1 8AL | 01/01/2023 | 50 | Orginal |
| 7 | 123 LTD | M26 7HT | 02/01/2023 | 50 | Duplicate |
| 8 | XYZ and Sons | BB1 8AL | 01/01/2023 | 25 | Orginal |
| 9 | XYZ & Sons | M26 7HT | 02/01/2023 | 25 | Duplicate |
| 10 | Joe Blogs Limited | L1 8AB | 01/01/2023 | 0 | Orignal |
Im hoping to create visuals and filters based on the scores so you can select records based on how well they match
Apologies but im not very familar with string matching using various string matching techniques so you might need to simplify this for me
Check out the November 2025 Power BI update to learn about new features.
Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!