Combining like items

Roger_McCallen · ‎02-02-2023

I have data from a retailer, and there are multiple items with the same or similar description that in some cases have the same item number, or in others have a different item number. Typically the duplicates are in the system due to carton qty changes or some other attribute change, however at the end of the day they are the same item.

How do I combine these items (in some cases I have as many as 24 of the same like item) so that when I create a table with sales info I do not get duplicate rows of the same items sales? I would like to condesne to one row per item not the multiple rows due to the duplicates.

I would apprecite any help the community can provide.

cpearson · ‎02-02-2023

Depending on the nature of the data and volume I'd consider....

i) if you've got every possible combination already, and it's not too onerous a task, you could create a reference table of cleansed values to join to get your cleansed version.

Item Number	Item Description	New Description
12345	Potato Peeler	Potato Peeler
12345	Potatoe Peeler	Potato Peeler
54321	Potato Peeler 6 pk	Potato Peeler

ii) if it is a huge list of items or you'll be ingesting new records that could include new values, you might want to try and do some string comparison but this isn't without risk as would never be perfect. Have a look at things like Levenshtein Distance on https://en.wikipedia.org/wiki/String_metric and see if something like that would work. This is a good example https://community.powerbi.com/t5/Desktop/Levenshtein-String-Distance-Algorithm-In-DAX/m-p/959545.

iii) Failing that you need some data quality management upstream 🙂

Good luck!