The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
I have data from a retailer, and there are multiple items with the same or similar description that in some cases have the same item number, or in others have a different item number. Typically the duplicates are in the system due to carton qty changes or some other attribute change, however at the end of the day they are the same item.
How do I combine these items (in some cases I have as many as 24 of the same like item) so that when I create a table with sales info I do not get duplicate rows of the same items sales? I would like to condesne to one row per item not the multiple rows due to the duplicates.
I would apprecite any help the community can provide.
Depending on the nature of the data and volume I'd consider....
i) if you've got every possible combination already, and it's not too onerous a task, you could create a reference table of cleansed values to join to get your cleansed version.
Item Number | Item Description | New Description |
12345 | Potato Peeler | Potato Peeler |
12345 | Potatoe Peeler | Potato Peeler |
54321 | Potato Peeler 6 pk | Potato Peeler |
ii) if it is a huge list of items or you'll be ingesting new records that could include new values, you might want to try and do some string comparison but this isn't without risk as would never be perfect. Have a look at things like Levenshtein Distance on https://en.wikipedia.org/wiki/String_metric and see if something like that would work. This is a good example https://community.powerbi.com/t5/Desktop/Levenshtein-String-Distance-Algorithm-In-DAX/m-p/959545.
iii) Failing that you need some data quality management upstream 🙂
Good luck!
User | Count |
---|---|
75 | |
70 | |
39 | |
30 | |
28 |
User | Count |
---|---|
104 | |
95 | |
51 | |
48 | |
46 |