I have data from a retailer, and there are multiple items with the same or similar description that in some cases have the same item number and in others have a different item number. Typically the duplicates are in the system due to carton quantity changes or some other attribute change; however, at the end of the day they are the same item.
How do I combine these items (in some cases I have as many as 24 variants of the same item) so that when I create a table with sales info I don't get duplicate rows for the same item's sales? I would like to condense them to one row per item rather than multiple rows caused by the duplicates.
I would appreciate any help the community can provide.
Depending on the nature of the data and volume, I'd consider the following:
i) If you already have every possible combination and it's not too onerous a task, you could create a reference table of cleansed values and join to it to get your cleansed version.
| Item Number | Item Description | New Description |
|---|---|---|
| 12345 | Potato Peeler | Potato Peeler |
| 12345 | Potatoe Peeler | Potato Peeler |
| 54321 | Potato Peeler 6 pk | Potato Peeler |
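A minimal sketch of the reference-table approach in Python, assuming each sales row carries an item number, a raw description and a units figure. The sales rows and the names `REFERENCE`, `cleanse` and `sales_by_item` are hypothetical, made up for illustration. The lookup is keyed on (item number, description), mirroring the table above, and any row with no mapping simply keeps its raw description.

```python
from collections import defaultdict

# Hypothetical reference table of cleansed values, keyed on
# (item number, raw description); entries mirror the example table.
REFERENCE = {
    ("12345", "Potato Peeler"): "Potato Peeler",
    ("12345", "Potatoe Peeler"): "Potato Peeler",
    ("54321", "Potato Peeler 6 pk"): "Potato Peeler",
}

def cleanse(item_number, description):
    """Return the cleansed description, or the raw one if it is not mapped."""
    return REFERENCE.get((item_number, description), description)

def sales_by_item(rows):
    """Sum units per cleansed description so each item ends up on one row."""
    totals = defaultdict(int)
    for item_number, description, units in rows:
        totals[cleanse(item_number, description)] += units
    return dict(totals)

# Hypothetical sales rows: (item number, description, units sold).
sales = [
    ("12345", "Potato Peeler", 10),
    ("12345", "Potatoe Peeler", 4),
    ("54321", "Potato Peeler 6 pk", 6),
    ("99999", "Garlic Press", 3),
]

print(sales_by_item(sales))  # {'Potato Peeler': 20, 'Garlic Press': 3}
```

In Power BI terms, this corresponds to joining the sales table to the reference table and then grouping on New Description instead of the raw description.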
ii) If it is a huge list of items, or you'll be ingesting new records that could include new values, you might try some string comparison. This isn't without risk, though, as it would never be perfect. Have a look at string metrics such as Levenshtein distance (https://en.wikipedia.org/wiki/String_metric) and see whether something like that would work. This is a good example: https://community.powerbi.com/t5/Desktop/Levenshtein-String-Distance-Algorithm-In-DAX/m-p/959545.
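To illustrate the string-comparison idea, here is a rough sketch in Python rather than the DAX used in the linked post. It computes the standard Levenshtein edit distance and then greedily groups descriptions under the first similar name it has seen. The function names, the case-insensitive comparison and the `max_distance=2` threshold are all assumptions for illustration, and the threshold would need tuning against real data.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]

def group_similar(descriptions, max_distance=2):
    """Map each description to the first canonical name within max_distance.

    Comparison is case-insensitive. The threshold is an assumption: set it
    too high and genuinely different items get merged.
    """
    canonical = []
    mapping = {}
    for desc in descriptions:
        key = desc.strip().lower()
        for name in canonical:
            if levenshtein(key, name.lower()) <= max_distance:
                mapping[desc] = name
                break
        else:
            # No close match found: this description becomes a new canonical name.
            canonical.append(desc.strip())
            mapping[desc] = desc.strip()
    return mapping

names = ["Potato Peeler", "Potatoe Peeler", "Potato Peeler 6 pk", "Garlic Press"]
print(group_similar(names))
```

With these inputs, "Potatoe Peeler" is grouped with "Potato Peeler" (distance 1). "Potato Peeler 6 pk" stays separate (distance 5), which shows why this approach would never be perfect. Pack-size suffixes like that are better handled by the reference table or by stripping them before comparing.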
iii) Failing that you need some data quality management upstream 🙂
Good luck!