Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hi All, I have question about partial duplicates. I would like to find out about best practices and/or solution to get OrderTable & CustomerInfo into DesiredTable.
I have used various join to merge the table but I can't fix the CustomerID which are duplicate.
I would like to have a distinct and unique ID for CustomerID. When I used ReplaceValue (111), it replaced all. Appreciate any advice. Thanks. Paul
OrderTable
CustomerID CustomerName Orders
111 John PC
111 Paul Labtop
111 Mary Handphone
112 Charlie Switch
CustomerInfo
CustomerName Address
John Boston
Paul New York
Mary Texas
Charie California
DesiredTable
CustomerID CustomerName Orders Address
100 John PC Boston
101 Paul Labtop New York
111 Mary Handphone Texas
112 Charlie Switch California
Hi @geminitp ,
I'm not sure whether you have oversimplified your data here, but you are going to find it incredibly challenging to create the output table you require with the data structure you have provided.
The first question that really needs addressing is: why are Customer ID's not unique?
The second question is: how unique are customer names? If they are actually John, Jack etc. you've got little chance. If they are ABC Widget Co., ACME Dynamite Inc. etc. (i.e. entirely unique and will not repeat) then you could probably join on the [CustomerName] fields in each table without using the [CustomerID] field at all.
You *may* be able to get a unique [CustomerID] column by merging the [CustomerID] and [CustomerName] columns in Power Query. This would give you a new ID column that would show values like 111John, 111Paul, 111Mary, 112Charlie etc. although, as above, these types of names may repeat frequently like your [CustomerID] and therefore this may still not create a truly unique ID field. Even once you've done this, there's no way of doing the same in the CustomerInfo table to create a common field to join on.
Pete
Proud to be a Datanaut!
@geminitp , to fix this we need something unique. Like in sample data customer name is unique. You can add an index column
https://stackoverflow.com/questions/45715963/creating-an-index-column-for-power-bi
Starting with 100. in first table
and then you can merge two tables on customer name
https://radacad.com/append-vs-merge-in-power-bi-and-power-query
Thanks for advice @amitchandak , @BA_Pete . I have simplified the table to convey a problem involving partial table, and duplication as error entry. The CustomerID is used as record identifer for an evolving Table and some CustomerID have errors , my thought is to simplify flow and correct CustomerID to be use as identifier. This CustomerID is just like an Social Security No. which is supposed to be distinct and unique.
1) The new index column may work us single out unique entry but I probably have to figure out a way to ReplaceValue for CustomerID errors.
2) Concatenate CustomerID & Name probably wont work if I need to index to other table using CustomerID.
OK, so it looks like you really need to get to the root cause of why some of your [CustomerID] are showing as errors. Are the errors generated by Power Query, or are the errors in the data source and are being pulled into Power Query?
If you're not sure, select one of the error cells in Power Query and share a screenshot of what the errors look like. Please remove any sensitive information from your screenshot.
Next, you need to look at whether you can bring [CustomerID] into your CustomerDetails table. If [CustomerID] exists in the system I would assume that it would feature in the CustomerDetails table. From there, a simple merge will get you what you need.
Pete
Proud to be a Datanaut!