Check your eligibility for this 50% exam voucher offer and join us for free live learning sessions to get prepared for Exam DP-700.
Get StartedDon't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.
I am pulling a customer table which is basically a sales table with all columns, except customer email, removed. I then 'remove duplicates' on the customer email column in Editor, essentially giving me a distinct list of all customer emails that have made a purchase with us. However, when i try to create a relationship between this column and 'customer email' in the sales table, i get an error stating that one of the 2 columns being related must be without duplicates - as i already do a distinct on the customer table, this should not happen.
Does anyone know if using a 'remove duplicates' is the reason this issue occurs and if there is a work around ?
Thanks!
Hi @S_M
Could you please explain the scenario with example :-
Like
Dataset 1
id xyz abc
1 a b
Dataset 2
id xyz abc
1 a b
Relationship between Dataset 1 and Dataset 2
and in which step you are facing challenge.
Hi @kaushikd,
To break it down : I have a sales table (Table1):
order_number customer_email
1 abc@xyz.com
2 def@xyz.com
3 ghi@xyz.com
4 def@xyz.com
I use Table1 to create a new customers table (Table2) which only has the row 'customer_email', removing duplicates, so essentially it looks like this :
Table2:
customer_email
abc@xyz.com
def@xyz.com
ghi@xyz.com
I am now trying to relate Table2 to Table1 using 'customer_email' as primary key, however it doesnt allow me to create a relationship by saying that both tables do not contain distinct values. So i am stumped at why this would happen and how to go about it
Hi I have taken the same data as you have provided:-
and tried the following followed the steps:-
1. Imported the Table1
order_number customer_email
1 abc@xyz.com
2 def@xyz.com
3 ghi@xyz.com
4 def@xyz.com
from excel file
2. Go to Edit Query--> Right Click on Table1-->Select Duplicate-->Renamed the new Table as Table2
3. For Table2-->Right Click on order_number Column and Remove.-->Right Click on customer_email Column and Remove Duplicates.
4. Close and Apply.
5.Go to Relationship
you will find power bi will automatically do a join for both the table or else please do it by your own.
If this help you out please accept this as a solution thanks
@kaushikd, unfortunately this is similar to what ive done and i face the problem above, which is that i am unable to entirely remove duplicates.
@ovetteabejuela, good idea! I actually did that and found one sneaky duplicate that is somehow not being detected by PowerQuery when removing duplicates. On excel, these duplicates get highlighted, however i ran some trim / length functions to understand why it is not being detected as a duplicate, but they are exactly alike. Any idea how i can remove these on editor ?
Hi @S_M,
Maybe you could try to remove dupplicate in Query editor by using Power Query. Here is a similar thread for you reference.
https://community.powerbi.com/t5/Desktop/Remove-or-Hide-Duplicate-Values/td-p/10409
Table.Distinct
https://msdn.microsoft.com/en-us/library/mt260775.aspx
Regards,
Charlie Liao
I would also try to do a little validation.
You can export the table to excel and do a Remove Duplicates there and see if it did find duplicates.
User | Count |
---|---|
123 | |
77 | |
62 | |
50 | |
48 |
User | Count |
---|---|
175 | |
125 | |
61 | |
60 | |
58 |