Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi all,
I'm having an issue with primary key duplication detection in data view but not power Query. The source of the data is a lookup table that is stored in an on-premises database, which doesn't allow for duplication.
The source data isn't the cleanest but I've applied the below transformations, as suggested by this thread, in the hopes that it would remove all potential duplicates (even though I wouldn't expect that there are any) but I'm still getting a many to many relationship when attempting to link the lookup table to the data table.
contact_Table = Administrator_Schema{[Name="contact",Kind="Table"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(contact_Table,{"contact_code", "surname", "name", "company_name", "bus_phone", "email_address", "street", "city", "state", "pcode", "DOB", "Account_Class", "Ckc_Id", "Trading_Entity_Name"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"contact_code", "Contact_Code"}, {"surname", "LastName"}, {"name", "FirstName"}, {"company_name", "Company_Name"}, {"bus_phone", "Bus_Phone"}, {"email_address", "Email"}, {"street", "Street"}, {"city", "City"}, {"state", "State"}, {"pcode", "Post Code"}, {"Ckc_Id", "CKC_ID"}}),
#"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each [Contact_Code] <> null and [Contact_Code] <> ""),
#"Uppercased Text" = Table.TransformColumns(#"Filtered Rows",{{"Contact_Code", Text.Upper, type text}}),
#"Removed Duplicates" = Table.Distinct(#"Uppercased Text", {"Contact_Code"})
in
#"Removed Duplicates"
I've tried analyzing the data in Power BI to see what it's picking up as duplicates by using a distinct count measure on the contact code which lead me to finding that leading zeros and trailing spaces were being dropped (causing the distinct count to produce an inaccurate result).
I then looked into how I could resolve this issue and the best answer I could find was to define both datatypes as type text so that it maintained the leading zeros/trailing spaces. Unfortunately, this still hasn't fixed my issue. (I'd like the source but haven't been able to find the article suggesting it again).
I'm reluctant to try the solutions suggested in this thread as I'm concerned that I'll have the same issue with losing any results with leading zeros/trailing spaces.
Something else to note is that I've already tried the 'Remove Duplicates' function in the Power Query editor - something you'll see in the advance query I posted above.
The last thing I thought to try was to replace leading spaces with a unique symbol but I'm not sure how I can isolate those leading zeros/trailing spaces or what symbol I can use that might not have already been used.
*Edit - I tried this and it didn't work either
Something worth mentioning is that merging the lookup table with the Data table produces usable material but this isn't best practice from what I understand which is why I've refrained from doing it.
Thanks in advance for your help here - I've been doing my best to troubleshoot this but keep hitting walls with every solution.
"when I run a dax query to locate distinct values, one of the two is removed" - can you show me this please?
--
I would need to see the other tables and relationships (and error messages) to get any further with that example 1
--
For example 2, I'm not really understanding the issue. It's certainly possible for 2 similar pieces of data to be treated as different if, for example, there are hidden special characters. However, this is quite unusual and can be sorted out with a Trim and Clean in Power Query.
Maybe you can post your pbix, it would save time.
Can you give us a data sample please and indicate what your desired result is?
As far as I know, for text fields, leading spaces are preserved but trailing spaces will be removed automatically.
Hi mate,
Thanks for coming back to me - apologies for the delayed response.
I've listed an example below:
Contact Code A: 0544
Contact Code B: 544
Image from System:
The above are detected as unique values in Power Query but, when I run a dax query to locate distinct values, one of the two is removed. Second to that, PBI is picking them up as duplicates when I attempt to create the relationship with another table as its defaulting to Many-to-Many and won't allow a One-to-Many.
Example 2:
Contact Code A: "DUCKWORTH"
Contact Code B: "DUCKWORTH "
Image from system:
Also, I've tested creating a calculated table summarizing several columns based on account code which has allowed me to create a One-to-Many relationship. This still isn't my ideal solution as this means that several of those duplicate values have been removed when they are, in fact, unique in the database.
I understand that they are duplicates but its possible for each to have unique information stored against it.
Thanks in advance for your help.
@HotChilli - just realized I forgot to tag you so I'm not sure you saw my response.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
100 | |
51 | |
19 | |
12 | |
11 |