Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
c_cook28954
Advocate I
Advocate I

Primary Key Duplicate Detection in Data View, not Power Query

Hi all, 

 

I'm having an issue with primary key duplication detection in data view but not power Query. The source of the data is a lookup table that is stored in an on-premises database, which doesn't allow for duplication. 

 

The source data isn't the cleanest but I've applied the below transformations, as suggested by this thread, in the hopes that it would remove all potential duplicates (even though I wouldn't expect that there are any) but I'm still getting a many to many relationship when attempting to link the lookup table to the data table. 

 

    contact_Table = Administrator_Schema{[Name="contact",Kind="Table"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(contact_Table,{"contact_code", "surname", "name", "company_name", "bus_phone", "email_address", "street", "city", "state", "pcode", "DOB", "Account_Class", "Ckc_Id", "Trading_Entity_Name"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"contact_code", "Contact_Code"}, {"surname", "LastName"}, {"name", "FirstName"}, {"company_name", "Company_Name"}, {"bus_phone", "Bus_Phone"}, {"email_address", "Email"}, {"street", "Street"}, {"city", "City"}, {"state", "State"}, {"pcode", "Post Code"}, {"Ckc_Id", "CKC_ID"}}),
    #"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each [Contact_Code] <> null and [Contact_Code] <> ""),
    #"Uppercased Text" = Table.TransformColumns(#"Filtered Rows",{{"Contact_Code", Text.Upper, type text}}),
    #"Removed Duplicates" = Table.Distinct(#"Uppercased Text", {"Contact_Code"})
in
    #"Removed Duplicates"

 

I've tried analyzing the data in Power BI to see what it's picking up as duplicates by using a distinct count measure on the contact code which lead me to finding that leading zeros and trailing spaces were being dropped (causing the distinct count to produce an inaccurate result). 

 

I then looked into how I could resolve this issue and the best answer I could find was to define both datatypes as type text so that it maintained the leading zeros/trailing spaces. Unfortunately, this still hasn't fixed my issue. (I'd like the source but haven't been able to find the article suggesting it again). 

 

I'm reluctant to try the solutions suggested in this thread as I'm concerned that I'll have the same issue with losing any results with leading zeros/trailing spaces. 

 

Something else to note is that I've already tried the 'Remove Duplicates' function in the Power Query editor - something you'll see in the advance query I posted above. 

 

The last thing I thought to try was to replace leading spaces with a unique symbol but I'm not sure how I can isolate those leading zeros/trailing spaces or what symbol I can use that might not have already been used. 

*Edit - I tried this and it didn't work either

 

Something worth mentioning is that merging the lookup table with the Data table produces usable material but this isn't best practice from what I understand which is why I've refrained from doing it. 

 

Thanks in advance for your help here - I've been doing my best to troubleshoot this but keep hitting walls with every solution. 

4 REPLIES 4
HotChilli
Super User
Super User

"when I run a dax query to locate distinct values, one of the two is removed" - can you show me this please?

--

I would need to see the other tables and relationships (and error messages) to get any further with that example 1

--

For example 2, I'm not really understanding the issue.  It's certainly possible for 2 similar pieces of data to be treated as different if, for example, there are hidden special characters.  However, this is quite unusual and can be sorted out with a Trim and Clean in Power Query.

 

 

Maybe you can post your pbix, it would save time.

HotChilli
Super User
Super User

Can you give us a data sample please and indicate what your desired result is?

As far as I know, for text fields, leading spaces are preserved but trailing spaces will be removed automatically.

Hi mate, 

 

Thanks for coming back to me - apologies for the delayed response. 

 

I've listed an example below: 

Contact Code A: 0544

Contact Code B: 544

 

Image from System: 

c_cook28954_0-1686011093138.png

 

The above are detected as unique values in Power Query but, when I run a dax query to locate distinct values, one of the two is removed. Second to that, PBI is picking them up as duplicates when I attempt to create the relationship with another table as its defaulting to Many-to-Many and won't allow a One-to-Many. 

 

Example 2: 

Contact Code A: "DUCKWORTH"

Contact Code B: "DUCKWORTH "

 

Image from system: 

c_cook28954_1-1686011392494.png

 

Also, I've tested creating a calculated table summarizing several columns based on account code which has allowed me to create a One-to-Many relationship. This still isn't my ideal solution as this means that several of those duplicate values have been removed when they are, in fact, unique in the database.

 

I understand that they are duplicates but its possible for each to have unique information stored against it.

 

Thanks in advance for your help. 

@HotChilli - just realized I forgot to tag you so I'm not sure you saw my response. 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors