Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Compete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.

Reply
cathoms
Responsive Resident
Responsive Resident

Remove specific, partially duplicate rows

I am building a dataset/model from an Excel workbook prepared by others and split into multiple worksheets. The worksheets are split into lists of providers and their counts of (1) established visits and (2) new visits and the same but where providers have fewer than 26 (3) established visits or (4) new visits. Also, I am getting the provider ids from yet another excel workbook.

 

Anywho, in combining the queries so that I would have all Established visits in one query and all New visits in another query each with the provider id (i.e. NPI), the fact that there are two providers with the same name has caused a problem where I have duplicate values in the NPI column but the corresponding rows are not duplicates. See image:

cathoms_0-1621623166941.png

 

So, I tried setting up a custom column to create duplicates that I could then remove, like so:

each if ([NPI]="1760810293" and [taxonomy_key]="14" and [group_key]="2") then "1871035881"
else if ([NPI]="1871035881" and [taxonomy_key]="54" and [group_key]="95") then "1760810293"
else [NPI])

As you can see in the image above, that didn't work.

 

Essentially, I need to get rid of rows 1 and 3. Note that the above image is filtered to show the two NPIs of interest. There are a total of 697 rows, should be 695. Help! What should I do?

2 REPLIES 2
Fowmy
Super User
Super User

@cathoms 

I am not sure if the order and the pattern of the actual records are similar to what you have shown in the sample. I merged all the columns except NPI and applied "Remove Duplicates" which basically keeps the first occurrence and removes the rest. Check the following code after pasting it in a blank query on the Advanced Editor.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjCxNFbSUTI0ARJGQGxibAYkLZRidaKVTC0sDAnImYLkLE1BqgwtQErAklBDsUjGAgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [A = _t, B = _t, C = _t, D = _t, E = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"A", Int64.Type}, {"B", Int64.Type}, {"C", Int64.Type}, {"D", Int64.Type}, {"E", Int64.Type}}),
    #"Inserted Merged Column" = Table.AddColumn(#"Changed Type", "Merged", each Text.Combine({Text.From([B], "en-US"), Text.From([C], "en-US"), Text.From([D], "en-US"), Text.From([E], "en-US")}, "|"), type text),
    #"Removed Duplicates" = Table.Distinct(#"Inserted Merged Column", {"Merged"})
in
    #"Removed Duplicates"

Fowmy_0-1621627981027.png

 

 

 

Fowmy_1-1621627993180.png

 

 

Did I answer your question? Mark my post as a solution! and hit thumbs up


Subscribe and learn Power BI from these videos

Website LinkedIn PBI User Group

cathoms
Responsive Resident
Responsive Resident

So basically you are creating a new column based on all the other columns, then removing duplicates based on that new column. What you are showing gets rid of the wrong rows. However, if remove duplicates keeps the first occurence, would it work if I sort descending?

 

Also, could I just remove rows based on all the other columns without the need to create a new merged column?

Helpful resources

Announcements
August Power BI Update Carousel

Power BI Monthly Update - August 2025

Check out the August 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors