Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

July 7 - July 17 | Round 2 of the Power BI Dataviz World Championships. Don't miss your chance! Learn more

Reply
cathoms
Responsive Resident
Responsive Resident

Remove specific, partially duplicate rows

I am building a dataset/model from an Excel workbook prepared by others and split into multiple worksheets. The worksheets are split into lists of providers and their counts of (1) established visits and (2) new visits and the same but where providers have fewer than 26 (3) established visits or (4) new visits. Also, I am getting the provider ids from yet another excel workbook.

 

Anywho, in combining the queries so that I would have all Established visits in one query and all New visits in another query each with the provider id (i.e. NPI), the fact that there are two providers with the same name has caused a problem where I have duplicate values in the NPI column but the corresponding rows are not duplicates. See image:

cathoms_0-1621623166941.png

 

So, I tried setting up a custom column to create duplicates that I could then remove, like so:

each if ([NPI]="1760810293" and [taxonomy_key]="14" and [group_key]="2") then "1871035881"
else if ([NPI]="1871035881" and [taxonomy_key]="54" and [group_key]="95") then "1760810293"
else [NPI])

As you can see in the image above, that didn't work.

 

Essentially, I need to get rid of rows 1 and 3. Note that the above image is filtered to show the two NPIs of interest. There are a total of 697 rows, should be 695. Help! What should I do?

2 REPLIES 2
Fowmy
Super User
Super User

@cathoms 

I am not sure if the order and the pattern of the actual records are similar to what you have shown in the sample. I merged all the columns except NPI and applied "Remove Duplicates" which basically keeps the first occurrence and removes the rest. Check the following code after pasting it in a blank query on the Advanced Editor.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjCxNFbSUTI0ARJGQGxibAYkLZRidaKVTC0sDAnImYLkLE1BqgwtQErAklBDsUjGAgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [A = _t, B = _t, C = _t, D = _t, E = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"A", Int64.Type}, {"B", Int64.Type}, {"C", Int64.Type}, {"D", Int64.Type}, {"E", Int64.Type}}),
    #"Inserted Merged Column" = Table.AddColumn(#"Changed Type", "Merged", each Text.Combine({Text.From([B], "en-US"), Text.From([C], "en-US"), Text.From([D], "en-US"), Text.From([E], "en-US")}, "|"), type text),
    #"Removed Duplicates" = Table.Distinct(#"Inserted Merged Column", {"Merged"})
in
    #"Removed Duplicates"

Fowmy_0-1621627981027.png

 

 

 

Fowmy_1-1621627993180.png

 

 

Did I answer your question? Mark my post as a solution! and hit thumbs up


Subscribe and learn Power BI from these videos

Website LinkedIn PBI User Group

cathoms
Responsive Resident
Responsive Resident

So basically you are creating a new column based on all the other columns, then removing duplicates based on that new column. What you are showing gets rid of the wrong rows. However, if remove duplicates keeps the first occurence, would it work if I sort descending?

 

Also, could I just remove rows based on all the other columns without the need to create a new merged column?

Helpful resources

Announcements
FabCon and SQLCon Barcelona 2026

FabCon & SQLCon – Barcelona 2026

Join us in Barcelona for FabCon and SQLCon, the Fabric, Power BI, SQL, and AI community event. Save €200 with code FABCMTY200.

60 days of Data Days Carousel

Data Days 2026

Join Fabric Data Days 2026: 60 days of free live/on-demand sessions, challenges, study groups, and certification opportunities.

Power BI DataViz World Championships carousel

Power BI DataViz World Championships - June 2026

A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.