Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Power BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.

Reply
basic
Frequent Visitor

Remove duplicates selectively based on priority order

Finally gave up looking for examples to learn from, and decided to post my first question to the community.

 

Hello, i have a table similar to this:

 

Row#     Col A     Col B

1            A1          B1

2            A1          B2

3            A1          B3

4            A2          B2

5            A2          B3

6            A3          B1

7            A3          B3

 

I would like to remove duplicates on Col A such that -- whenever Col B = B1, I want to keep that row.  So between rows 1, 2 and 3, I would like to keep row 1.  Between rows 6 and 7, I will keep row 6.

 

When Col A has duplicated and Col B doesn't have any B1 (a specific value), I don't care which row I keep as long as I keep only one.  So between rows 4 and 5, I could keep either, doesn't matter.  

 

My resulting table should look like this:

 

Row#     Col A     Col B

1            A1          B1

4            A2          B2

6            A3          B1

 

OR

 

Row#     Col A     Col B

1            A1          B1

5            A2          B3

6            A3          B1

 

(as i said above, rows 4 and 5 mean the same to me)

 

Can anyone help out please?

1 ACCEPTED SOLUTION

Thanks Zubair.  Sorry I didn't specify that B1 in my data could appear on any row -- not necessarily on the first row all the time.  That makes the remove duplicates function not suitable for my requirement.  

I got help from a colleague and ended up learning a bit of M to get the job done.  My code looks something like this:

 

let
     Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
     #"B1 rows" = Table.Distinct(Table.SelectRows(Source, each ([Col B] = "B1")), {"Col A"}),
     #"A1 list" = Table.Column(#"B1 rows", "Col A"),
     #"Non B1 Rows" = Table.Distinct(Table.SelectRows(#"Table 1", each not List.Contains(#"A1 list", [DNS])), {"Col A"}),
     #"Appended Query" = Table.Combine({#"B1 rows", #"Non B1 Rows"})
in
     #"Appended Query"

 

The exception join (not List.Contains) must not be efficient because my query runs 10 minutes or longer.  But it gets the job done, and that's what i care about.

View solution in original post

3 REPLIES 3
Zubair_Muhammad
Community Champion
Community Champion

@basic 

 

If your data is sorted  (as in your sample data) i.e. B1 is the first row for each unique ColA item

 

you simple have to select Col A>>right click>>remove duplicates

 

removeduplicates.png

Thanks Zubair.  Sorry I didn't specify that B1 in my data could appear on any row -- not necessarily on the first row all the time.  That makes the remove duplicates function not suitable for my requirement.  

I got help from a colleague and ended up learning a bit of M to get the job done.  My code looks something like this:

 

let
     Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
     #"B1 rows" = Table.Distinct(Table.SelectRows(Source, each ([Col B] = "B1")), {"Col A"}),
     #"A1 list" = Table.Column(#"B1 rows", "Col A"),
     #"Non B1 Rows" = Table.Distinct(Table.SelectRows(#"Table 1", each not List.Contains(#"A1 list", [DNS])), {"Col A"}),
     #"Appended Query" = Table.Combine({#"B1 rows", #"Non B1 Rows"})
in
     #"Appended Query"

 

The exception join (not List.Contains) must not be efficient because my query runs 10 minutes or longer.  But it gets the job done, and that's what i care about.

Thanks Zubair.  Sorry I didn't specify that B1 in my data could appear on any row -- not necessarily on the first row all the time.  That makes the remove duplicates function is not suitable for my requirement.  

I got help from a colleague and ended up learning a bit of M to get the job done.  My code looks something like this:

 

let
   Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
   #"B1 rows" = Table.Distinct(Table.SelectRows(Source, each ([Col B] = "B1")), {"Col A"}),
   #"A1 list" = Table.Column(#"B1 rows", "Col A"),
   #"Non B1 Rows" = Table.Distinct(Table.SelectRows(#"Table 1", each not List.Contains(#"A1 list", [DNS])), {"Col A"}),
   #"Appended Query" = Table.Combine({#"B1 rows", #"Non B1 Rows"})
in
   #"Appended Query"

 

The exception join (not List.Contains) must not be efficient because my query runs 10 minutes or longer.  But it gets the job done, and that's what i care about.

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.