Solved: Remove duplicates selectively based on priority or...

basic · ‎06-11-2019

Finally gave up looking for examples to learn from, and decided to post my first question to the community.

Hello, i have a table similar to this:

Row# Col A Col B

1 A1 B1

2 A1 B2

3 A1 B3

4 A2 B2

5 A2 B3

6 A3 B1

7 A3 B3

I would like to remove duplicates on Col A such that -- whenever Col B = B1, I want to keep that row. So between rows 1, 2 and 3, I would like to keep row 1. Between rows 6 and 7, I will keep row 6.

When Col A has duplicated and Col B doesn't have any B1 (a specific value), I don't care which row I keep as long as I keep only one. So between rows 4 and 5, I could keep either, doesn't matter.

My resulting table should look like this:

Row# Col A Col B

1 A1 B1

4 A2 B2

6 A3 B1

OR

Row# Col A Col B

1 A1 B1

5 A2 B3

6 A3 B1

(as i said above, rows 4 and 5 mean the same to me)

Can anyone help out please?

basic · ‎06-13-2019

Thanks Zubair. Sorry I didn't specify that B1 in my data could appear on any row -- not necessarily on the first row all the time. That makes the remove duplicates function not suitable for my requirement.

I got help from a colleague and ended up learning a bit of M to get the job done. My code looks something like this:

let
     Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
     #"B1 rows" = Table.Distinct(Table.SelectRows(Source, each ([Col B] = "B1")), {"Col A"}),
     #"A1 list" = Table.Column(#"B1 rows", "Col A"),
     #"Non B1 Rows" = Table.Distinct(Table.SelectRows(#"Table 1", each not List.Contains(#"A1 list", [DNS])), {"Col A"}),
     #"Appended Query" = Table.Combine({#"B1 rows", #"Non B1 Rows"})
in
     #"Appended Query"

The exception join (not List.Contains) must not be efficient because my query runs 10 minutes or longer. But it gets the job done, and that's what i care about.

View solution in original post

Zubair_Muhammad · ‎06-12-2019

@basic

If your data is sorted (as in your sample data) i.e. B1 is the first row for each unique ColA item

you simple have to select Col A>>right click>>remove duplicates

Regards
Zubair

Please try my custom visuals

basic · ‎06-13-2019

Thanks Zubair. Sorry I didn't specify that B1 in my data could appear on any row -- not necessarily on the first row all the time. That makes the remove duplicates function not suitable for my requirement.

I got help from a colleague and ended up learning a bit of M to get the job done. My code looks something like this:

let
     Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
     #"B1 rows" = Table.Distinct(Table.SelectRows(Source, each ([Col B] = "B1")), {"Col A"}),
     #"A1 list" = Table.Column(#"B1 rows", "Col A"),
     #"Non B1 Rows" = Table.Distinct(Table.SelectRows(#"Table 1", each not List.Contains(#"A1 list", [DNS])), {"Col A"}),
     #"Appended Query" = Table.Combine({#"B1 rows", #"Non B1 Rows"})
in
     #"Appended Query"

The exception join (not List.Contains) must not be efficient because my query runs 10 minutes or longer. But it gets the job done, and that's what i care about.

basic · ‎06-13-2019

Thanks Zubair. Sorry I didn't specify that B1 in my data could appear on any row -- not necessarily on the first row all the time. That makes the remove duplicates function is not suitable for my requirement.

I got help from a colleague and ended up learning a bit of M to get the job done. My code looks something like this:

let
   Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
   #"B1 rows" = Table.Distinct(Table.SelectRows(Source, each ([Col B] = "B1")), {"Col A"}),
   #"A1 list" = Table.Column(#"B1 rows", "Col A"),
   #"Non B1 Rows" = Table.Distinct(Table.SelectRows(#"Table 1", each not List.Contains(#"A1 list", [DNS])), {"Col A"}),
   #"Appended Query" = Table.Combine({#"B1 rows", #"Non B1 Rows"})
in
   #"Appended Query"

The exception join (not List.Contains) must not be efficient because my query runs 10 minutes or longer. But it gets the job done, and that's what i care about.