Solved: Re: I need to filter a list, remove duplicates, th...

Anonymous · ‎04-26-2022

I am working with a dataset that has an enrollment column (either Y or N). I would like to remove duplicate customers only among the enrolled rows (enrollment = Y) and leave the N rows untouched. Is there a way to do this without splitting up the list?

Vijay_A_Verma · ‎04-26-2022

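Filter the Enrollment column twice: once for N and once for Y. Remove duplicates from the filtered Y table, then append the result to the filtered N table. An index column added beforehand lets you restore the original row order afterwards.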

To see it working: open a blank query, go to Home > Advanced Editor, delete everything there, and paste the code below to test. Later, when you use the query on your own dataset, change the Source step accordingly. If your table has columns other than these, delete the Changed Type step (if your query has one) and apply Changed Type to the complete table from the UI again.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUYpUitWJVnKCs5yBLD8wC1UWIuYCF3OHsxBinnB1XiiyEDEfHHZAWP5wlhMOF0BYwRBWLAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Enrollment = _t]),
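    // Tag each row with its original position so the order can be restored later.
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    // Rows with Enrollment = "N" are kept as they are.
    #"Filtered Rows" = Table.SelectRows(#"Added Index", each ([Enrollment] = "N")),
    // Rows with Enrollment = "Y" are deduplicated by Name.
    Custom1 = Table.SelectRows(#"Added Index", each ([Enrollment] = "Y")),
    #"Removed Duplicates" = Table.Distinct(Custom1, {"Name"}),
    // Recombine both subsets, restore the original order, and drop the helper column.
    #"Appended Query" = Table.Combine({#"Filtered Rows", #"Removed Duplicates"}),
    #"Sorted Rows" = Table.Sort(#"Appended Query",{{"Index", Order.Ascending}}),
    #"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Index"})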
in
    #"Removed Columns"
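
Since the question asked about avoiding the split, here is a rough sketch of a single-pass variant. It is not part of the solution above, only one possible alternative. It assumes the columns are called Name and Enrollment and that Name is text. The inline sample table and the DedupKey helper column are hypothetical and stand in for your own source. The idea is to give every Y row a key based on its Name and every N row a key based on its Index, so that only repeated Y names collapse.

```powerquery
let
    // Hypothetical sample data; replace this step with your own source table.
    Source = #table(
        type table [Name = text, Enrollment = text],
        {
            {"A", "Y"},
            {"B", "N"},
            {"A", "Y"},
            {"B", "N"},
            {"C", "Y"}
        }
    ),
    // The index makes every row distinguishable and records the original order.
    AddedIndex = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    // Y rows share one key per Name; N rows each get a unique key from Index.
    // Assumes Name is text (the & operator fails on other types).
    AddedKey = Table.AddColumn(
        AddedIndex,
        "DedupKey",
        each if [Enrollment] = "Y" then "Y|" & [Name] else "N|" & Text.From([Index]),
        type text
    ),
    // Only rows with an identical key (repeated Y names) are removed.
    Deduplicated = Table.Distinct(AddedKey, {"DedupKey"}),
    // Restore the original order and drop both helper columns.
    Sorted = Table.Sort(Deduplicated, {{"Index", Order.Ascending}}),
    Result = Table.RemoveColumns(Sorted, {"Index", "DedupKey"})
in
    Result
```

With this sample data the result should have four rows: one A/Y, two B/N, and one C/Y. Which of the duplicate Y rows survives is not something I would rely on, so if that matters, sort the table into the order you want before the Table.Distinct step.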

Anonymous · ‎04-26-2022

let
    Origine = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSjZU0lGqBGJDA6VYHSDfCMo3gvJh8sZo8iZQvjGQnQfEpkB+LAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Colonna1 = _t, Colonna2 = _t, Colonna3 = _t]),
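    // "Modificato tipo" (Changed Type): set Colonna1 and Colonna2 to text.
    #"Modificato tipo" = Table.TransformColumnTypes(Origine,{{"Colonna1", type text}, {"Colonna2", type text}}),
    // "Rimossi duplicati" (Removed Duplicates): drop rows that repeat the same Colonna1/Colonna2 pair.
    #"Rimossi duplicati" = Table.Distinct(#"Modificato tipo",{"Colonna1","Colonna2"})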
in
    #"Rimossi duplicati"

Use Table.Distinct or List.Distinct.
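
To illustrate the difference between the two functions, here is a minimal sketch. The sample values and step names are hypothetical and not taken from the thread. List.Distinct works on a list of values, while Table.Distinct works on table rows and can optionally compare only the columns you name.

```powerquery
let
    // Hypothetical sample values, not taken from the thread.
    Names = {"A", "B", "A", "C"},
    // List.Distinct removes repeated values from a list.
    UniqueNames = List.Distinct(Names),
    People = #table(
        type table [Name = text, Enrollment = text],
        {{"A", "Y"}, {"A", "N"}, {"A", "Y"}}
    ),
    // Without a column list, Table.Distinct compares entire rows.
    UniqueRows = Table.Distinct(People),
    // With a column list, only the named columns are compared.
    UniqueByName = Table.Distinct(People, {"Name"}),
    // Collect the results in a record so they can be inspected together.
    Result = [
        DistinctList = UniqueNames,
        FullRowCount = Table.RowCount(UniqueRows),
        ByNameCount = Table.RowCount(UniqueByName)
    ]
in
    Result
```

Here DistinctList contains the three values A, B and C, FullRowCount is 2 (A/Y and A/N differ as whole rows), and ByNameCount is 1 (all three rows share the same Name).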