Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join the FabCon + SQLCon recap series. Up next: Power BI, Real-Time Intelligence, IQ and AI, and Data Factory take center stage. All sessions are available on-demand after the live show. Register now

Reply
av9
Helper III
Helper III

Remove duplicate rows in data based on column values

Hi

I am trying to clean up the duplicate rows in my data in Power Query. 

This is a sample of the data imported.

IDCustomer IDNameSubscriptionSubscriber DateSubscriber Status 
1CUST01Ferris LoMonthly Newsletter1/8/2020Activekeep
2CUST01Ferris LoMonthly Newsletter1/5/2019Inactiveremove
3CUST02John SmithMonthly Newsletter1/5/2019Activekeep
4CUST03Jim JamesMonthly Newsletter1/1/2018Activekeep
5CUST04Pat HamMonthly Newsletter1/6/2020Inactivekeep

6

CUST04Pat HamMonthly Newsletter1/5/2019Inactiveremove

 

and I would like it to end up like this:

based on criteria;

IF Customer ID is duplicated, and Subscriber status = Active, remove rows where Subscriber status = Inactive

IF Customer ID is duplicated, and Subscriber status = Inactive, remove oldest Subscriber Date rows (i.e. keep max date)

IDCustomer IDNameSubscriptionSubscriber DateSubscriber Status
1CUST01Ferris LoMonthly Newsletter1/8/2020Active
3CUST02John SmithMonthly Newsletter1/5/2019Active
4CUST03Jim JamesMonthly Newsletter1/1/2018Active
5CUST04Pat HamMonthly Newsletter1/6/2020Inactive

 

1 ACCEPTED SOLUTION
Jakinta
Solution Sage
Solution Sage

You can try this in new query and adjust accordingly.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXIODQ4xADHcUouKMosVfPKBbN/8vJKMnEoFv9Ty4pzUkpLUIqCgob6FvpGBkQGQ6ZhcklmWqhSrE61kRKIZpkAzDC2BTM+8RIQpxjBTQMZ55WfkKQTnZpZkEGEMklNMYIaATPPKzFXwSsxNLcZthiHIDAtUM0xhZoAMC0gsUfBIzMVtghksQFA8Y0aSGVgCJBYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Customer ID" = _t, Name = _t, Subscription = _t, #"Subscriber Date" = _t, #"Subscriber Status" = _t]),
    Grouped = Table.Group(Source, {"Customer ID"}, {{"GR", each if List.Contains(List.Distinct(_[Subscriber Status]),"Active")
then Table.SelectRows(_, each ([Subscriber Status] = "Active"))
else Table.FirstN(Table.Sort(_,{{"Subscriber Date", Order.Descending}}),1)}}),
    Removed = Table.RemoveColumns(Grouped,{"Customer ID"}),
    FINAL = Table.ExpandTableColumn(Removed, "GR", {"ID", "Customer ID", "Name", "Subscription", "Subscriber Date", "Subscriber Status"}, {"ID", "Customer ID", "Name", "Subscription", "Subscriber Date", "Subscriber Status"})
in
    FINAL

View solution in original post

2 REPLIES 2
Anonymous
Not applicable

Seems that you could Group on Customer ID using the GUI function, choosing "All Rows" as the aggregation.  Let's say you named the Grouped table column "Details".  Then you can filter for each Max Date per Customer ID:

Filtered = Table.SelectRows(PreviousStepName, each Table.Max([Details], "Subscriber Date"))

Then keep only the "Details" column, and expand it. This also eliminates the need to care what the active/inactive status, since you are returning each latest status.

 

--Nate

Jakinta
Solution Sage
Solution Sage

You can try this in new query and adjust accordingly.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXIODQ4xADHcUouKMosVfPKBbN/8vJKMnEoFv9Ty4pzUkpLUIqCgob6FvpGBkQGQ6ZhcklmWqhSrE61kRKIZpkAzDC2BTM+8RIQpxjBTQMZ55WfkKQTnZpZkEGEMklNMYIaATPPKzFXwSsxNLcZthiHIDAtUM0xhZoAMC0gsUfBIzMVtghksQFA8Y0aSGVgCJBYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Customer ID" = _t, Name = _t, Subscription = _t, #"Subscriber Date" = _t, #"Subscriber Status" = _t]),
    Grouped = Table.Group(Source, {"Customer ID"}, {{"GR", each if List.Contains(List.Distinct(_[Subscriber Status]),"Active")
then Table.SelectRows(_, each ([Subscriber Status] = "Active"))
else Table.FirstN(Table.Sort(_,{{"Subscriber Date", Order.Descending}}),1)}}),
    Removed = Table.RemoveColumns(Grouped,{"Customer ID"}),
    FINAL = Table.ExpandTableColumn(Removed, "GR", {"ID", "Customer ID", "Name", "Subscription", "Subscriber Date", "Subscriber Status"}, {"ID", "Customer ID", "Name", "Subscription", "Subscriber Date", "Subscriber Status"})
in
    FINAL

Helpful resources

Announcements
April Power BI Update Carousel

Power BI Monthly Update - April 2026

Check out the April 2026 Power BI update to learn about new features.

New to Fabric survey Carousel

New to Fabric Survey

If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.

Power BI DataViz World Championships carousel

Power BI DataViz World Championships - June 2026

A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.

FabCon and SQLCon Highlights Carousel

FabCon &SQLCon Highlights

Experience the highlights from FabCon & SQLCon, available live and on-demand starting April 14th.