Don't miss your chance to take the Fabric Data Engineer (DP-600) exam for FREE! Find out how by watching the DP-600 session on-demand now through April 28th.
Learn moreJoin the FabCon + SQLCon recap series. Up next: Power BI, Real-Time Intelligence, IQ and AI, and Data Factory take center stage. All sessions are available on-demand after the live show. Register now
Hi
I am trying to clean up the duplicate rows in my data in Power Query.
This is a sample of the data imported.
| ID | Customer ID | Name | Subscription | Subscriber Date | Subscriber Status | |
| 1 | CUST01 | Ferris Lo | Monthly Newsletter | 1/8/2020 | Active | keep |
| 2 | CUST01 | Ferris Lo | Monthly Newsletter | 1/5/2019 | Inactive | remove |
| 3 | CUST02 | John Smith | Monthly Newsletter | 1/5/2019 | Active | keep |
| 4 | CUST03 | Jim James | Monthly Newsletter | 1/1/2018 | Active | keep |
| 5 | CUST04 | Pat Ham | Monthly Newsletter | 1/6/2020 | Inactive | keep |
6 | CUST04 | Pat Ham | Monthly Newsletter | 1/5/2019 | Inactive | remove |
and I would like it to end up like this:
based on criteria;
IF Customer ID is duplicated, and Subscriber status = Active, remove rows where Subscriber status = Inactive
IF Customer ID is duplicated, and Subscriber status = Inactive, remove oldest Subscriber Date rows (i.e. keep max date)
| ID | Customer ID | Name | Subscription | Subscriber Date | Subscriber Status |
| 1 | CUST01 | Ferris Lo | Monthly Newsletter | 1/8/2020 | Active |
| 3 | CUST02 | John Smith | Monthly Newsletter | 1/5/2019 | Active |
| 4 | CUST03 | Jim James | Monthly Newsletter | 1/1/2018 | Active |
| 5 | CUST04 | Pat Ham | Monthly Newsletter | 1/6/2020 | Inactive |
Solved! Go to Solution.
You can try this in new query and adjust accordingly.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXIODQ4xADHcUouKMosVfPKBbN/8vJKMnEoFv9Ty4pzUkpLUIqCgob6FvpGBkQGQ6ZhcklmWqhSrE61kRKIZpkAzDC2BTM+8RIQpxjBTQMZ55WfkKQTnZpZkEGEMklNMYIaATPPKzFXwSsxNLcZthiHIDAtUM0xhZoAMC0gsUfBIzMVtghksQFA8Y0aSGVgCJBYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Customer ID" = _t, Name = _t, Subscription = _t, #"Subscriber Date" = _t, #"Subscriber Status" = _t]),
Grouped = Table.Group(Source, {"Customer ID"}, {{"GR", each if List.Contains(List.Distinct(_[Subscriber Status]),"Active")
then Table.SelectRows(_, each ([Subscriber Status] = "Active"))
else Table.FirstN(Table.Sort(_,{{"Subscriber Date", Order.Descending}}),1)}}),
Removed = Table.RemoveColumns(Grouped,{"Customer ID"}),
FINAL = Table.ExpandTableColumn(Removed, "GR", {"ID", "Customer ID", "Name", "Subscription", "Subscriber Date", "Subscriber Status"}, {"ID", "Customer ID", "Name", "Subscription", "Subscriber Date", "Subscriber Status"})
in
FINAL
Seems that you could Group on Customer ID using the GUI function, choosing "All Rows" as the aggregation. Let's say you named the Grouped table column "Details". Then you can filter for each Max Date per Customer ID:
Filtered = Table.SelectRows(PreviousStepName, each Table.Max([Details], "Subscriber Date"))
Then keep only the "Details" column, and expand it. This also eliminates the need to care what the active/inactive status, since you are returning each latest status.
--Nate
You can try this in new query and adjust accordingly.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXIODQ4xADHcUouKMosVfPKBbN/8vJKMnEoFv9Ty4pzUkpLUIqCgob6FvpGBkQGQ6ZhcklmWqhSrE61kRKIZpkAzDC2BTM+8RIQpxjBTQMZ55WfkKQTnZpZkEGEMklNMYIaATPPKzFXwSsxNLcZthiHIDAtUM0xhZoAMC0gsUfBIzMVtghksQFA8Y0aSGVgCJBYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Customer ID" = _t, Name = _t, Subscription = _t, #"Subscriber Date" = _t, #"Subscriber Status" = _t]),
Grouped = Table.Group(Source, {"Customer ID"}, {{"GR", each if List.Contains(List.Distinct(_[Subscriber Status]),"Active")
then Table.SelectRows(_, each ([Subscriber Status] = "Active"))
else Table.FirstN(Table.Sort(_,{{"Subscriber Date", Order.Descending}}),1)}}),
Removed = Table.RemoveColumns(Grouped,{"Customer ID"}),
FINAL = Table.ExpandTableColumn(Removed, "GR", {"ID", "Customer ID", "Name", "Subscription", "Subscriber Date", "Subscriber Status"}, {"ID", "Customer ID", "Name", "Subscription", "Subscriber Date", "Subscriber Status"})
in
FINAL
Check out the April 2026 Power BI update to learn about new features.
If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.
A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.
| User | Count |
|---|---|
| 3 | |
| 3 | |
| 3 | |
| 2 | |
| 2 |
| User | Count |
|---|---|
| 8 | |
| 6 | |
| 5 | |
| 5 | |
| 4 |