Reply
anabianco09
Helper II
Helper II
Partially syndicated - Outbound

Deleting duplicate based on Date Column

Hi all, I am trying to delete some duplicate numbers based on a Date Column, i.e., I want to delete the old and keep only the newest record.

 

anabianco09_0-1703085755316.png

 

How to do this?

Thanks

1 ACCEPTED SOLUTION
v-cgao-msft
Community Support
Community Support

Syndicated - Outbound

Hi @anabianco09 ,

I'm guessing you want to group by the [Ebgine Serial #] field and keep the records with the latest date in each group.
If so, you can:
1. [Ebgine Serial #] in ascending order, [Estimated Arrival Date] in descending order.

vcgaomsft_0-1703480213056.png

2. Group rows by [Ebgine Serial #]  and add a index column:

vcgaomsft_1-1703480542932.png

3. expand data and filter rows and remove index column:

vcgaomsft_2-1703480617673.png

Advanced editor:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnRU0lEyNNAHIiMDI2MQx1ApVgcmYagPRFAJIyOwhJMTug5jYyQJZB0mJkgSRvpABJUwNVWKjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Ebgine Serial #" = _t, #"Estimated Arrival Date" = _t, #" Value" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Ebgine Serial #", type text}, {"Estimated Arrival Date", type date}, {" Value", Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Ebgine Serial #", Order.Ascending}, {"Estimated Arrival Date", Order.Descending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"Ebgine Serial #"}, {{"Data", each Table.AddIndexColumn(_, "Index", 0, 1, Int64.Type)}}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Grouped Rows", "Data", {"Estimated Arrival Date", " Value", "Index"}, {"Estimated Arrival Date", " Value", "Index"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Data", each ([Index] = 0)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index"})
in
    #"Removed Columns"


If the latest date has multiple records, you can add a custom column:

vcgaomsft_3-1703481058043.png

Then filter rows:

= Table.SelectRows(#"Added Custom", each [Estimated Arrival Date] = [Latest Date])

Best Regards,
Gao

Community Support Team

 

If there is any post helps, then please consider Accept it as the solution  to help the other members find it more quickly.
If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!

How to get your questions answered quickly --  How to provide sample data in the Power BI Forum -- China Power BI User Group

View solution in original post

3 REPLIES 3
v-cgao-msft
Community Support
Community Support

Syndicated - Outbound

Hi @anabianco09 ,

I'm guessing you want to group by the [Ebgine Serial #] field and keep the records with the latest date in each group.
If so, you can:
1. [Ebgine Serial #] in ascending order, [Estimated Arrival Date] in descending order.

vcgaomsft_0-1703480213056.png

2. Group rows by [Ebgine Serial #]  and add a index column:

vcgaomsft_1-1703480542932.png

3. expand data and filter rows and remove index column:

vcgaomsft_2-1703480617673.png

Advanced editor:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnRU0lEyNNAHIiMDI2MQx1ApVgcmYagPRFAJIyOwhJMTug5jYyQJZB0mJkgSRvpABJUwNVWKjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Ebgine Serial #" = _t, #"Estimated Arrival Date" = _t, #" Value" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Ebgine Serial #", type text}, {"Estimated Arrival Date", type date}, {" Value", Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Ebgine Serial #", Order.Ascending}, {"Estimated Arrival Date", Order.Descending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"Ebgine Serial #"}, {{"Data", each Table.AddIndexColumn(_, "Index", 0, 1, Int64.Type)}}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Grouped Rows", "Data", {"Estimated Arrival Date", " Value", "Index"}, {"Estimated Arrival Date", " Value", "Index"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Data", each ([Index] = 0)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index"})
in
    #"Removed Columns"


If the latest date has multiple records, you can add a custom column:

vcgaomsft_3-1703481058043.png

Then filter rows:

= Table.SelectRows(#"Added Custom", each [Estimated Arrival Date] = [Latest Date])

Best Regards,
Gao

Community Support Team

 

If there is any post helps, then please consider Accept it as the solution  to help the other members find it more quickly.
If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!

How to get your questions answered quickly --  How to provide sample data in the Power BI Forum -- China Power BI User Group

Syndicated - Outbound

Thanks Gao! That worked!

vicky_
Super User
Super User

Syndicated - Outbound

It's a little unclear what you mean by duplicate records. 
If it's about only keeping records from the latest date, you can filter the date and choose the "Is Latest" option.

vicky__0-1703106251117.png


Otherwise, if you mean that there's a certain key value, then please try and follow the guide here: https://exceleratorbi.com.au/remove-duplicates-keep-last-record-power-query/

 

avatar user

Helpful resources

Announcements
March PBI video - carousel

Power BI Monthly Update - March 2025

Check out the March 2025 Power BI update to learn about new features.

March2025 Carousel

Fabric Community Update - March 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors (Last Month)
Top Kudoed Authors (Last Month)