Solved: Re: Filtering duplicates in funcion of a date colu...

juanca · ‎11-30-2016

Hi everyone,

I've tried to solve this issue on my own and then looking for an already solved post related to this but I wasn't able to find it, sorry in advanced if it has been already solved.

The issue is that I have a table where the key code field is duplicated in some rows and I want to keep just one row per key code with all the values related to this specific row. For each duplicated key code, which row do I want to keep? I want to keep the row that was first created (info stored in the column data). Attending to the last point, just eliminating duplicates in the query editor is not an option.

To come up with a solution I'm exploring two ways:

Calculated Table: I've tried to create a filtered table through SUMMARIZE function but using the function like this: SUMMARIZE('original_table';'original_table'[keyCode];"Date";FIRSTDATE('original_table'[date].[Date])) I manage to get the first date per each group of duplicate key codes but then, when I try to get any othe fields (SUMMARIZE('original_table';'original_table'[keyCode];"Date";FIRSTDATE('original_table'[date].[Date]);"yyy";'original_table'[yyy])) I cannot get the fields related to the row where the first date is.
Calculated Column: Trying to create a boolean column, with a True or False in each row depending on that row was the one to be selected (conditions already explained).

I hope I explained correctly, if you need any extra explanation please ask.

Any tip will be more than welcome.

Thanks a lot for your time!

ImkeF · ‎12-02-2016

Sorry, yes - the sorting order must have been ascending.

The "Remove-Duplicates"-step will retain the first rows, so they must be on top.

It could be that you don't get the correct result after changing the sort-order due to the buffering in the cache. You can try emptying the cache, then it should work. But this highlights the danger on relying to a sort order without using an explicit buffer in your code. Power BI isn't good at keeping sort-orders, so you have to edit your code unfortunately:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlSK1YlWMgWTRgYQyhhKg0lDCMdcKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}}),
    #"Sorted Rows" = Table.Buffer(Table.Sort(#"Changed Type",{{"Column1", Order.Ascending}}))
in
    #"Sorted Rows"

So you need to "wrap" the sorting-step into a Table.Buffer(....) before applying the Remove-Duplicates step.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

juanca · ‎12-01-2016

Hi everyone,

Trying to be a bit more specific here is an example of the table origin and the result that I want to get:

In red colour those rows that have to be deleted because I want to keep the row whith an earliest date (the last duplicates share the same date, in these cases I want to keep the first register on the table).

Thanks a lot again!

ImkeF · ‎12-01-2016

You have to do this in the query-editor:

1) Sort your table on key-EAC and fecha (descending).

2) Check columns: key-EAC and fecha and remove duplicates (either right-click your mouse or: Home -> Reduce Rows -> Remove Rows -> Remove Duplicates.

This will keep the first occurences of the duplicated rows (where only the selected columns "count" for the dups) and remove the others.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

juanca · ‎12-01-2016

Thanks a lot for your asnwer Imke,

I don't understand the behaviour of the remove duplicates execution, but I tried what you proposed and I didn't get the result I'm looking for.

In the next picture you can see that performing the steps you mentioned and finally ordering "fecha" descending, the result isn't the correct, changind the type of ordering in "fecha" to ascending doesn't make any change in the result after removing duplicates:

Please, let me know if you see anything incorrect or you have any other proposal.

Thanks a lot again for your time.

P.D.: Now I'm working on solution based on an R script but may (should!) be an easier way to solve it

ImkeF · ‎12-02-2016

Sorry, yes - the sorting order must have been ascending.

The "Remove-Duplicates"-step will retain the first rows, so they must be on top.

It could be that you don't get the correct result after changing the sort-order due to the buffering in the cache. You can try emptying the cache, then it should work. But this highlights the danger on relying to a sort order without using an explicit buffer in your code. Power BI isn't good at keeping sort-orders, so you have to edit your code unfortunately:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlSK1YlWMgWTRgYQyhhKg0lDCMdcKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}}),
    #"Sorted Rows" = Table.Buffer(Table.Sort(#"Changed Type",{{"Column1", Order.Ascending}}))
in
    #"Sorted Rows"

So you need to "wrap" the sorting-step into a Table.Buffer(....) before applying the Remove-Duplicates step.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

danextian · ‎05-03-2017

@ImkeF

I didnt know about having to write in Table.Buffer the sorting step. I found out only just a while ago that, although I have already sorted the rows, the remove duplicate function did not retain the topmost row.

Dane Belarmino | Microsoft MVP | Proud to be a Super User!

Did I answer your question? Mark my post as a solution!

"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.

ImkeF · ‎05-03-2017

You should file a bug-report then.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

juanca · ‎12-02-2016

Hello Imke,

It works perfectly now! I really appreciate your help and the way you explained the last point about buffering.

Thanks for sharing your knowledge.

Kind regards, juanca.

Filtering duplicates in funcion of a date column

Helpful resources

Fabric Community Update - July 2025

Power BI Monthly Update - July 2025

How to Get Your Question Answered Quickly

Party with Power BI’s own Guy in a Cube

Filtering duplicates in funcion of a date column

Helpful resources

Fabric Community Update - July 2025

Power BI Monthly Update - July 2025

How to Get Your Question Answered Quickly