Skip to main content
cancel
Showing results for
Search instead for
Did you mean:

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

New Member

## Remove duplicates - keep last vs keep first

Hi,

I need to remove rows in a table that have duplicate values and I need to keep last one. Power BI by default keeps first one regardless of sorting order. I tried to use buffered table in advanced editor, this didn't help. Any ideas how to achieve this?

1 ACCEPTED SOLUTION
Employee

@A-kitaev

In this scenario, if you want to keep the last records associated with each group column. You can build a calculated table aggregating values with max sort within each group.

I assume you a have a table like below:

Then you can create a calculated table like below:

```Table = SUMMARIZE(
Table3,Table3[Name],
"Last Value",
CALCULATE(SUM(Table3[Value]),
FILTER(Table3,Table3[Sort]=MAX(Table3[Sort]))
)
)```

Regards,

17 REPLIES 17
Helper I

The trick from RichieRich is a very elegant straight forward approach that provided a simple solution for my even more complex duplicates problem! Thanks a lot!

Employee

@A-kitaev

In this scenario, if you want to keep the last records associated with each group column. You can build a calculated table aggregating values with max sort within each group.

I assume you a have a table like below:

Then you can create a calculated table like below:

```Table = SUMMARIZE(
Table3,Table3[Name],
"Last Value",
CALCULATE(SUM(Table3[Value]),
FILTER(Table3,Table3[Sort]=MAX(Table3[Sort]))
)
)```

Regards,

Frequent Visitor

Hi! Can you help me add a custom column counting the number of entries? Outcome shall be the same as the SORT column used here. Thank you!

Super User

Hi,

Share some data, describe the question and show the expected result.

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
Frequent Visitor

I need an additional custom column PQ Editor that COUNTS the number of OrderNumber entries..

Goal is to remove the duplicates and get the latest entry based on RunDate.

I have tried the Table.Buffer but it seems not to be applicable because it slows down the performance/process. I thought of this workaround instead...

Super User

Hi,

Does this M Query work faster?  If not, then let me know.  I will share a calculated column formula in DAX with you.

``````let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"OrderNumber", Int64.Type}, {"RunDate", type date}}),
#"Sorted Rows" = Table.Sort(#"Changed Type",{{"OrderNumber", Order.Ascending}, {"RunDate", Order.Ascending}}),
Partition = Table.Group(#"Sorted Rows", {"OrderNumber"}, {{"Partition", each Table.AddIndexColumn(_, "Index",1,1), type table}}),
#"Expanded Partition" = Table.ExpandTableColumn(Partition, "Partition", {"RunDate", "Index"}, {"RunDate", "Index"})
in
#"Expanded Partition"``````

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
Frequent Visitor

Trying to figure out things.. What if there are other columns to be displayed? The M query worked as well.. But maybe i'll use it on the smaller data.
How can we display other columns that were hidden in the Partition?

Super User

Hi,

I shared an alternative DAX calculated solution yesterday.  Try that one.

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
Frequent Visitor

I have tried both the M query and DAX calculated column.. Both solutions worked incredibly!

I will observe M query's performance on BIG data first.. If really not applicable, I'll use your DAX calculated column solution...

Thank you so much for taking time!!

Super User

Sure.  Based on whichever works better on your live data, please mark that relevant response as Answer.

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
Frequent Visitor

Hi @Ashish_Mathur, thank you but do you have another work around? The M query takes time to process big data.

Super User

I have a DAX calculated column alternative (not an M query alternative).  Are you interested in that?

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
Frequent Visitor

Okay with DAX, if it will perform faster than the M query and as long as it meets the desired outcome as well! Many thanks!

Super User

Hi,

Try this calculated column formula

=calculate(countrows(data),filter(data,data[OrderNumber]=earlier(data[OrderNumber])&&data[RunDate]<=earlier(data[RunDate])))

Hope this helps.

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
New Member

Thanks! It's not that elegant as I hoped it would be, but it works

Advocate III

There's a more elegant method here which I've just used with Table.Buffer. https://www.youtube.com/watch?v=rqDdnNxSgHQ
The crucial part in the video is after the 4 minute mark.....
The Table.Buffer command saves the sort prior to removing the duplicates ensuring you get the latest.
The Table.Buffer has to be added manually in the advanced editor
Here is an example I did where "Registration" is the group on which I'm removing duplicates and keeping the latest record (First of the date for each of the group, date descending)

let
Source = #"IVMS - As Posted View - Unstructured Master Query",
#"Sorted Rows" = Table.Sort(Source,{{"Calendar Year/Month.Calendar Year/Month Level 01", Order.Descending}}),
#"Buffer table"= Table.Buffer(#"Sorted Rows"),
#"Removed Duplicates" = Table.Distinct(#"Buffer table", {"Registration"})
in
#"Removed Duplicates"

Anonymous
Not applicable

This worked great! Thank you.

## Helpful resources

Announcements

#### New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

#### Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.

#### Fabric certifications survey

Certification feedback opportunity for the community.

Top Solution Authors
Top Kudoed Authors