Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
A-kitaev
New Member

Remove duplicates - keep last vs keep first

Hi,

I need to remove rows in a table that have duplicate values and I need to keep last one. Power BI by default keeps first one regardless of sorting order. I tried to use buffered table in advanced editor, this didn't help. Any ideas how to achieve this?

1 ACCEPTED SOLUTION
v-sihou-msft
Microsoft Employee
Microsoft Employee

@A-kitaev

 

In this scenario, if you want to keep the last records associated with each group column. You can build a calculated table aggregating values with max sort within each group.  

 

I assume you a have a table like below:

 

000.PNG

 

Then you can create a calculated table like below:

 

Table = SUMMARIZE(
	Table3,Table3[Name],
	"Last Value",
	CALCULATE(SUM(Table3[Value]),
		FILTER(Table3,Table3[Sort]=MAX(Table3[Sort]))
	)
	)

77.PNG

 

 

Regards,

View solution in original post

17 REPLIES 17
flex99
Helper I
Helper I

The trick from RichieRich is a very elegant straight forward approach that provided a simple solution for my even more complex duplicates problem! Thanks a lot!

v-sihou-msft
Microsoft Employee
Microsoft Employee

@A-kitaev

 

In this scenario, if you want to keep the last records associated with each group column. You can build a calculated table aggregating values with max sort within each group.  

 

I assume you a have a table like below:

 

000.PNG

 

Then you can create a calculated table like below:

 

Table = SUMMARIZE(
	Table3,Table3[Name],
	"Last Value",
	CALCULATE(SUM(Table3[Value]),
		FILTER(Table3,Table3[Sort]=MAX(Table3[Sort]))
	)
	)

77.PNG

 

 

Regards,

Hi! Can you help me add a custom column counting the number of entries? Outcome shall be the same as the SORT column used here. Thank you!

Hi,

Share some data, describe the question and show the expected result.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

I need an additional custom column PQ Editor that COUNTS the number of OrderNumber entries..

 

hmenco_0-1627264963777.png

 

Goal is to remove the duplicates and get the latest entry based on RunDate.

 

hmenco_1-1627265076587.png

 

 

I have tried the Table.Buffer but it seems not to be applicable because it slows down the performance/process. I thought of this workaround instead...

 

Hi,

Does this M Query work faster?  If not, then let me know.  I will share a calculated column formula in DAX with you.

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"OrderNumber", Int64.Type}, {"RunDate", type date}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"OrderNumber", Order.Ascending}, {"RunDate", Order.Ascending}}),
    Partition = Table.Group(#"Sorted Rows", {"OrderNumber"}, {{"Partition", each Table.AddIndexColumn(_, "Index",1,1), type table}}),
    #"Expanded Partition" = Table.ExpandTableColumn(Partition, "Partition", {"RunDate", "Index"}, {"RunDate", "Index"})
in
    #"Expanded Partition"

 

Untitled.png


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Trying to figure out things.. What if there are other columns to be displayed? The M query worked as well.. But maybe i'll use it on the smaller data. 
How can we display other columns that were hidden in the Partition?

Hi,

I shared an alternative DAX calculated solution yesterday.  Try that one.  


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

I have tried both the M query and DAX calculated column.. Both solutions worked incredibly!

I will observe M query's performance on BIG data first.. If really not applicable, I'll use your DAX calculated column solution...

 

Thank you so much for taking time!!

Sure.  Based on whichever works better on your live data, please mark that relevant response as Answer.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Hi @Ashish_Mathur, thank you but do you have another work around? The M query takes time to process big data. 

I have a DAX calculated column alternative (not an M query alternative).  Are you interested in that?


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Okay with DAX, if it will perform faster than the M query and as long as it meets the desired outcome as well! Many thanks!

Hi,

Try this calculated column formula

=calculate(countrows(data),filter(data,data[OrderNumber]=earlier(data[OrderNumber])&&data[RunDate]<=earlier(data[RunDate])))

Hope this helps.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Thanks! It's not that elegant as I hoped it would be, but it works

There's a more elegant method here which I've just used with Table.Buffer. https://www.youtube.com/watch?v=rqDdnNxSgHQ
The crucial part in the video is after the 4 minute mark.....
The Table.Buffer command saves the sort prior to removing the duplicates ensuring you get the latest.
The Table.Buffer has to be added manually in the advanced editor
Here is an example I did where "Registration" is the group on which I'm removing duplicates and keeping the latest record (First of the date for each of the group, date descending)

let
Source = #"IVMS - As Posted View - Unstructured Master Query",
#"Sorted Rows" = Table.Sort(Source,{{"Calendar Year/Month.Calendar Year/Month Level 01", Order.Descending}}),
#"Buffer table"= Table.Buffer(#"Sorted Rows"),
#"Removed Duplicates" = Table.Distinct(#"Buffer table", {"Registration"})
in
#"Removed Duplicates"

Anonymous
Not applicable

This worked great! Thank you. 

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.