Skip to main content
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Helper II
Helper II

How to remove duplicates based on sort order

When loading the data I need to keep only the latest record.

I don't have a date to rely on to determine the latest record but I have a status, based on which I can sort in a somewhat chronological order.

 I thought I could

  • Associate a statusID to each status, making sure that the lowest the ID the latest the event
  • Sort by statusID
  • Remove duplicates

Unfortunately it seems the sort order is not always respected,

I read that we should use the Table.Buffer() function before removing duplicates but that this could create performance issue.

It does indeed create performance issue. It works on a small scale but when I use this on the entire dataset the load process does not complete.


Now I’m looking into using Table.SelectRows with a condition that would only return the row with the smallest statusID for each RequestID.

Something along the line of


 #"Latest Request" = Table.SelectRows(#"Previous Step", each (List.Max([RequestID]," statusID "))),


But that does return a type incompatibility error

Not applicable

We also had a similar problem. We found this solved our problem:


1. Sort data as required

2. Add index column starting at 1

3. Remove Duplicates


The index column helps Power BI realise while removing duplicates that there is a specific order to the sorted data.

Frequent Visitor

I have already uploaded a SQL data, how can I create a function to sort the column order and force de removing to follow the date sort? I dont know how to create the function...


It is sad PBI dont "follow" the sort order to eliminate duplicates...

Community Champion
Community Champion

My suggestion would be the following, illustrated by this video.

I created some test data in Access just to verify if the code would allow for Query Folding (by rightclicking each step in the queries and check for the existence of Native Query). They all did, so performance should be OK..


I created a query InputData that just reads the data from Access. For this query, load is disabled.


    Source = Access.Database(File.Contents("C:\Users\Marcel\Documents\Forum bijdragen\Power BI Community\How to remove duplicates based on sort order.accdb"), [CreateNavigationProperties=true]),
    _Requests = Source{[Schema="",Item="Requests"]}[Data]



I created a query MinStatusByRequestID with input from query InputData that determines the minimum statusID for each RequestID using Group By on the Transform tab. Also for this query, load is disabled.


    Source = InputData,
    #"Grouped Rows" = Table.Group(Source, {"RequestID"}, {{"minStatus", each List.Min([statusID]), type number}})
    #"Grouped Rows"



I created a query FilteredData that merges the previous 2 queries with an inner join. For this query, load is enabled.


    Source = InputData,
    #"Merged Queries" = Table.Join(Source,{"RequestID", "statusID"},MinStatusByRequestID,{"RequestID", "minStatus"},JoinKind.Inner),
    #"Removed Columns" = Table.RemoveColumns(#"Merged Queries",{"minStatus"})
    #"Removed Columns"


Specializing in Power Query Formula Language (M)

 Thank you so much Marcel! I have been struggling for days and googling crazy for a solution and you just solved my problem!!!

Wow, thank you Marcel for taking the time to provide that detailed explanation and even making a video!

Leveraging grouping and merging is brilliant.


Helpful resources

July 2024 Power BI Update

Power BI Monthly Update - July 2024

Check out the July 2024 Power BI update to learn about new features.

July Newsletter

Fabric Community Update - July 2024

Find out what's new and trending in the Fabric Community.