cancel
Showing results for 
Search instead for 
Did you mean: 

Fabric is Generally Available. Browse Fabric Presentations. Work towards your Fabric certification with the Cloud Skills Challenge.

Reply
fiveone
Helper II
Helper II

How to remove duplicates based on sort order

When loading the data I need to keep only the latest record.

I don't have a date to rely on to determine the latest record but I have a status, based on which I can sort in a somewhat chronological order.

 I thought I could

  • Associate a statusID to each status, making sure that the lowest the ID the latest the event
  • Sort by statusID
  • Remove duplicates

Unfortunately it seems the sort order is not always respected,

I read that we should use the Table.Buffer() function before removing duplicates but that this could create performance issue.

It does indeed create performance issue. It works on a small scale but when I use this on the entire dataset the load process does not complete.

 

Now I’m looking into using Table.SelectRows with a condition that would only return the row with the smallest statusID for each RequestID.

Something along the line of

  

 #"Latest Request" = Table.SelectRows(#"Previous Step", each (List.Max([RequestID]," statusID "))),

 

But that does return a type incompatibility error

5 REPLIES 5
Anonymous
Not applicable

We also had a similar problem. We found this solved our problem:

 

1. Sort data as required

2. Add index column starting at 1

3. Remove Duplicates

 

The index column helps Power BI realise while removing duplicates that there is a specific order to the sorted data.

acstaffa
Frequent Visitor

I have already uploaded a SQL data, how can I create a function to sort the column order and force de removing to follow the date sort? I dont know how to create the function...

 

It is sad PBI dont "follow" the sort order to eliminate duplicates...

MarcelBeug
Community Champion
Community Champion

My suggestion would be the following, illustrated by this video.


I created some test data in Access just to verify if the code would allow for Query Folding (by rightclicking each step in the queries and check for the existence of Native Query). They all did, so performance should be OK..

 

I created a query InputData that just reads the data from Access. For this query, load is disabled.

 

let
    Source = Access.Database(File.Contents("C:\Users\Marcel\Documents\Forum bijdragen\Power BI Community\How to remove duplicates based on sort order.accdb"), [CreateNavigationProperties=true]),
    _Requests = Source{[Schema="",Item="Requests"]}[Data]
in
    _Requests

 

 

I created a query MinStatusByRequestID with input from query InputData that determines the minimum statusID for each RequestID using Group By on the Transform tab. Also for this query, load is disabled.

 

let
    Source = InputData,
    #"Grouped Rows" = Table.Group(Source, {"RequestID"}, {{"minStatus", each List.Min([statusID]), type number}})
in
    #"Grouped Rows"

 

 

I created a query FilteredData that merges the previous 2 queries with an inner join. For this query, load is enabled.

 

let
    Source = InputData,
    #"Merged Queries" = Table.Join(Source,{"RequestID", "statusID"},MinStatusByRequestID,{"RequestID", "minStatus"},JoinKind.Inner),
    #"Removed Columns" = Table.RemoveColumns(#"Merged Queries",{"minStatus"})
in
    #"Removed Columns"

 

Specializing in Power Query Formula Language (M)

 Thank you so much Marcel! I have been struggling for days and googling crazy for a solution and you just solved my problem!!!

Wow, thank you Marcel for taking the time to provide that detailed explanation and even making a video!

Leveraging grouping and merging is brilliant.

 

Helpful resources

Announcements
PBI November 2023 Update Carousel

Power BI Monthly Update - November 2023

Check out the November 2023 Power BI update to learn about new features.

Community News

Fabric Community News unified experience

Read the latest Fabric Community announcements, including updates on Power BI, Synapse, Data Factory and Data Activator.

Dashboard in a day with date

Exclusive opportunity for Women!

Join us for a free, hands-on Microsoft workshop led by women trainers for women where you will learn how to build a Dashboard in a Day!

Power BI Fabric Summit Carousel

The largest Power BI and Fabric virtual conference

130+ sessions, 130+ speakers, Product managers, MVPs, and experts. All about Power BI and Fabric. Attend online or watch the recordings.

Top Solution Authors
Top Kudoed Authors