The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
When loading the data I need to keep only the latest record.
I don't have a date to rely on to determine the latest record but I have a status, based on which I can sort in a somewhat chronological order.
I thought I could
Unfortunately it seems the sort order is not always respected,
I read that we should use the Table.Buffer() function before removing duplicates but that this could create performance issue.
It does indeed create performance issue. It works on a small scale but when I use this on the entire dataset the load process does not complete.
Now I’m looking into using Table.SelectRows with a condition that would only return the row with the smallest statusID for each RequestID.
Something along the line of
#"Latest Request" = Table.SelectRows(#"Previous Step", each (List.Max([RequestID]," statusID "))),
But that does return a type incompatibility error
We also had a similar problem. We found this solved our problem:
1. Sort data as required
2. Add index column starting at 1
3. Remove Duplicates
The index column helps Power BI realise while removing duplicates that there is a specific order to the sorted data.
I have already uploaded a SQL data, how can I create a function to sort the column order and force de removing to follow the date sort? I dont know how to create the function...
It is sad PBI dont "follow" the sort order to eliminate duplicates...
My suggestion would be the following, illustrated by this video.
I created some test data in Access just to verify if the code would allow for Query Folding (by rightclicking each step in the queries and check for the existence of Native Query). They all did, so performance should be OK..
I created a query InputData that just reads the data from Access. For this query, load is disabled.
let Source = Access.Database(File.Contents("C:\Users\Marcel\Documents\Forum bijdragen\Power BI Community\How to remove duplicates based on sort order.accdb"), [CreateNavigationProperties=true]), _Requests = Source{[Schema="",Item="Requests"]}[Data] in _Requests
I created a query MinStatusByRequestID with input from query InputData that determines the minimum statusID for each RequestID using Group By on the Transform tab. Also for this query, load is disabled.
let Source = InputData, #"Grouped Rows" = Table.Group(Source, {"RequestID"}, {{"minStatus", each List.Min([statusID]), type number}}) in #"Grouped Rows"
I created a query FilteredData that merges the previous 2 queries with an inner join. For this query, load is enabled.
let Source = InputData, #"Merged Queries" = Table.Join(Source,{"RequestID", "statusID"},MinStatusByRequestID,{"RequestID", "minStatus"},JoinKind.Inner), #"Removed Columns" = Table.RemoveColumns(#"Merged Queries",{"minStatus"}) in #"Removed Columns"
Thank you so much Marcel! I have been struggling for days and googling crazy for a solution and you just solved my problem!!!
Wow, thank you Marcel for taking the time to provide that detailed explanation and even making a video!
Leveraging grouping and merging is brilliant.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
109 | |
78 | |
71 | |
52 | |
50 |
User | Count |
---|---|
123 | |
119 | |
76 | |
64 | |
60 |