Get certified in Microsoft Fabric—for free! For a limited time, the Microsoft Fabric Community team will be offering free DP-600 exam vouchers. Prepare now
I have a table that has ID and date columns. I want to remove duplicates and only keep the most recent record for each ID. I thought I'd by able to simply by sorting either ascending or descending then removing duplicates, but it doesn't keep the first or last record in your sort. It only keeps the first record in the source dataset regardless of sort. Is there a way around this?
Solved! Go to Solution.
I've run into this before. I think it's a result of query folding not operating in the order that you'd expect.
The workaround that worked for me was to add an index column after sorting before removing duplicates. This breaks query folding and forces it to operate in the order you just defined by your sort.
@AlexisOlson @Anonymous
Do check this video from curbal.com. It beautifully explains the reason why changing Sort Order in Power query doesn't work for removing duplicates and using Table.Buffer to keep new Sorting Order in memory.
Sharing this as it might be helpful in some other scenarios too..
Maybe it is possible to use the function that was added in 2022, Table.StopFolding( SortedTable ), to break folding and keep the sorting.
I've run into this before. I think it's a result of query folding not operating in the order that you'd expect.
The workaround that worked for me was to add an index column after sorting before removing duplicates. This breaks query folding and forces it to operate in the order you just defined by your sort.
Pefect! Thanks for you help 😊.
Worked perfectly!
I need to remove duplicates based on keeping the second instance rather than the first, if I add an index and sort descending then remove duplicates should that work?
If there are always exactly two instances, this should work. Are there always at least two and no more than two?
Thanks Alexis! Worked perfectly.
@AlexisOlson @Anonymous
Do check this video from curbal.com. It beautifully explains the reason why changing Sort Order in Power query doesn't work for removing duplicates and using Table.Buffer to keep new Sorting Order in memory.
Sharing this as it might be helpful in some other scenarios too..
@AnkitBI , are there advantages to using Table.Buffer versus using Index like @AlexisOlson suggested? Index worked for me, but I wanted to know if there are cases where it won't.
Thanks!
The underlying idea in both cases is to prevent the query from using the original cached table in memory (since it's not the order we want). It's conceivable that in some cases the internal engine might fold in the index step without reloading the table, which would sort of defeat the purpose of adding that step. Table.Buffer explicitly reloads the table to memory so I'd guess you have fewer edge cases to worry about.
(The above is somewhat speculative, so if anyone knows more definitively, please let me know.)
I tried using Table.Buffer today before removing duplicates, and I ended up freezing up my file. But adding Index seemed to perform better.
But how does add index helps in solving the issue can you elaborate
Thanks in advance 😊
Check out the October 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
115 | |
112 | |
105 | |
95 | |
58 |
User | Count |
---|---|
174 | |
147 | |
136 | |
102 | |
82 |