This time we’re going bigger than ever. Fabric, Power BI, SQL, AI and more. We're covering it all. You won't want to miss it.
Learn moreLevel up your Power BI skills this month - build one visual each week and tell better stories with data! Get started
Hello,
I have quite a large dataset and I need to filter out duplicate records and always bring back the latest record as per the field 'modifieddate'.
Looking on the forums, I've seen the approach to sort the date in descending order, using table.buffer on the sort, then the next step to remove duplicates on my unique key.
This is fine when I do this on smaller datasets, but on large ones it's just causing the query to crash or not load at all.
Can you assist, please? The datasource is coming from SQL, so I'm not sure if there's something smart I can do in the source step using some SQL or not..
Hopefully the above makes sense,
Many thanks,
Dayna
Solved! Go to Solution.
@Dayna if you are getting it from SQL, you can do this easily on the SQL side and would also be more performant, quicker.
Example
declare @t1 as table (date date, record int, grp int)
insert into @t1
select * from
(values('2021-1-1',1,1),('2021-1-2',2,1),('2021-1-3',3,1),
('2021-1-1',1,2),('2021-1-2',2,2),('2021-1-3',3,2) ) t (a,b,c)
select grp,record,date as date
from @t1 a
where a.date in (select max(b.date) from @t1 b group by grp)
order by grp
I'll be using the data differently depending on my dataflow, so ideally I'd like to do this in PowerBI itself.
@Dayna PQ has performance issue which really does not matter for a small dataset. But for large tables, if the server-side transformation/filtering/aggregation is not applied, then PQ will not do it any faster. But if that is your only choice, I am not sure how can you troubleshoot that.
@Dayna if you are getting it from SQL, you can do this easily on the SQL side and would also be more performant, quicker.
Example
declare @t1 as table (date date, record int, grp int)
insert into @t1
select * from
(values('2021-1-1',1,1),('2021-1-2',2,1),('2021-1-3',3,1),
('2021-1-1',1,2),('2021-1-2',2,2),('2021-1-3',3,2) ) t (a,b,c)
select grp,record,date as date
from @t1 a
where a.date in (select max(b.date) from @t1 b group by grp)
order by grp
@Dayna did you have a chance to try out the above yet?
Sorry just seen this!
Yes, that helps, in addition to changing how the incremental refresh to use created date as the load field, and modified to identify changes, I think I'm sorted!
Thanks for all your help.
Check out the April 2026 Power BI update to learn about new features.
Sign up to receive a private message when registration opens and key events begin.
If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.
| User | Count |
|---|---|
| 36 | |
| 28 | |
| 27 | |
| 20 | |
| 18 |
| User | Count |
|---|---|
| 66 | |
| 35 | |
| 33 | |
| 26 | |
| 24 |