Supplies are limited. Contact info@espc.tech right away to save your spot before the conference sells out.
Get your discountScore big with last-minute savings on the final tickets to FabCon Vienna. Secure your discount
Hello all, I have a question about eliminating duplicates using excel power query.
I'm pulling data from a folder that I routinely put new CSVs into. There's almost always duplicate transactions which I remove using both an invoice # and commission amount in the query. I want to keep the first transaction and prevent any new duplicates from being added to my dataset post query. I'm still testing this query and the first few datasets worked perfectly until my most recent csv.
Here's whats happening:
CSV 1: includes transactions dated 10/1-10/31
CSV 2: includes transactions dated 11/1-11/12 with a few intentionally placed duplicates. Duplicates are removed and the original transactions from CSV 1 are retained in the query.
CSV 3: includes transactions dated 11/12-11/15. All 11/12 duplicates are removed and 11/12 transactions from CSV 2 are retained.
(The system is doing what I want up to this point)
CSV 4: includes transactions dated 11/8-11/15. The trsancations from CSV 4 replaces all 11/8-11/15 transactions in the query appearing to eliminate the transactions from CSV 3 and some from CSV 2.
What I'm hoping to accomplish is to download new transactions, add them to the power query, filter out the new duplicates while retaining records which have already passed through the power query, and sort the transactions in a table where I can sort past transactions from the most recent addition. I created a new table column where I could manually mark transactions as "Reviewed" before downloading a new set. It appeared to be working at first but has now stopped. Might this be because there are now three potential duplicates for these records rather than just two being processed through the query?
Any tips on how I can prevent power query from overwriting past transactions I've marked as "Reviewed" and prevent it from adding the most recent duplicate transactions?
Thanks!
Thanks @KNP ! That might be exactly what I'm looking for. I'll do some more research on buffering and try getting that worked into the code. Here's a snippet of the current code for my duplicate removal. All sorting at the moment is done in the table itself post query. Learning the code for sorting in the query is a next step but I'd thought it wouldn't make a difference when the sorting took place.
Definitely don't rely on the sorting to be maintained. Make it explicit.
Something like this...
= Table.Buffer(Table.Sort(PreviousStep,{{"ColumnName", Order.Ascending}}))
Ascending/Descending and other sorting columns as required of course.
Have I solved your problem? Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;). |
Probably need to see some data and/or the query code.
Specifically, how you are doing the duplicate removal.
Stab in the dark guess, if you're using a Table.Sort followed by a Table.Distinct anywhere. Make sure you add a Table.Buffer immediately BEFORE the Table.Distinct.
It forces the sort order to be maintained.
Have I solved your problem? Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;). |