Solved: Re: Eliminate duplicate records while keeping the ...

GFire

Hi,

By default, Power Query keeps the oldest record and deletes the newest record when removing duplicates. I need the opposite: delete the oldest record and keep the newest.
The following code does the trick. The problem is that it works record by record, and when the table size is large, the process slows down dramatically.

Max Index by duplicates =

List.Max( Table.SelectRows(#Index add, (r) = r[N_Invoice] = [N_Invoice])[Index] )

I need help to be able to do it more efficiently.

Thanks!!

Link to Dataset

https://drive.google.com/drive/folders/1LanSPa_O1_qirmJYPIBh1IaJQkMJ910_?usp=sharing

ronrsnfld

Assuming the latest is the last entry for any set of duplicates, you can

Group by Invoice Number
Return only the last row each subtable

let

//Read in your data
    Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\Dataset.csv"),[Delimiter=";", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"N_Invoice", type text}, {"Index", Int64.Type}}),

//Group by N_invoice and retain the last row
    #"Grouped Rows" = Table.Group(#"Changed Type", {"N_Invoice"}, {
            {"Latest", each Table.Last(_), type [N_Invoice=nullable text, Index=nullable number]}}),

//remove column and re-expand
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"N_Invoice"}),
    #"Expanded Latest" = Table.ExpandRecordColumn(#"Removed Columns", "Latest", {"N_Invoice", "Index"}, {"N_Invoice", "Index"})
in
    #"Expanded Latest"

This will also retain the original ordering of the entries. You can see in the screenshot below, FV-084482 is in the second row of both your original data, as well as the dedup'd data, even though the "latest" invoice was found near the bottom of the table as indicated by the associated Index number.

Note that Table.Group is not guaranteed to retain your original row order, which would be important in the above algorithm. However, others have reported that the presence of an Index column ameliorates this issue. So ensure that your actual data has an index column (or add one if it does not).

If there are issues of that nature, then merely change the custom aggregation in the Table.Group function to:

 {"Latest", (t)=> Table.SelectRows(t, each [Index]=List.Max(t[Index])), type table [N_Invoice=nullable text, Index=nullable number]}})

If you want a different order, please specify.

View solution in original post

V-yubandi-msft

Hi @GFire ,

Thank you for reaching out to the Fabric community.

You are correct by default, Power Query keeps the first (oldest) record when removing duplicates. However, if you prefer to keep the latest (newest) record, your method of finding the max index per duplicate is accurate.

I tested your scenario with your provided dataset and implemented a more optimized approach.

Added an Index column.
Sorted the table in descending order based on the Index (to bring the newest rows to the top).
Used Table.Distinct on the N_Invoice column to keep only the first occurrence of each which now is the newest record.

This solution avoids the row by row logic and performs efficiently on larger datasets.

FYI:

Thanks for your response @ronrsnfld & @Nasif_Azam .

Hope this helps. If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

ronrsnfld

Assuming the latest is the last entry for any set of duplicates, you can

Group by Invoice Number
Return only the last row each subtable

let

//Read in your data
    Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\Dataset.csv"),[Delimiter=";", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"N_Invoice", type text}, {"Index", Int64.Type}}),

//Group by N_invoice and retain the last row
    #"Grouped Rows" = Table.Group(#"Changed Type", {"N_Invoice"}, {
            {"Latest", each Table.Last(_), type [N_Invoice=nullable text, Index=nullable number]}}),

//remove column and re-expand
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"N_Invoice"}),
    #"Expanded Latest" = Table.ExpandRecordColumn(#"Removed Columns", "Latest", {"N_Invoice", "Index"}, {"N_Invoice", "Index"})
in
    #"Expanded Latest"

This will also retain the original ordering of the entries. You can see in the screenshot below, FV-084482 is in the second row of both your original data, as well as the dedup'd data, even though the "latest" invoice was found near the bottom of the table as indicated by the associated Index number.

Note that Table.Group is not guaranteed to retain your original row order, which would be important in the above algorithm. However, others have reported that the presence of an Index column ameliorates this issue. So ensure that your actual data has an index column (or add one if it does not).

If there are issues of that nature, then merely change the custom aggregation in the Table.Group function to:

 {"Latest", (t)=> Table.SelectRows(t, each [Index]=List.Max(t[Index])), type table [N_Invoice=nullable text, Index=nullable number]}})

If you want a different order, please specify.

Nasif_Azam

Hey @GFire ,

You can efficiently keep the most recent record for each N_Invoice in Power Query by leveraging Table.Group and List.Max or by using a stable sort followed by Table.Distinct with a twist.

Power Query Approach

Sort by Index descending (newest records first).
Remove duplicates based on N_Invoice, keeping the first row (which is now the most recent due to the sort).

M Code:

let
    Source = ... ,  // your original table here
    SortedTable = Table.Sort(Source, {{"Index", Order.Descending}}),
    Deduplicated = Table.Distinct(SortedTable, {"N_Invoice"})
in
    Deduplicated

Sorting by Index in descending order ensures the most recent entries come first. Table.Distinct keeps the first occurrence of each duplicate group (based on N_Invoice), which in this sorted state is the most recent record.

You mentioned this code which is a slow code so avoid that code: List.Max( Table.SelectRows(#"Index add", (r) => r[N_Invoice] = [N_Invoice])[Index] )

For Detailed Information:

Power Query M Table functions

Avoid row context for better performance

SQLBI Power Query techniques

Remove duplicates and keep the last (or first) record in Power BI

If you found this solution helpful, please consider accepting it and giving it a kudos (Like) it’s greatly appreciated and helps others find the solution more easily.

Best Regards,
Nasif Azam

GFire

Hi,

It doesn't work. For example: N_Invoice number FV-084497 is in the dataset with index numbers 5 and 59. Once the code was applied, the resulting table removed index:59 and retained index:5.

Nasif_Azam

Hi @GFire ,

Thanks for your feedback. Let's try this M code with confident:

let
    Source = ... ,  // Replace with your actual source
    Grouped = Table.Group(Source, {"N_Invoice"}, {
        {"TopRow", each Table.Sort(_, {{"Index", Order.Descending}}){0}, type record}
    }),
    Result = Table.FromRecords(Grouped[TopRow])
in
    Result

What That M Code Do:

Groups the table by N_Invoice
Sorts each group in descending order by Index
Picks the first row from each group (i.e., the most recent one)
Flattens it back into a normal table

If you found this solution helpful, please consider accepting it and giving it a kudos (Like) it’s greatly appreciated and helps others find the solution more easily.

Best Regards,
Nasif Azam

Eliminate duplicate records while keeping the most recent record.

Power Query Approach

M Code:

What That M Code Do:

Helpful resources

Power BI Monthly Update - June 2025

Fabric Community Update - June 2025

Become a Certified Power BI Data Analyst!

Eliminate duplicate records while keeping the most recent record.

Power Query Approach

M Code:

What That M Code Do:

Helpful resources

Power BI Monthly Update - June 2025

Fabric Community Update - June 2025