Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Power BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.

Reply
GFire
New Member

Eliminate duplicate records while keeping the most recent record.

tableExample.png

 

 

 

 

 

 

 

 

 

Hi,

 

By default, Power Query keeps the oldest record and deletes the newest record when removing duplicates. I need the opposite: delete the oldest record and keep the newest.
The following code does the trick. The problem is that it works record by record, and when the table size is large, the process slows down dramatically.

 

Max Index by duplicates =

List.Max( Table.SelectRows(#Index add, (r) = r[N_Invoice] = [N_Invoice])[Index] )

 

I need help to be able to do it more efficiently.

 

Thanks!!

 

Link to Dataset

https://drive.google.com/drive/folders/1LanSPa_O1_qirmJYPIBh1IaJQkMJ910_?usp=sharing

1 ACCEPTED SOLUTION
ronrsnfld
Super User
Super User

Assuming the latest is the last entry for any set of duplicates, you can

  • Group by Invoice Number
  • Return only the last row each subtable
let

//Read in your data
    Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\Dataset.csv"),[Delimiter=";", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"N_Invoice", type text}, {"Index", Int64.Type}}),

//Group by N_invoice and retain the last row
    #"Grouped Rows" = Table.Group(#"Changed Type", {"N_Invoice"}, {
            {"Latest", each Table.Last(_), type [N_Invoice=nullable text, Index=nullable number]}}),

//remove column and re-expand
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"N_Invoice"}),
    #"Expanded Latest" = Table.ExpandRecordColumn(#"Removed Columns", "Latest", {"N_Invoice", "Index"}, {"N_Invoice", "Index"})
in
    #"Expanded Latest"

 

This will also retain the original ordering of the entries. You can see in the screenshot below, FV-084482 is in the second row of both your original data, as well as the dedup'd data, even though the "latest" invoice was found near the bottom of the table as indicated by the associated Index number.

 

Note that Table.Group is not guaranteed to retain your original row order, which would be important in the above algorithm. However, others have reported that the presence of an Index column ameliorates this issue. So ensure that your actual data has an index column (or add one if it does not). 

If there are issues of that nature, then merely change the custom aggregation in the Table.Group function to: 

 {"Latest", (t)=> Table.SelectRows(t, each [Index]=List.Max(t[Index])), type table [N_Invoice=nullable text, Index=nullable number]}})

 

 

If you want a different order, please specify.

 

ronrsnfld_0-1749950965133.png

 

 

View solution in original post

5 REPLIES 5
V-yubandi-msft
Community Support
Community Support

Hi @GFire ,

Thank you for reaching out to the Fabric community.

You are correct  by default, Power Query keeps the first (oldest) record when removing duplicates. However, if you prefer to keep the latest (newest) record, your method of finding the max index per duplicate is accurate.

 

I tested your scenario with your provided dataset and implemented a more optimized approach.

  1. Added an Index column.
  2. Sorted the table in descending order based on the Index (to bring the newest rows to the top).
  3. Used Table.Distinct on the N_Invoice column to keep only the first occurrence of each which now is the newest record.

This solution avoids the row by row logic and performs efficiently on larger datasets.

 

FYI:

Vyubandimsft_0-1750074173052.png

 

Thanks for your response @ronrsnfld & @Nasif_Azam .

 

Hope this helps. If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

ronrsnfld
Super User
Super User

Assuming the latest is the last entry for any set of duplicates, you can

  • Group by Invoice Number
  • Return only the last row each subtable
let

//Read in your data
    Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\Dataset.csv"),[Delimiter=";", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"N_Invoice", type text}, {"Index", Int64.Type}}),

//Group by N_invoice and retain the last row
    #"Grouped Rows" = Table.Group(#"Changed Type", {"N_Invoice"}, {
            {"Latest", each Table.Last(_), type [N_Invoice=nullable text, Index=nullable number]}}),

//remove column and re-expand
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"N_Invoice"}),
    #"Expanded Latest" = Table.ExpandRecordColumn(#"Removed Columns", "Latest", {"N_Invoice", "Index"}, {"N_Invoice", "Index"})
in
    #"Expanded Latest"

 

This will also retain the original ordering of the entries. You can see in the screenshot below, FV-084482 is in the second row of both your original data, as well as the dedup'd data, even though the "latest" invoice was found near the bottom of the table as indicated by the associated Index number.

 

Note that Table.Group is not guaranteed to retain your original row order, which would be important in the above algorithm. However, others have reported that the presence of an Index column ameliorates this issue. So ensure that your actual data has an index column (or add one if it does not). 

If there are issues of that nature, then merely change the custom aggregation in the Table.Group function to: 

 {"Latest", (t)=> Table.SelectRows(t, each [Index]=List.Max(t[Index])), type table [N_Invoice=nullable text, Index=nullable number]}})

 

 

If you want a different order, please specify.

 

ronrsnfld_0-1749950965133.png

 

 

Nasif_Azam
Solution Supplier
Solution Supplier

Hey @GFire ,

You can efficiently keep the most recent record for each N_Invoice in Power Query by leveraging Table.Group and List.Max or by using a stable sort followed by Table.Distinct with a twist.

Power Query Approach

  1. Sort by Index descending (newest records first).

  2. Remove duplicates based on N_Invoice, keeping the first row (which is now the most recent due to the sort).

M Code:

let
    Source = ... ,  // your original table here
    SortedTable = Table.Sort(Source, {{"Index", Order.Descending}}),
    Deduplicated = Table.Distinct(SortedTable, {"N_Invoice"})
in
    Deduplicated

Sorting by Index in descending order ensures the most recent entries come first. Table.Distinct keeps the first occurrence of each duplicate group (based on N_Invoice), which in this sorted state is the most recent record.

 

You mentioned this code which is a slow code so avoid that code: List.Max( Table.SelectRows(#"Index add", (r) => r[N_Invoice] = [N_Invoice])[Index] )

 

For Detailed Information:

Power Query M Table functions

Avoid row context for better performance

SQLBI Power Query techniques

Remove duplicates and keep the last (or first) record in Power BI

 

 

If you found this solution helpful, please consider accepting it and giving it a kudos (Like) it’s greatly appreciated and helps others find the solution more easily.


Best Regards,
Nasif Azam

Hi,

 

It doesn't work. For example: N_Invoice number FV-084497 is in the dataset with index numbers 5 and 59. Once the code was applied, the resulting table removed index:59 and retained index:5.

Hi @GFire ,

Thanks for your feedback. Let's try this M code with confident:

let
    Source = ... ,  // Replace with your actual source
    Grouped = Table.Group(Source, {"N_Invoice"}, {
        {"TopRow", each Table.Sort(_, {{"Index", Order.Descending}}){0}, type record}
    }),
    Result = Table.FromRecords(Grouped[TopRow])
in
    Result

What That M Code Do:

  • Groups the table by N_Invoice

  • Sorts each group in descending order by Index

  • Picks the first row from each group (i.e., the most recent one)

  • Flattens it back into a normal table

 

If you found this solution helpful, please consider accepting it and giving it a kudos (Like) it’s greatly appreciated and helps others find the solution more easily.


Best Regards,
Nasif Azam

Helpful resources

Announcements
June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors