
RossS
Helper II

Remove duplicates based on Vendor ID and Date

I am looking to remove duplicates, not just by date but also by vendor ID, so that the most recent occurrence of each barcode from each vendor is loaded. In the example below, I should be left with the two rows at the bottom, showing the latest occurrence of each barcode from each vendor. In my data, many vendors share the same barcode, so I need to make sure I keep not simply the latest occurrence of each barcode, but the latest occurrence of each barcode for each vendor.

Vendor ID | Barcode | Date
4456 | 77889909 | 01/03/2026
1212 | 66565565 | 05/03/2026
4456 | 77889909 | 05/03/2026
1212 | 66565565 | 02/03/2026
1212 | 66565565 | 01/03/2026
1212 | 66565565 | 02/03/2026
4456 | 77889909 | 02/03/2026

End result:

Vendor ID | Barcode | Date
4456 | 77889909 | 05/03/2026
1212 | 66565565 | 05/03/2026
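To pin down the rule being asked for ("one row per Vendor ID and Barcode pair, carrying that pair's latest date"), here is a minimal Python sketch that reproduces the expected end result. It is only a model of the logic, not Power Query code; it assumes the dates are day/month/year and that the sample splits into Vendor IDs 4456/1212 and Barcodes 77889909/66565565.

```python
from datetime import datetime

# Sample rows from the question: (Vendor ID, Barcode, Date).
# Assumption: dates are written day/month/year.
ROWS = [
    ("4456", "77889909", "01/03/2026"),
    ("1212", "66565565", "05/03/2026"),
    ("4456", "77889909", "05/03/2026"),
    ("1212", "66565565", "02/03/2026"),
    ("1212", "66565565", "01/03/2026"),
    ("1212", "66565565", "02/03/2026"),
    ("4456", "77889909", "02/03/2026"),
]


def latest_per_pair(rows):
    """Keep one row per (Vendor ID, Barcode), carrying the latest date."""
    latest = {}
    for vendor, barcode, text in rows:
        day = datetime.strptime(text, "%d/%m/%Y").date()
        key = (vendor, barcode)
        # Replace the stored date only when this row is newer.
        if key not in latest or day > latest[key]:
            latest[key] = day
    # dicts keep insertion order, so pairs come out in first-seen order.
    return [(v, b, d.strftime("%d/%m/%Y")) for (v, b), d in latest.items()]


for row in latest_per_pair(ROWS):
    print(row)
# ('4456', '77889909', '05/03/2026')
# ('1212', '66565565', '05/03/2026')
```

The key is the pair (vendor, barcode), never the barcode alone, which is exactly the distinction the question draws.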

I tried copying my query, keeping only these three columns, and grouping it:

= Table.Group(#"Expanded Items", {"Vendor ID", "Barcode"}, {{"Latest_Date", each List.Max([Price_Date]), type nullable date}})

Then I went back to my original query and merged the two, matching on all three columns with an inner join (only matching rows). This hasn't worked well: it looks like it may have worked as intended for some vendors, but definitely not all.
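The group-then-merge approach can be modelled in a few lines of Python (a sketch of the logic, not Power Query code). It keeps every original row whose date equals the maximum for its Vendor ID and Barcode, which is what an inner join on all three columns does. One side effect worth checking in the real data: if a pair has two rows on its latest date, both survive the join, so a pair can come back more than once. The rows below are hypothetical.

```python
from datetime import date


def group_then_join(rows):
    """Mimic Table.Group (max date per Vendor ID + Barcode) followed by an
    inner join back to the original rows on all three columns."""
    max_date = {}
    for vendor, barcode, when in rows:
        key = (vendor, barcode)
        if key not in max_date or when > max_date[key]:
            max_date[key] = when
    # Inner join: an original row survives when its date equals its pair's max.
    return [row for row in rows if row[2] == max_date[(row[0], row[1])]]


# Hypothetical rows: vendor 1212 has two rows on its latest date.
TIED = [
    ("1212", "66565565", date(2026, 3, 5)),
    ("1212", "66565565", date(2026, 3, 5)),
    ("1212", "66565565", date(2026, 3, 2)),
    ("4456", "77889909", date(2026, 3, 5)),
]

print(len(group_then_join(TIED)))  # 3 rows for 2 pairs: the tie survives
```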

1 ACCEPTED SOLUTION

Thanks for the response.

I struggled to replicate this, as the steps take so long to load in Power Query. I managed to solve my issue because my table already had a true/false column flagging the latest price for each product.
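When the table already carries a true/false flag for the latest price, de-duplication reduces to a row filter (in Power Query, filtering that column to true generates a Table.SelectRows step). A small Python sketch of the idea; the column name Is_Latest_Price and the rows are hypothetical, since the thread does not name the column. It also checks the assumption that the flag marks exactly one row per Vendor ID and Barcode.

```python
from collections import Counter

# Hypothetical rows: (Vendor ID, Barcode, Date, Is_Latest_Price).
ROWS = [
    ("4456", "77889909", "01/03/2026", False),
    ("4456", "77889909", "05/03/2026", True),
    ("1212", "66565565", "05/03/2026", True),
    ("1212", "66565565", "02/03/2026", False),
]


def latest_rows(rows):
    """Keep only the rows flagged as the latest price."""
    kept = [row[:3] for row in rows if row[3]]
    # Sanity check (assumption): the flag should mark one row per pair.
    per_pair = Counter((vendor, barcode) for vendor, barcode, _ in kept)
    if any(count > 1 for count in per_pair.values()):
        raise ValueError("flag marks more than one row for some pair")
    return kept


print(latest_rows(ROWS))
```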

Ashish_Mathur
Super User

Hi,

This M code works

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    // Group on both keys so each vendor keeps one row per barcode
    #"Grouped Rows" = Table.Group(Source, {"Vendor ID", "Barcode"}, {{"Latest", each Table.Max(_, "Date")}}),
    #"Expanded Latest" = Table.ExpandRecordColumn(#"Grouped Rows", "Latest", {"Date"}, {"Date"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Latest",{{"Vendor ID", type text}, {"Barcode", type text}, {"Date", type date}})
in
    #"Changed Type"
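Table.Max(_, "Date") returns the whole row (as a record) with the largest Date in each group, so any other columns, such as a price, come along with it and can be expanded afterwards. A rough Python analogue, grouped on Vendor ID and Barcode; the Price column and its values are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (Vendor ID, Barcode, Date, Price).
ROWS = [
    ("4456", "77889909", date(2026, 3, 1), 1.10),
    ("4456", "77889909", date(2026, 3, 5), 1.25),
    ("1212", "66565565", date(2026, 3, 5), 2.40),
    ("1212", "66565565", date(2026, 3, 2), 2.35),
]


def latest_row_per_pair(rows):
    """Group on (Vendor ID, Barcode) and return each group's row with the
    greatest Date, whole row included (like Table.Group + Table.Max)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[0], row[1])].append(row)
    return [max(group, key=lambda row: row[2]) for group in groups.values()]


for vendor, barcode, when, price in latest_row_per_pair(ROWS):
    print(vendor, barcode, when, price)
```

Because the maximum is computed inside each group, this does not depend on any earlier sort order.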

Hope this helps.

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
V-yubandi-msft
Community Support

Hi @RossS,

If the responses shared by @danextian, @Natarajan_M and @Kedar_Pande align with your expectations, please take a moment to review them and let us know if you need any additional details or clarifications.

Thank you all for your valuable support.

Regards,

Yugandhar.

Kedar_Pande
Super User

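@RossS

In Power Query:

1. Sort by Date descending (newest first).
2. Add an Index column.
3. Buffer the table: = Table.Buffer(#"Added Index")
4. Select Vendor ID + Barcode → Remove Duplicates. Leave Index out of the selection: its values are all unique, so no rows would be removed.
5. Remove the Index column. The other columns stay in place, as Remove Duplicates only compares the selected columns.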
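The detail that matters in these steps is which columns go into Remove Duplicates: it keeps the first row for each distinct combination of the selected columns, and because the Index holds a different value on every row, including it leaves every row distinct. A small Python model of sort → index → distinct (the helper name is hypothetical; it models the behaviour rather than calling Power Query):

```python
from datetime import date

# Sample rows as (Vendor ID, Barcode, Date).
ROWS = [
    ("4456", "77889909", date(2026, 3, 1)),
    ("1212", "66565565", date(2026, 3, 5)),
    ("4456", "77889909", date(2026, 3, 5)),
    ("1212", "66565565", date(2026, 3, 2)),
]


def sort_index_distinct(rows, include_index_in_key):
    """Sort newest first, add an index, then keep the first row per key."""
    newest_first = sorted(rows, key=lambda row: row[2], reverse=True)
    indexed = [row + (i,) for i, row in enumerate(newest_first)]
    seen, kept = set(), []
    for row in indexed:
        key = row[:2] + ((row[3],) if include_index_in_key else ())
        if key not in seen:  # first occurrence wins
            seen.add(key)
            kept.append(row[:3])  # drop the index column again
    return kept


print(len(sort_index_distinct(ROWS, include_index_in_key=True)))   # 4
print(len(sort_index_distinct(ROWS, include_index_in_key=False)))  # 2
```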

Natarajan_M
Solution Sage

Hi @RossS,

I was able to reproduce your scenario.

If your data volume is low, you can use Table.Buffer. However, if your volume is high, Table.Buffer will increase the query processing overhead, for these reasons:

1. It loads all the data into memory before processing.
2. It breaks query folding.
3. It is ideal for small lookup tables but not for fact tables (which I assume your transaction table is).

The approach I prefer is: sort, then group, and finally take the first record of each group.


Query:

let
    Source = Excel.Workbook(File.Contents("C:\Users\natar\Downloads\Community\Dedup_VendorBarcode.xlsx"), null, true),
    RawData_Sheet = Source{[Item="RawData",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(RawData_Sheet, [PromoteAllScalars=true]),

    #"Typed" = Table.TransformColumnTypes(#"Promoted Headers", {
        {"VendorID", Int64.Type}, {"Barcode", Int64.Type},
        {"Date", type date}, {"Price", type number}, {"Description", type text}
    }),

    #"Sorted" = Table.Sort(#"Typed",{{"Date", Order.Descending}}),

    #"Grouped" =
        Table.Group(
            #"Sorted",
            {"VendorID","Barcode"},
            {{"Latest", each Table.First(_), type record}}
        ),

    #"Expanded" =
        Table.ExpandRecordColumn(
            #"Grouped",
            "Latest",
            {"Date","Price","Description"}
        )
in
    #"Expanded"
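A Python analogue of sort → group → take the first record, for comparison. One difference to keep in mind: itertools.groupby only groups adjacent rows, so the sketch sorts by Vendor ID and Barcode as well as by Date descending, whereas Table.Group groups across the whole table by default. Column names follow the query above (VendorID, Barcode, Date, Price); the rows and prices are hypothetical.

```python
from datetime import date
from itertools import groupby

# Hypothetical rows as dicts, mirroring the typed columns in the query.
ROWS = [
    {"VendorID": 4456, "Barcode": 77889909, "Date": date(2026, 3, 1), "Price": 1.10},
    {"VendorID": 1212, "Barcode": 66565565, "Date": date(2026, 3, 5), "Price": 2.40},
    {"VendorID": 4456, "Barcode": 77889909, "Date": date(2026, 3, 5), "Price": 1.25},
    {"VendorID": 1212, "Barcode": 66565565, "Date": date(2026, 3, 2), "Price": 2.35},
]


def pair_key(row):
    return (row["VendorID"], row["Barcode"])


def latest_records(rows):
    """Sort so each pair's newest row comes first, then take the first
    record of every group (the Table.First step)."""
    # Two stable sorts: Date descending, then pair ascending (keeps Date order).
    ordered = sorted(rows, key=lambda row: row["Date"], reverse=True)
    ordered = sorted(ordered, key=pair_key)
    return [next(group) for _, group in groupby(ordered, key=pair_key)]


for record in latest_records(ROWS):
    print(record["VendorID"], record["Barcode"], record["Date"], record["Price"])
```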



[Screenshot: raw data]

[Screenshot: transformed data]

PBIX: Latest data.pbix

Thanks.
If this response was helpful in any way, I'd gladly accept a kudo.
Please mark it as the correct solution. It helps other community members find their way faster.


Thanks for the update, and good that you figured it out. It sounds like the latest price true/false column was the key here.

Feel free to reach out if you need any further help.

danextian
Super User

Hi @RossS 

Sort your data by date in descending order. Add a custom applied step with Table.Buffer to load the sorted data into memory. Then select the Vendor ID and Barcode columns and remove duplicates.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjExNVPSUTI3t7CwtDSwBDINDPUNjPWNDIzMlGJ1opUMjQyNgKJmZqZmpkAEUmCKogCLCaaETDAipICgG4wIuQFJQSwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Vendor ID" = _t, Barcode = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Vendor ID", Int64.Type}, {"Barcode", Int64.Type}, {"Date", type date}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Date", Order.Descending}}),
    // Buffer so the descending sort is kept when duplicates are removed
    #"Buffered Rows" = Table.Buffer(#"Sorted Rows"),
    // Key on both columns: one row (the newest) per Vendor ID and Barcode
    #"Removed Duplicates" = Table.Distinct(#"Buffered Rows", {"Vendor ID", "Barcode"})
in
    #"Removed Duplicates"
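Which columns are passed to Table.Distinct decides what counts as a duplicate. In the sample each vendor has a single barcode, so de-duplicating on Vendor ID alone happens to give the right answer there; with several barcodes per vendor it keeps only one barcode per vendor. A Python sketch of sort-then-distinct under both keys; the second barcode for vendor 4456 is hypothetical.

```python
from datetime import date

# (Vendor ID, Barcode, Date); barcode "11112222" is a hypothetical extra item.
ROWS = [
    ("4456", "77889909", date(2026, 3, 1)),
    ("4456", "11112222", date(2026, 3, 2)),
    ("4456", "77889909", date(2026, 3, 5)),
    ("1212", "66565565", date(2026, 3, 5)),
]


def sort_then_distinct(rows, key_columns):
    """Sort newest first, then keep the first row for each key (first wins)."""
    newest_first = sorted(rows, key=lambda row: row[2], reverse=True)
    seen, kept = set(), []
    for row in newest_first:
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


print(len(sort_then_distinct(ROWS, [0])))     # 2: barcode 11112222 is lost
print(len(sort_then_distinct(ROWS, [0, 1])))  # 3: one row per vendor + barcode
```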

Dane Belarmino | Microsoft MVP | Proud to be a Super User!

Did I answer your question? Mark my post as a solution!


"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.
