Advance your Data & AI career with 50 days of live learning, dataviz contests, hands-on challenges, study groups & certifications and more!
Get registeredGet Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now
I'm looking to remove duplicates in a table, but I want to make sure that the rows that I am keeping are the ones that I actually want.
For example, I have something like the following where I have an acccount ID and two employees assigned to the account
| ID | Employee 1 | Employee 2 |
| 123 | Joe | Eddie |
| 456 | Frank | Felicia |
| 789 | Monica | Betty |
| 789 | Monica | Betty |
| 789 | Hannah | Brett |
| 123 | Joe | Hank |
I'm under the impression that if I remove duplicates within Power Query, it will always keep the last instance. I am also under the assumption that whenever I append new data to Power BI, the most recent data is added to the bottom of each table. From the example above, I would want to keep the last instance of each of the ID columns as they contain the most up-to-date employee assignments. Essentially, I want to be certain that once I drop the duplicates of ID, I always return the below
| ID | Employee 1 | Employee 2 |
| 456 | Frank | Felicia |
| 789 | Hannah | Brett |
123 | Joe | Hank |
I want to be certain that everything is acting as I expect and I am presenting the most relevant and up-to-date data.
Is someone able to confirm that the drop duplicates function will always act this way?
Solved! Go to Solution.
Hi @markmess77,
You can refer to the following sample code to achieve your requirement:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQyVtJR8spPBZKuKSmZqUqxOtFKJqZmQL5bUWJeNohOzclMzkwEy5hbWAJFfPPzMpMTgQyn1JKSSmIkPBLz8hIzQBJFQBmwBLLVHiCbYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, #"Employee 1" = _t, #"Employee 2" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Employee 1", type text}, {"Employee 2", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"Count", each Table.LastN(_,1), type table [ID= number, Employee 1=text, Employee 2=text]}}),
#"Expanded Count" = Table.ExpandTableColumn(#"Grouped Rows", "Count", {"Employee 1", "Employee 2"}, {"Employee 1", "Employee 2"})
in
#"Expanded Count"
Logic:
1. Group table rows based on ID fields.
2. Keep the last row of each grouped tables.
3. Expand filtered group table records.
Regards,
Xiaoxin Sheng
Hi,
This M code works
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Employee 1", type text}, {"Employee 2", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Grouped Rows" = Table.Group(#"Added Index", {"ID"}, {{"Max of index", each List.Max([Index]), type number}}),
Joined = Table.Join(#"Added Index", "ID", #"Grouped Rows", "ID"),
#"Added Custom" = Table.AddColumn(Joined, "Test", each if [Index]=[Max of index] then "Keep" else "Discard"),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Test] = "Keep")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index", "Max of index", "Test"})
in
#"Removed Columns"
Hope this helps.
Hi,
This M code works
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Employee 1", type text}, {"Employee 2", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Grouped Rows" = Table.Group(#"Added Index", {"ID"}, {{"Max of index", each List.Max([Index]), type number}}),
Joined = Table.Join(#"Added Index", "ID", #"Grouped Rows", "ID"),
#"Added Custom" = Table.AddColumn(Joined, "Test", each if [Index]=[Max of index] then "Keep" else "Discard"),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Test] = "Keep")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index", "Max of index", "Test"})
in
#"Removed Columns"
Hope this helps.
Hi @markmess77,
You can refer to the following sample code to achieve your requirement:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQyVtJR8spPBZKuKSmZqUqxOtFKJqZmQL5bUWJeNohOzclMzkwEy5hbWAJFfPPzMpMTgQyn1JKSSmIkPBLz8hIzQBJFQBmwBLLVHiCbYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, #"Employee 1" = _t, #"Employee 2" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Employee 1", type text}, {"Employee 2", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"Count", each Table.LastN(_,1), type table [ID= number, Employee 1=text, Employee 2=text]}}),
#"Expanded Count" = Table.ExpandTableColumn(#"Grouped Rows", "Count", {"Employee 1", "Employee 2"}, {"Employee 1", "Employee 2"})
in
#"Expanded Count"
Logic:
1. Group table rows based on ID fields.
2. Keep the last row of each grouped tables.
3. Expand filtered group table records.
Regards,
Xiaoxin Sheng
Hi @markmess77 ,
Try this code:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQyVtJR8spPBZKuKSmZqUqxOtFKJqZmQL5bUWJeNohOzclMzkwEy5hbWAJFfPPzMpMTgQyn1JKSSmIkPBLz8hIzQBJFQBmwBLLVHiCbYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, #"Employee 1" = _t, #"Employee 2" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Employee 1", type text}, {"Employee 2", type text}}),
Group = Table.Group(#"Changed Type", {"ID"}, {{"Rows", each Table.AddIndexColumn(_, "Index", 1), type table}}),
#"Expanded Rows" = Table.ExpandTableColumn(Group, "Rows", {"Employee 1", "Employee 2", "Index"}, {"Employee 1", "Employee 2", "Index"}),
#"Added Custom" = Table.AddColumn(#"Expanded Rows", "Custom", each Table.Group(#"Expanded Rows", {"ID"}, {{"Max", each List.Max([Index]), type number}})),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"ID", "Max"}, {"ID.1", "Max"}),
#"Added Custom1" = Table.AddColumn(#"Expanded Custom", "Custom", each if [ID] = [ID.1] and [Index] = [Max] then 1 else 0),
#"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom] = 1)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index", "ID.1", "Max"})
in
#"Removed Columns"
It basically creates an index for each ID and filter to the max value for each ID.
I hope it helps you,
Ricardo
Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!
Check out the October 2025 Power BI update to learn about new features.