Reply
Andrew__K
Frequent Visitor
Partially syndicated - Outbound

Remove duplicate row, as well as row that was duplicated.

Our labelling software generates a transaction report that represents each label generated, I pull data from this report using PowerQuery. Each row is an individual label or transaction.

 

When an error is made during the labelling process and the operator deletes/reverses that label, the transaction report shows it as a normal transaction with a 1 in a Transaction Type (Trx Type) column, followed by another row with the same information, but has a -1 as the transaction type to represent the deleted label.

 

Is there any way I can remove both these rows in my query? I know I can filter -1 Trx Types, but this leaves the original transaction in place. These is also a serial column for each transaction, these match on the -1 and 1 rows, so could potentially be used? Below is an example of the report.

 

Date Product PPN Description Prod Date SerialTrx Type
4/12/202310751PRODUCT LABEL2/11/2023700011
4/12/202310751PRODUCT LABEL2/11/2023700021
4/12/202310751PRODUCT LABEL2/11/2023700031
4/12/202310751PRODUCT LABEL2/11/2023700041
4/12/202310751PRODUCT LABEL2/11/2023700051
4/12/202310751PRODUCT LABEL2/11/2023700061
4/12/202310751PRODUCT LABEL2/11/2023700071
4/12/202310751PRODUCT LABEL2/11/2023700081
4/12/202310751PRODUCT LABEL2/11/2023700091
4/12/202310751PRODUCT LABEL2/11/2023700101
4/12/202310751PRODUCT LABEL2/11/2023700111
4/12/202310751PRODUCT LABEL (*Deleted*)2/11/202370011-1
4/12/202310751PRODUCT LABEL2/11/2023700121
4/12/202310751PRODUCT LABEL2/11/2023700131

 

Any help would be appreciated.

 

Thank you

2 ACCEPTED SOLUTIONS
adudani
Super User
Super User

Syndicated - Outbound

HI @Andrew__K ,

 

Steps taken:

1. Unique Row Identifier: Combination of all columns delimeted by "|" in the Table Query (Your input)

2. Created a new query (RowsToRemove) referencing this

3. Grouped this query by Serial along with a summarized table of all rows and a count of all distinct rows. If distinct count > 1 then there are duplicates.

4. Filtered for rows >1 . Expanded the table.

5. Merged Table and Rows to delete as a new query on the unique identifier (Output - shown below). The rows which don't have a unique idenitifer are the rows which didn't have any duplicates in the raw data and are retained.

adudani_0-1701667600310.png

 

Copy and paste the following queries in the advanced editor of a blank query:

 

Table:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("pdKxDoIwEAbgV7l0UoKhV0B0VOtGlChODQOBDiRESK2Jvr1tIqOD3NY/93+XDqcUk7XVLGRQmKF9NhaK4gQuS/1oTDfabrj76KcwVa/adHXvXqV5QfkeNatCxZIIRSS4iN0AeZaiZ5ezvB1KyHf7Y+6yiBCnTsY59x2crQVJxySdkHRK0muSzkh6Q9JbikZO0n/cGiwCqXttdRssf2xazf8I6Wzxe7bVBw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"(blank)" = _t, #"(blank).1" = _t, #"(blank).2" = _t, #"(blank).3" = _t, #"(blank).4" = _t, #"(blank).5" = _t]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type date}, {" Product PPN ", Int64.Type}, {"Description ", type text}, {"Prod Date", type date}, {" Serial", Int64.Type}, {"Trx Type", Int64.Type}}),
    #"Unique Row Identifier" = Table.AddColumn(#"Changed Type", "Unique Row Identifier", each Text.Combine({Text.From([Date], "en-CA"), Text.From([#" Product PPN "], "en-CA"), [#"Description "], Text.From([Prod Date], "en-CA"), Text.From([#" Serial"], "en-CA"), Text.From([Trx Type], "en-CA")}, "|"), type text)
in
    #"Unique Row Identifier"

RowstoRemove:

let
    Source = Table,
    #"Grouped Rows" = Table.Group(Source, {" Serial"}, {{"Data", each _, type table [Date=nullable date, #" Product PPN "=nullable number, #"Description "=nullable text, Prod Date=nullable date, #" Serial"=nullable number, Trx Type=nullable number, Unique Row Identifier=text]}, {"Distinct Row Count", each Table.RowCount(Table.Distinct(_)), Int64.Type}}),
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([Distinct Row Count] = 2)),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Data"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns", "Data", {"Date", " Product PPN ", "Description ", "Prod Date", " Serial", "Trx Type", "Unique Row Identifier"}, {"Date", " Product PPN ", "Description ", "Prod Date", " Serial", "Trx Type", "Unique Row Identifier"})
in
    #"Expanded Data"

Output:

let
    Source = Table.NestedJoin(Table, {"Unique Row Identifier"}, RowsToRemove, {"Unique Row Identifier"}, "RowsToRemove", JoinKind.LeftOuter),
    #"Expanded RowsToRemove" = Table.ExpandTableColumn(Source, "RowsToRemove", {"Unique Row Identifier"}, {"Unique Row Identifier.1"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded RowsToRemove", each ([Unique Row Identifier.1] = null))
in
    #"Filtered Rows"

 

Did I answer your question? Mark my post as a solution, this will help others!
If my response(s) assisted you in any way, don't forget to drop me a Kudos 🙂
Kind Regards,
Avinash

View solution in original post

Syndicated - Outbound

thank you! Using your suggestions as well as others, I ended up grouping a merge of PPN-DATE-SERIAL with a row count and all rows column, filtering >1 on the row count and then expanding the all rows column. 

View solution in original post

5 REPLIES 5
avatar user
Anonymous
Not applicable

Syndicated - Outbound

Easier to just Table.ReplaceValue "-" with "", then Keep Duplicates, then Table.Join the step with the duplicates to the last step, using JoinKind.LeftAnti. So

 

RemoveValue = Table.ReplaceValue(PriorStepOrTableName, //whatever the code is to replace "-" with "" in the Serial Column,

KeepDups = //Use the Keep Duplicates GUI function,

RenameSerial = //Rename Serial to Serial1,

DeleteDups = TableJoin(KeepDups,, {"Serial"},  RenameSerial, {"Serial1}, JoinKind.LeftAnti)

 

--Nate

AlienSx
Super User
Super User

Syndicated - Outbound

Hello, @Andrew__K 

    s = your_table,
    deleted = List.Buffer(Table.SelectRows(s, each [Trx Type] = - 1)[Serial]),
    filtered = Table.SelectRows(s, each not List.Contains(deleted, [Serial]))
spinfuzer
Super User
Super User

Syndicated - Outbound

Hi @Andrew__K ,

You could add a custom column "neg Trx Type" and multiply Trx Type by -1.

 

Next do a self left anti join on Serial and Neg Trx Type against Serial and Trx Type  to remove offsetting rows.

 

spinfuzer_0-1701671115950.png

 

adudani
Super User
Super User

Syndicated - Outbound

HI @Andrew__K ,

 

Steps taken:

1. Unique Row Identifier: Combination of all columns delimeted by "|" in the Table Query (Your input)

2. Created a new query (RowsToRemove) referencing this

3. Grouped this query by Serial along with a summarized table of all rows and a count of all distinct rows. If distinct count > 1 then there are duplicates.

4. Filtered for rows >1 . Expanded the table.

5. Merged Table and Rows to delete as a new query on the unique identifier (Output - shown below). The rows which don't have a unique idenitifer are the rows which didn't have any duplicates in the raw data and are retained.

adudani_0-1701667600310.png

 

Copy and paste the following queries in the advanced editor of a blank query:

 

Table:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("pdKxDoIwEAbgV7l0UoKhV0B0VOtGlChODQOBDiRESK2Jvr1tIqOD3NY/93+XDqcUk7XVLGRQmKF9NhaK4gQuS/1oTDfabrj76KcwVa/adHXvXqV5QfkeNatCxZIIRSS4iN0AeZaiZ5ezvB1KyHf7Y+6yiBCnTsY59x2crQVJxySdkHRK0muSzkh6Q9JbikZO0n/cGiwCqXttdRssf2xazf8I6Wzxe7bVBw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"(blank)" = _t, #"(blank).1" = _t, #"(blank).2" = _t, #"(blank).3" = _t, #"(blank).4" = _t, #"(blank).5" = _t]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type date}, {" Product PPN ", Int64.Type}, {"Description ", type text}, {"Prod Date", type date}, {" Serial", Int64.Type}, {"Trx Type", Int64.Type}}),
    #"Unique Row Identifier" = Table.AddColumn(#"Changed Type", "Unique Row Identifier", each Text.Combine({Text.From([Date], "en-CA"), Text.From([#" Product PPN "], "en-CA"), [#"Description "], Text.From([Prod Date], "en-CA"), Text.From([#" Serial"], "en-CA"), Text.From([Trx Type], "en-CA")}, "|"), type text)
in
    #"Unique Row Identifier"

RowstoRemove:

let
    Source = Table,
    #"Grouped Rows" = Table.Group(Source, {" Serial"}, {{"Data", each _, type table [Date=nullable date, #" Product PPN "=nullable number, #"Description "=nullable text, Prod Date=nullable date, #" Serial"=nullable number, Trx Type=nullable number, Unique Row Identifier=text]}, {"Distinct Row Count", each Table.RowCount(Table.Distinct(_)), Int64.Type}}),
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([Distinct Row Count] = 2)),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Data"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns", "Data", {"Date", " Product PPN ", "Description ", "Prod Date", " Serial", "Trx Type", "Unique Row Identifier"}, {"Date", " Product PPN ", "Description ", "Prod Date", " Serial", "Trx Type", "Unique Row Identifier"})
in
    #"Expanded Data"

Output:

let
    Source = Table.NestedJoin(Table, {"Unique Row Identifier"}, RowsToRemove, {"Unique Row Identifier"}, "RowsToRemove", JoinKind.LeftOuter),
    #"Expanded RowsToRemove" = Table.ExpandTableColumn(Source, "RowsToRemove", {"Unique Row Identifier"}, {"Unique Row Identifier.1"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded RowsToRemove", each ([Unique Row Identifier.1] = null))
in
    #"Filtered Rows"

 

Did I answer your question? Mark my post as a solution, this will help others!
If my response(s) assisted you in any way, don't forget to drop me a Kudos 🙂
Kind Regards,
Avinash

Syndicated - Outbound

thank you! Using your suggestions as well as others, I ended up grouping a merge of PPN-DATE-SERIAL with a row count and all rows column, filtering >1 on the row count and then expanding the all rows column. 

avatar user

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors (Last Month)
Top Kudoed Authors (Last Month)