Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.

Reply
Centaur1
Regular Visitor

Add Conditional Column (tag with 0 or 1)

Hello,

 

I am a novice user of PQ. 

I am deleting duplicates and I know PQ keeps the first instance and deletes the 2nd.

What I need to do is keep the 2nd.  

Not sure exactly how to do this but I gather from reading other posts, one way to do this is to tag the rows with a 0 or a 1.

 

I think I need to add the procedure just prior to the Delete Duplicates. 

 

How would I add a conditional column (i guess that is what it is) and tagging the rows with a 0 or a 1 so I can delete the one I need.   

Not sure if impt but I only have a few hundred duplicates out of nearly 10k rows so I think the rows that are not duplicate will be NULL. 

 

Condition:  if "Invoice #" and "Invoice amount" are equal then put a 0 on the first record and a 1 on the 2nd.  

 

thank you. 

Please let me know if not clear. 

Grateful for the help. 

 

here is my entire code:

(I dont think its necessary to paste it all s9nce I think you only need to reference the one above "Removed Duplicates"..ie Upercased Text.  

 

let
Source = Table.Combine({#"FBL1N - 922A (no filter)_Link", #"Project Costs"}),
#"Uppercased Text" = Table.TransformColumns(Source,{{"Vendor", Text.Upper, type text}}),
#"Removed Duplicates" = Table.Distinct(#"Uppercased Text", {"Invoice #", "Invoice amount"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Duplicates",{"Funding Date", "WDDate", "Company Code", "Vendor", "Invoice #", "Invoice amount", "Document Number", "Pay Yes/No", "Invoice date", "Posting Date", "Invoice due date", "Invoice No Stripped", "Payment Method", "Column16", "Vendor Code", "Payee/er", "Clearing Document", "Clearing date", "Text", "Part.bank type", "Payment Block", "Terms of Payment", "Sheet1", "DDNo", "Currency", "USD Amount", "Budget Category", "Account", "TypeDraw"}),
#"Added Conditional Column" = Table.AddColumn(
#"Reordered Columns",
"Custom",
each
if [WDDate] = null then [Funding Date]
else if [WDDate] >= [Funding Date] then [WDDate]
else [Funding Date]
),
#"Reordered Columns1" = Table.ReorderColumns(#"Added Conditional Column",{"Custom", "Funding Date", "WDDate", "Company Code", "Vendor", "Invoice #", "Invoice amount", "Document Number", "Pay Yes/No", "Invoice date", "Posting Date", "Invoice due date", "Invoice No Stripped", "Payment Method", "Column16", "Vendor Code", "Payee/er", "Clearing Document", "Clearing date", "Text", "Part.bank type", "Payment Block", "Terms of Payment", "Sheet1", "DDNo", "Currency", "USD Amount", "Budget Category", "Account", "TypeDraw"}),
#"Renamed Columns" = Table.RenameColumns(#"Reordered Columns1",{{"Custom", "Use This Date"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Use This Date", type date}}),
#"Reordered Columns2" = Table.ReorderColumns(#"Changed Type",{"Use This Date", "Funding Date", "WDDate", "Company Code", "Vendor", "Invoice #", "Invoice amount", "USD Amount", "Document Number", "Pay Yes/No", "Invoice date", "Posting Date", "Invoice due date", "Invoice No Stripped", "Payment Method", "Column16", "Vendor Code", "Payee/er", "Clearing Document", "Clearing date", "Text", "Part.bank type", "Payment Block", "Terms of Payment", "Sheet1", "DDNo", "Currency", "Budget Category", "Account", "TypeDraw"}),
Custom1 = Table.AddColumn(#"Reordered Columns2", "Custom", each if [USD Amount] = null then [Invoice amount] else if [USD Amount] >= [Invoice amount] then [USD Amount] else [Invoice amount]),
#"Reordered Columns3" = Table.ReorderColumns(Custom1,{"Use This Date", "Funding Date", "WDDate", "Company Code", "Vendor", "Invoice #", "Invoice amount", "USD Amount", "Custom", "Document Number", "Pay Yes/No", "Invoice date", "Posting Date", "Invoice due date", "Invoice No Stripped", "Payment Method", "Column16", "Vendor Code", "Payee/er", "Clearing Document", "Clearing date", "Text", "Part.bank type", "Payment Block", "Terms of Payment", "Sheet1", "DDNo", "Currency", "Budget Category", "Account", "TypeDraw"}),
#"Renamed Columns1" = Table.RenameColumns(#"Reordered Columns3",{{"Custom", "Amount Use This"}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Renamed Columns1",{{"Amount Use This", Currency.Type}}),
#"Sorted Rows" = Table.Sort(#"Changed Type1",{{"Use This Date", Order.Ascending}, {"Vendor", Order.Ascending}, {"Amount Use This", Order.Descending}}),
#"Changed Type2" = Table.TransformColumnTypes(#"Sorted Rows",{{"Amount Use This", type number}})
in
#"Changed Type2"

 

 

1 ACCEPTED SOLUTION

1.) For future requests - allways provide sample data and expected resuld based on that!

2.) If you don't know how to do it - read note below my post.

3.) If you don't know how to use my query also read note below my post.

 

New sample: Removes duplicates in [Column1] and [Column2] and keep rows where there are no duplicates.

 

Before

dufoq3_0-1737887341849.png

 

After

dufoq3_1-1737887352900.png

 

let
    Source = Table.FromRows({{"A", 10, 1}, {"A", 10, 2}, {"B", 50, 5}, {"B", 50, 6}, {"C", 600, 8}, {"C", 600, 9}}),
    RemovedFirstDuplicate = Table.Combine(Table.Group(Source, {"Column1", "Column2"}, {{"T", each if Table.RowCount(_) = 1 then _ else Table.Skip(_), type table}})[T])
in
    RemovedFirstDuplicate

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

View solution in original post

13 REPLIES 13
dufoq3
Super User
Super User

@Centaur1, check this:

Before

dufoq3_0-1737885901802.png

 

After

dufoq3_1-1737885921737.png

let
    Source = Table.FromRows({{"A", 10}, {"A", 20}, {"B", 50}, {"B", 60}, {"C", 500}, {"C", 600}}),
    RemovedFirstDuplicate = Table.Combine(Table.Group(Source, {"Column1"}, {{"T", each Table.Skip(_), type table}})[T])
in
    RemovedFirstDuplicate

Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Hi Dufoq3, I looked over the sample and I think it's not a duplicate because it needs to be on both the company name and the amount.  I think the duplicate in the example is only on the company name, but I might not be understanding.

 

just to be clear, there are cases where there are not duplicates.

 

I would need some assistance on exactly where to place that code as well. I really don't know where it should be.

1.) For future requests - allways provide sample data and expected resuld based on that!

2.) If you don't know how to do it - read note below my post.

3.) If you don't know how to use my query also read note below my post.

 

New sample: Removes duplicates in [Column1] and [Column2] and keep rows where there are no duplicates.

 

Before

dufoq3_0-1737887341849.png

 

After

dufoq3_1-1737887352900.png

 

let
    Source = Table.FromRows({{"A", 10, 1}, {"A", 10, 2}, {"B", 50, 5}, {"B", 50, 6}, {"C", 600, 8}, {"C", 600, 9}}),
    RemovedFirstDuplicate = Table.Combine(Table.Group(Source, {"Column1", "Column2"}, {{"T", each if Table.RowCount(_) = 1 then _ else Table.Skip(_), type table}})[T])
in
    RemovedFirstDuplicate

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

thank you dufoq3.  Very nice.  I might need to ask a follow up but will ask another question since I am not sure what I need at the moment, if anything.  I have to look over the data since I took a different path.  

You're welcome.


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Hi, thank you for the revised.  That is amazing.  

I did click the link in the note however it is giving me this message:

Centaur1_0-1737889235858.png

 

Try it now - it should work:

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

dufoq3
Super User
Super User

Hi @Centaur1,

 

"I am deleting duplicates and I know PQ keeps the first instance and deletes the 2nd"

This may not allways be true. It is true if you use Table.Buffer in previous step.


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

PwerQueryKees
Super User
Super User

Start by adding an index column:

I started with this:

PwerQueryKees_1-1737796708946.png

Then added an index column:

PwerQueryKees_0-1737796575438.png

PwerQueryKees_2-1737796788889.png

 

Then add another colum to show if it is odd or even:

PwerQueryKees_3-1737796968708.png

 

PwerQueryKees_4-1737796999971.png

Then filter only the 0 if you only want to keep the first.

Start with a right-click on any of the cells containg 0:

PwerQueryKees_5-1737797200010.png

PwerQueryKees_6-1737797279087.png

I leave removing the additional columns to you 😊

 

Here the full M Code. Replace the first 2 steps with your own ...

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclSK1YlWSgKTyWAyBUymgsk0MJmuFBsLAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each Number.Mod([Index],2)),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Custom] = 0)
in
    #"Filtered Rows"


Did I answer your question? Then please mark my post as the solution and make it easier to find for others having a similar problem.
If I helped you, please click on the Thumbs Up to give Kudos.

Kees Stolker

A big fan of Power Query and Excel

 

 

HI Kees, thank you for your response as well.  I do not know exactly where to place those codes.  If could kindly read my response to DataNinja since it basically applies. My knowledge of PQ is pretty limited.  thank you again for your response.  

And if you start at the point where you want to remove the duplicates and follow my step by step instructions?That seems to me the best approach for you because you can check every step by itself and keep the coding to a minimum.

DataNinja777
Super User
Super User

Hi @Centaur1 ,

 

To keep the second instance of duplicates in Power Query, you can modify your query to tag each row within a duplicate group and filter based on the tag. Here's how you can do it.

First, ensure your table is sorted so duplicates are grouped together. You can use the Table.Sort function to sort by Invoice # and Invoice amount. Then, add an index column to uniquely identify each row. After that, group the table by Invoice # and Invoice amount. Within each group, add a new index column to tag rows with a "row number." This step helps identify the first, second, and subsequent duplicates. Finally, filter the table to keep only the rows with a "row number" of 2, corresponding to the second instance of each duplicate. Once done, you can clean up by removing unnecessary columns.

Here's the modified query:

let
    Source = Table.Combine({#"FBL1N - 922A (no filter)_Link", #"Project Costs"}),
    SortedRows = Table.Sort(Source, {{"Invoice #", Order.Ascending}, {"Invoice amount", Order.Ascending}}),
    AddedIndex = Table.AddIndexColumn(SortedRows, "Index", 1, 1, Int64.Type),
    GroupedRows = Table.Group(
        AddedIndex,
        {"Invoice #", "Invoice amount"},
        {
            {"All Data", each 
                let 
                    AddSubIndex = Table.AddIndexColumn(_, "Row Number", 1, 1, Int64.Type) 
                in 
                    AddSubIndex, type table}
        }
    ),
    ExpandedRows = Table.ExpandTableColumn(GroupedRows, "All Data", {"Index", "Row Number", "Vendor", "Funding Date", "WDDate", "Company Code", "Document Number", "Pay Yes/No", "Invoice date", "Posting Date", "Invoice due date", "Invoice No Stripped", "Payment Method", "Column16", "Vendor Code", "Payee/er", "Clearing Document", "Clearing date", "Text", "Part.bank type", "Payment Block", "Terms of Payment", "Sheet1", "DDNo", "Currency", "USD Amount", "Budget Category", "Account", "TypeDraw"}),
    FilteredRows = Table.SelectRows(ExpandedRows, each [Row Number] = 2),
    RemovedUnnecessaryColumns = Table.RemoveColumns(FilteredRows, {"Row Number"})
in
    RemovedUnnecessaryColumns

 

This approach ensures you retain only the second occurrence of each duplicate based on Invoice # and Invoice amount. Let me know if you have any questions or need further assistance!

 

Best regards,

HI DataNinja, thank you for the response.  

I am very novice to PQ. 

 

 sorry for my limited knowledge but I am not sure where I would place that code. 

I assume that I would need to remove the "REMOVED DUPLICATES" line (#2 in the pic below)

Not sure if it matters but not all records are duplicates and not sure if the code takes that into account. 

 

here is a pic of my code (25 lines).  (the codes in text are in my first post if you happen to need them)

Centaur1_2-1737817387453.pngCentaur1_3-1737817424394.png

 

thank you very much.  

Looking forward to your response.

 

 

 

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.