Power BI is turning 10, and we’re marking the occasion with a special community challenge. Use your creativity to tell a story, uncover trends, or highlight something unexpected.
Get startedJoin us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.
Hello,
I am a novice user of PQ.
I am deleting duplicates and I know PQ keeps the first instance and deletes the 2nd.
What I need to do is keep the 2nd.
Not sure exactly how to do this but I gather from reading other posts, one way to do this is to tag the rows with a 0 or a 1.
I think I need to add the procedure just prior to the Delete Duplicates.
How would I add a conditional column (i guess that is what it is) and tagging the rows with a 0 or a 1 so I can delete the one I need.
Not sure if impt but I only have a few hundred duplicates out of nearly 10k rows so I think the rows that are not duplicate will be NULL.
Condition: if "Invoice #" and "Invoice amount" are equal then put a 0 on the first record and a 1 on the 2nd.
thank you.
Please let me know if not clear.
Grateful for the help.
here is my entire code:
(I dont think its necessary to paste it all s9nce I think you only need to reference the one above "Removed Duplicates"..ie Upercased Text.
let
Source = Table.Combine({#"FBL1N - 922A (no filter)_Link", #"Project Costs"}),
#"Uppercased Text" = Table.TransformColumns(Source,{{"Vendor", Text.Upper, type text}}),
#"Removed Duplicates" = Table.Distinct(#"Uppercased Text", {"Invoice #", "Invoice amount"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Duplicates",{"Funding Date", "WDDate", "Company Code", "Vendor", "Invoice #", "Invoice amount", "Document Number", "Pay Yes/No", "Invoice date", "Posting Date", "Invoice due date", "Invoice No Stripped", "Payment Method", "Column16", "Vendor Code", "Payee/er", "Clearing Document", "Clearing date", "Text", "Part.bank type", "Payment Block", "Terms of Payment", "Sheet1", "DDNo", "Currency", "USD Amount", "Budget Category", "Account", "TypeDraw"}),
#"Added Conditional Column" = Table.AddColumn(
#"Reordered Columns",
"Custom",
each
if [WDDate] = null then [Funding Date]
else if [WDDate] >= [Funding Date] then [WDDate]
else [Funding Date]
),
#"Reordered Columns1" = Table.ReorderColumns(#"Added Conditional Column",{"Custom", "Funding Date", "WDDate", "Company Code", "Vendor", "Invoice #", "Invoice amount", "Document Number", "Pay Yes/No", "Invoice date", "Posting Date", "Invoice due date", "Invoice No Stripped", "Payment Method", "Column16", "Vendor Code", "Payee/er", "Clearing Document", "Clearing date", "Text", "Part.bank type", "Payment Block", "Terms of Payment", "Sheet1", "DDNo", "Currency", "USD Amount", "Budget Category", "Account", "TypeDraw"}),
#"Renamed Columns" = Table.RenameColumns(#"Reordered Columns1",{{"Custom", "Use This Date"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Use This Date", type date}}),
#"Reordered Columns2" = Table.ReorderColumns(#"Changed Type",{"Use This Date", "Funding Date", "WDDate", "Company Code", "Vendor", "Invoice #", "Invoice amount", "USD Amount", "Document Number", "Pay Yes/No", "Invoice date", "Posting Date", "Invoice due date", "Invoice No Stripped", "Payment Method", "Column16", "Vendor Code", "Payee/er", "Clearing Document", "Clearing date", "Text", "Part.bank type", "Payment Block", "Terms of Payment", "Sheet1", "DDNo", "Currency", "Budget Category", "Account", "TypeDraw"}),
Custom1 = Table.AddColumn(#"Reordered Columns2", "Custom", each if [USD Amount] = null then [Invoice amount] else if [USD Amount] >= [Invoice amount] then [USD Amount] else [Invoice amount]),
#"Reordered Columns3" = Table.ReorderColumns(Custom1,{"Use This Date", "Funding Date", "WDDate", "Company Code", "Vendor", "Invoice #", "Invoice amount", "USD Amount", "Custom", "Document Number", "Pay Yes/No", "Invoice date", "Posting Date", "Invoice due date", "Invoice No Stripped", "Payment Method", "Column16", "Vendor Code", "Payee/er", "Clearing Document", "Clearing date", "Text", "Part.bank type", "Payment Block", "Terms of Payment", "Sheet1", "DDNo", "Currency", "Budget Category", "Account", "TypeDraw"}),
#"Renamed Columns1" = Table.RenameColumns(#"Reordered Columns3",{{"Custom", "Amount Use This"}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Renamed Columns1",{{"Amount Use This", Currency.Type}}),
#"Sorted Rows" = Table.Sort(#"Changed Type1",{{"Use This Date", Order.Ascending}, {"Vendor", Order.Ascending}, {"Amount Use This", Order.Descending}}),
#"Changed Type2" = Table.TransformColumnTypes(#"Sorted Rows",{{"Amount Use This", type number}})
in
#"Changed Type2"
Solved! Go to Solution.
1.) For future requests - allways provide sample data and expected resuld based on that!
2.) If you don't know how to do it - read note below my post.
3.) If you don't know how to use my query also read note below my post.
New sample: Removes duplicates in [Column1] and [Column2] and keep rows where there are no duplicates.
Before
After
let
Source = Table.FromRows({{"A", 10, 1}, {"A", 10, 2}, {"B", 50, 5}, {"B", 50, 6}, {"C", 600, 8}, {"C", 600, 9}}),
RemovedFirstDuplicate = Table.Combine(Table.Group(Source, {"Column1", "Column2"}, {{"T", each if Table.RowCount(_) = 1 then _ else Table.Skip(_), type table}})[T])
in
RemovedFirstDuplicate
@Centaur1, check this:
Before
After
let
Source = Table.FromRows({{"A", 10}, {"A", 20}, {"B", 50}, {"B", 60}, {"C", 500}, {"C", 600}}),
RemovedFirstDuplicate = Table.Combine(Table.Group(Source, {"Column1"}, {{"T", each Table.Skip(_), type table}})[T])
in
RemovedFirstDuplicate
Hi Dufoq3, I looked over the sample and I think it's not a duplicate because it needs to be on both the company name and the amount. I think the duplicate in the example is only on the company name, but I might not be understanding.
just to be clear, there are cases where there are not duplicates.
I would need some assistance on exactly where to place that code as well. I really don't know where it should be.
1.) For future requests - allways provide sample data and expected resuld based on that!
2.) If you don't know how to do it - read note below my post.
3.) If you don't know how to use my query also read note below my post.
New sample: Removes duplicates in [Column1] and [Column2] and keep rows where there are no duplicates.
Before
After
let
Source = Table.FromRows({{"A", 10, 1}, {"A", 10, 2}, {"B", 50, 5}, {"B", 50, 6}, {"C", 600, 8}, {"C", 600, 9}}),
RemovedFirstDuplicate = Table.Combine(Table.Group(Source, {"Column1", "Column2"}, {{"T", each if Table.RowCount(_) = 1 then _ else Table.Skip(_), type table}})[T])
in
RemovedFirstDuplicate
thank you dufoq3. Very nice. I might need to ask a follow up but will ask another question since I am not sure what I need at the moment, if anything. I have to look over the data since I took a different path.
Hi, thank you for the revised. That is amazing.
I did click the link in the note however it is giving me this message:
Start by adding an index column:
I started with this:
Then added an index column:
Then add another colum to show if it is odd or even:
Then filter only the 0 if you only want to keep the first.
Start with a right-click on any of the cells containg 0:
I leave removing the additional columns to you 😊
Here the full M Code. Replace the first 2 steps with your own ...
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclSK1YlWSgKTyWAyBUymgsk0MJmuFBsLAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each Number.Mod([Index],2)),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Custom] = 0)
in
#"Filtered Rows"
Did I answer your question? Then please mark my post as the solution and make it easier to find for others having a similar problem.
If I helped you, please click on the Thumbs Up to give Kudos.
Kees Stolker
A big fan of Power Query and Excel
HI Kees, thank you for your response as well. I do not know exactly where to place those codes. If could kindly read my response to DataNinja since it basically applies. My knowledge of PQ is pretty limited. thank you again for your response.
And if you start at the point where you want to remove the duplicates and follow my step by step instructions?That seems to me the best approach for you because you can check every step by itself and keep the coding to a minimum.
Hi @Centaur1 ,
To keep the second instance of duplicates in Power Query, you can modify your query to tag each row within a duplicate group and filter based on the tag. Here's how you can do it.
First, ensure your table is sorted so duplicates are grouped together. You can use the Table.Sort function to sort by Invoice # and Invoice amount. Then, add an index column to uniquely identify each row. After that, group the table by Invoice # and Invoice amount. Within each group, add a new index column to tag rows with a "row number." This step helps identify the first, second, and subsequent duplicates. Finally, filter the table to keep only the rows with a "row number" of 2, corresponding to the second instance of each duplicate. Once done, you can clean up by removing unnecessary columns.
Here's the modified query:
let
Source = Table.Combine({#"FBL1N - 922A (no filter)_Link", #"Project Costs"}),
SortedRows = Table.Sort(Source, {{"Invoice #", Order.Ascending}, {"Invoice amount", Order.Ascending}}),
AddedIndex = Table.AddIndexColumn(SortedRows, "Index", 1, 1, Int64.Type),
GroupedRows = Table.Group(
AddedIndex,
{"Invoice #", "Invoice amount"},
{
{"All Data", each
let
AddSubIndex = Table.AddIndexColumn(_, "Row Number", 1, 1, Int64.Type)
in
AddSubIndex, type table}
}
),
ExpandedRows = Table.ExpandTableColumn(GroupedRows, "All Data", {"Index", "Row Number", "Vendor", "Funding Date", "WDDate", "Company Code", "Document Number", "Pay Yes/No", "Invoice date", "Posting Date", "Invoice due date", "Invoice No Stripped", "Payment Method", "Column16", "Vendor Code", "Payee/er", "Clearing Document", "Clearing date", "Text", "Part.bank type", "Payment Block", "Terms of Payment", "Sheet1", "DDNo", "Currency", "USD Amount", "Budget Category", "Account", "TypeDraw"}),
FilteredRows = Table.SelectRows(ExpandedRows, each [Row Number] = 2),
RemovedUnnecessaryColumns = Table.RemoveColumns(FilteredRows, {"Row Number"})
in
RemovedUnnecessaryColumns
This approach ensures you retain only the second occurrence of each duplicate based on Invoice # and Invoice amount. Let me know if you have any questions or need further assistance!
Best regards,
HI DataNinja, thank you for the response.
I am very novice to PQ.
sorry for my limited knowledge but I am not sure where I would place that code.
I assume that I would need to remove the "REMOVED DUPLICATES" line (#2 in the pic below)
Not sure if it matters but not all records are duplicates and not sure if the code takes that into account.
here is a pic of my code (25 lines). (the codes in text are in my first post if you happen to need them)
thank you very much.
Looking forward to your response.
This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.
Check out the June 2025 Power BI update to learn about new features.
User | Count |
---|---|
15 | |
11 | |
8 | |
8 | |
7 |
User | Count |
---|---|
14 | |
13 | |
9 | |
7 | |
6 |