Check your eligibility for this 50% exam voucher offer and join us for free live learning sessions to get prepared for Exam DP-700.
Get StartedDon't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.
Hello,
Is there a straightforward way to identify/flag duplicate column records in Power Query?
When I say straightforward, I mean something that doesn't involve aggregations like this method for example, because I don't want to aggregate my table or add too many steps.
I'm looking for the Power Query equivalent of this DAX code.
Solved! Go to Solution.
Hi @Quiny_Harl, don't be affraid of using Group By - it is one of my favourite functions.
You can achieve what you need in single step.
This code will preserve all your exicting columns and add new [Duplicate Flag] at the end.
Add this code as new step, but replace Source with your previous step reference and "PK" with your column name where you want to check for duplicates
Table.Combine(Table.Group(Source, {"PK"}, {{"All", each
Table.AddColumn(_, "Duplicate Flag", (x)=> Table.RowCount(_), Int64.Type), type table}})[All])
Hi @Quiny_Harl, don't be affraid of using Group By - it is one of my favourite functions.
You can achieve what you need in single step.
This code will preserve all your exicting columns and add new [Duplicate Flag] at the end.
Add this code as new step, but replace Source with your previous step reference and "PK" with your column name where you want to check for duplicates
Table.Combine(Table.Group(Source, {"PK"}, {{"All", each
Table.AddColumn(_, "Duplicate Flag", (x)=> Table.RowCount(_), Int64.Type), type table}})[All])
Not so, you can right-click on a single column or multiple columns, and select "Remove duplicates". It will remove duplicates in the field(s) you selected, regardless of what else is in the row.
--Nate
please define "duplicate" and "identify". Does that include the first occurrence? If yes then you can do a table self join.
No way to do it without aggregating, if you are trying to keep the duplicates and mark them somehow. Just group, add a count column, add the All Rows column, then expand the table column. All of your counts above 1 are duplicates.
Not sure what else you might have envisioned.
--Nate
@Anonymous, what I've envisioned is something similar to the DAX code in the link I provided. I would like to create a custom column that is going to flag each records of a Column1 that apears more than once. Then, I'm going to filter out the duplicates.
You can use implicit measures for that (Count and Count(Distinct)). Please define what you mean by "filter out the duplicates".
Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).
Do not include sensitive information or anything not related to the issue or question.
If you are unsure how to upload data please refer to https://community.fabric.microsoft.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-...
Please show the expected outcome based on the sample data you provided.
Want faster answers? https://community.fabric.microsoft.com/t5/Desktop/How-to-Get-Your-Question-Answered-Quickly/m-p/1447...
I cannot use Count and Count(Distinct) in Power Query.
I'm not sure how to further clarify what I need as I think it is pretty obvious.
PK | Duplicate Flag |
1 | 1 |
2 | 1 |
3 | 2 |
3 | 2 |
4 | 1 |
I need to add the Duplicate Flag column in Power Query. Each record that has a count more than once is a duplicate. I'm going to apply a filter on the Duplicate Flag to remove these records.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlSK1YlWMgKTxkikiVJsLAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [PK = _t]),
#"Added Custom" = Table.AddColumn(Source, "Duplicate Flag", (k)=> List.Count(List.Select(Source[PK], each _ = k[PK] )))
in
#"Added Custom"
You can throw a List.Buffer into the mix if you need better performance.
Thank you for trying to help me but I don't understand how to use this code.
How to use this code: Create a new Blank Query. Click on "Advanced Editor". Replace the code in the window with the code provided here. Click "Done". Once you examined the code, replace the Source step with your own source.
If you are going to filter out the duplicates anyway, why not just click "Remove Duplicates"?
--Nate
@Anonymous, because this works only when the whole table row, including all columns, is a duplicate.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Prices go up Feb. 11th.
Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.
User | Count |
---|---|
15 | |
14 | |
13 | |
12 | |
11 |