Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hello all!
I've found a few different solutions but none quite match what I'm looking for.
I have a column of data, in which every row *should* be a Unique ID (thousands of rows) BUT there are lots of duplicates:
IDENTIFIER
12345
54321
23451
35421
12345
54321
54321
I don't want to remove the duplicates as I need to count them as I'll be creating visuals from this data later - I would like to create a Custom Column that states "ERROR" for every duplicate value, and "OKAY" for Unique IDs:
IDENTIFIER Unique?
12345 ERROR
54321 ERROR
23451 OKAY
35421 OKAY
12345 ERROR
54321 ERROR
54321 ERROR
I have tried the following from a Stack solution:
if List.Count(List.Count(QUERY[IDENTIFIER],[IDENTIFIER])) > 1 then "ERROR" else "OKAY")
But I get varying errors, either cyclical references, something to do with type formatting (?) - but definitely not liking it either way!
I don't know if I'm missing something obvious (probably!) or I've misunderstood the Stack solution but just struggling to work it out.
Apologies once again for the lack of proper formatting. Hoping someone will be able to put me on the right track.
Solved! Go to Solution.
Hi @Franny_Spanny ,
Try this method.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMk4xT/RLMzeMcLQsszSMMPAwL7VIM1KK1SFZKsUYr1QsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
#"Grouped Rows" = Table.Group(#"Added Index", {"Column1"}, {{"Count", each Table.RowCount(_), Int64.Type}, {"index", each _, type table [Column1=nullable text, Index=number]}}),
#"Expanded index" = Table.ExpandTableColumn(#"Grouped Rows", "index", {"Column1", "Index"}, {"index.Column1", "index.Index"}),
#"Added Custom" = Table.AddColumn(#"Expanded index", "Custom", each if [Count] = 2 then "error" else null),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Count", "index.Column1", "index.Index"})
in
#"Removed Columns"
https://excelguru.ca/identify-duplicates-using-power-query/ .
Best Regards,
Jay
Hi @Franny_Spanny ,
Try this method.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMk4xT/RLMzeMcLQsszSMMPAwL7VIM1KK1SFZKsUYr1QsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
#"Grouped Rows" = Table.Group(#"Added Index", {"Column1"}, {{"Count", each Table.RowCount(_), Int64.Type}, {"index", each _, type table [Column1=nullable text, Index=number]}}),
#"Expanded index" = Table.ExpandTableColumn(#"Grouped Rows", "index", {"Column1", "Index"}, {"index.Column1", "index.Index"}),
#"Added Custom" = Table.AddColumn(#"Expanded index", "Custom", each if [Count] = 2 then "error" else null),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Count", "index.Column1", "index.Index"})
in
#"Removed Columns"
https://excelguru.ca/identify-duplicates-using-power-query/ .
Best Regards,
Jay
I managed to follow the instructions on the page you linked and it worked with some trial and error on my part! Thank you!
Hi @Franny_Spanny ,
Try the attached solution. The List.Text element is looking for just that, a text value. The identifier could be numeric and it's throwing the error.
Hope this helps.
Did I help you today? Please accept my solution and hit the Kudos button.
Hi @davehus - thanks for looking at this.
I tried changing the Date Type to text (which seems to be fine, no errors upon change), but I still get an error - "Expression.Error: A cyclic reference was encountered during evaluation."
Not sure if an actual data sample would help:
3d7aNf71XA9v91X0H7u8f2 |
3d7aNf71XA9v91X0H7u8f2 |
3d7aNf71XA9v91X0H7u8d3 |
3d7aNf71XA9v91X0H7u8d3 |
Not sure why mine doesn't want to work when your example clearly does!
Hi @Franny_Spanny ,
So what the calculation is doing is searching the text in the list against itself. What's happening is that its find values in each row that are common, hence the reason for the errors and circular dependencies. If you look at the example I created below. 12345 123456 occurs 3 times because it appears to be matching the 12345 part of the using the find text portion.
I'm don't know how big your table is, but you could apply a group by to it if that might fix what you need. Here's a recent solution I provided to another user. You would swap sum for count.
https://community.powerbi.com/t5/Power-Query/FORMULA-DIVIDE-PRODUT/m-p/2727198#M84253
Hope this helps. Love the username by the way... 🙌
Did I help you today? Please accept my solution and hit the Kudos button.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
100 | |
100 | |
80 | |
77 | |
66 |
User | Count |
---|---|
134 | |
106 | |
104 | |
85 | |
73 |