Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Franny_Spanny
Frequent Visitor

Identify duplicates in new column as ERROR - don't want to remove any duplicate rows

Hello all!

 

I've found a few different solutions but none quite match what I'm looking for. 

 

I have a column of data, in which every row *should* be a Unique ID (thousands of rows) BUT there are lots of duplicates:

 

IDENTIFIER

12345

54321

23451

35421

12345

54321

54321

 

I don't want to remove the duplicates as I need to count them as I'll be creating visuals from this data later - I would like to create a Custom Column that states "ERROR" for every duplicate value, and "OKAY" for Unique IDs:

 

IDENTIFIER         Unique?

12345                 ERROR

54321                 ERROR

23451                 OKAY

35421                 OKAY

12345                 ERROR

54321                 ERROR

54321                 ERROR

 

I have tried the following from a Stack solution

 

if List.Count(List.Count(QUERY[IDENTIFIER],[IDENTIFIER])) > 1 then "ERROR" else "OKAY")

 

But I get varying errors, either cyclical references, something to do with type formatting (?) - but definitely not liking it either way!

 

I don't know if I'm missing something obvious (probably!) or I've misunderstood the Stack solution but just struggling to work it out.

 

Apologies once again for the lack of proper formatting. Hoping someone will be able to put me on the right track.

1 ACCEPTED SOLUTION
v-jayw-msft
Community Support
Community Support

Hi @Franny_Spanny ,

 

Try this method.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMk4xT/RLMzeMcLQsszSMMPAwL7VIM1KK1SFZKsUYr1QsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Grouped Rows" = Table.Group(#"Added Index", {"Column1"}, {{"Count", each Table.RowCount(_), Int64.Type}, {"index", each _, type table [Column1=nullable text, Index=number]}}),
    #"Expanded index" = Table.ExpandTableColumn(#"Grouped Rows", "index", {"Column1", "Index"}, {"index.Column1", "index.Index"}),
    #"Added Custom" = Table.AddColumn(#"Expanded index", "Custom", each if [Count] = 2 then "error" else null),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Count", "index.Column1", "index.Index"})
in
    #"Removed Columns"

https://excelguru.ca/identify-duplicates-using-power-query/ .

vjaywmsft_0-1661937671485.png

 

Best Regards,

Jay

Community Support Team _ Jay
If this post helps, then please consider Accept it as the solution
to help the other members find it.

View solution in original post

5 REPLIES 5
v-jayw-msft
Community Support
Community Support

Hi @Franny_Spanny ,

 

Try this method.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMk4xT/RLMzeMcLQsszSMMPAwL7VIM1KK1SFZKsUYr1QsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Grouped Rows" = Table.Group(#"Added Index", {"Column1"}, {{"Count", each Table.RowCount(_), Int64.Type}, {"index", each _, type table [Column1=nullable text, Index=number]}}),
    #"Expanded index" = Table.ExpandTableColumn(#"Grouped Rows", "index", {"Column1", "Index"}, {"index.Column1", "index.Index"}),
    #"Added Custom" = Table.AddColumn(#"Expanded index", "Custom", each if [Count] = 2 then "error" else null),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Count", "index.Column1", "index.Index"})
in
    #"Removed Columns"

https://excelguru.ca/identify-duplicates-using-power-query/ .

vjaywmsft_0-1661937671485.png

 

Best Regards,

Jay

Community Support Team _ Jay
If this post helps, then please consider Accept it as the solution
to help the other members find it.

I managed to follow the instructions on the page you linked and it worked with some trial and error on my part! Thank you!

davehus
Memorable Member
Memorable Member

Hi @Franny_Spanny ,

 

Try the attached solution. The List.Text element is looking for just that, a text value. The identifier could be numeric and it's throwing the error. 

 

Hope this helps.

 

Did I help you today? Please accept my solution and hit the Kudos button.

Hi @davehus - thanks for looking at this.

 

I tried changing the Date Type to text (which seems to be fine, no errors upon change), but I still get an error - "Expression.Error: A cyclic reference was encountered during evaluation."

 

Not sure if an actual data sample would help:

 

3d7aNf71XA9v91X0H7u8f2
3d7aNf71XA9v91X0H7u8f2
3d7aNf71XA9v91X0H7u8d3
3d7aNf71XA9v91X0H7u8d3

 

Not sure why mine doesn't want to work when your example clearly does!

Hi @Franny_Spanny ,

 

So what the calculation is doing is searching the text in the list against itself. What's happening is that its find values in each row that are common, hence the reason for the errors and circular dependencies. If you look at the example I created below. 12345 123456 occurs 3 times because it appears to be matching the 12345 part of the using the find text portion. 

davehus_0-1661519571166.png

 

I'm don't know how big your table is, but you could apply a group by to it if that might fix what you need. Here's a recent solution I provided to another user. You would swap sum for count.

 

https://community.powerbi.com/t5/Power-Query/FORMULA-DIVIDE-PRODUT/m-p/2727198#M84253

 

Hope this helps. Love the username by the way... 🙌

 

Did I help you today? Please accept my solution and hit the Kudos button.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.