Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
theo
Helper III
Helper III

Count duplicate values using switch or if statement

Hi.  I have been bugging witht this problem to count duplicates in multiple columns involving 12millions rows and counting.  I am trying now with the approach which does not provide me the correct result.

Basically what I need is to check how maany from in the table have 5 similar entries with the current table
Table has 6 columns, and duplicate number does not need to be in the same column.  As you can see my attemp below is to check only 1 duplicate (I checked only the first column agains 2 columns just to see if it works)

Any different approach involving measure is appreciated (calculated columns may either be slow or will result to memory issue).

Thanks.

=sumx(
'3_13May',
 if(sumx('3_13May',SWITCH (
        TRUE (),
        '3_13May'[Column1]=EARLIER('3_13May'[Column1]),1,
        '3_13May'[Column2]=EARLIER('3_13May'[Column1]),1,0))=1,1,0
))
6 REPLIES 6
Mariusz
Community Champion
Community Champion

Hi @theo 

Can you send some data sample?

 

Best Regards,
Mariusz

Please feel free to connect with me.
Mariusz Repczynski

 

Hi @Mariusz ;

Here's a representative sample  and some manual calculations I did.

Capture.JPG

Mariusz
Community Champion
Community Champion

Hi @theo 

 

You can achieve this by applying three steps in Query editor.

Please see the M code below based on the example that you provided.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dY+xDcQwDAN3cZ0iEpnEmSXw/mu8zm4ej3xBwQB5lPw8LdrWsqSSS0fpbGN7d67p9HrdpdgZ5IJg6J8LHsdqpSZOxsUgHcRzf/NzXkF7rvbknKQwSSbJnMn+69MqjhAlyukLX/iCF7zg1b98UIMa1Oz32m9QgxrUoOYTvtsYHw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t, Column5 = _t, Column6 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", Int64.Type}, {"Column3", Int64.Type}, {"Column4", Int64.Type}, {"Column5", Int64.Type}, {"Column6", Int64.Type}}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type", {}, "Attribute", "Value"),
    #"Grouped Rows" = Table.Group(#"Unpivoted Columns", {"Value"}, {{"Count", each Table.RowCount(_), type number}}),
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([Count] <> 1))
in
    #"Filtered Rows"
Best Regards,
Mariusz

If this post helps, then please consider Accepting it as the solution.

Please feel free to connect with me.
Mariusz Repczynski

 

hi @Mariusz Thanks but it doesnt provide the result that I need.

The code you provided just count the duplicate of all entry .

What I need is to count the rows that have duplicate (eg. 5 duplicates) based on the current row.

in my example under dup 5 count, line 1 has a count of 1 since there is one row that has 5 similar entry with line 1 by virtue of 1,2,3,4, and 5.  same goes with line 2 while the rest has 0 meaning no row in the table that has 5 similar entry.

Mariusz
Community Champion
Community Champion

Hi @theo 

Please see the below three tables, I believe that this will do whats required however I do not think it will perform very well with 12 million rows.

// yourSampleTable
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dY+xDcQwDAN3cZ0iEpnEmSXw/mu8zm4ej3xBwQB5lPw8LdrWsqSSS0fpbGN7d67p9HrdpdgZ5IJg6J8LHsdqpSZOxsUgHcRzf/NzXkF7rvbknKQwSSbJnMn+69MqjhAlyukLX/iCF7zg1b98UIMa1Oz32m9QgxrUoOYTvtsYHw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t, Column5 = _t, Column6 = _t]),
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Column1", "Column2", "Column3", "Column4", "Column5", "Column6"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Reordered Columns",{{"Index", Int64.Type}})
in
    #"Changed Type"
// duplicateCountTable
let
    Source = yourSampleTable,
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {"Index"}, "Column", "Value"),
    #"Merged Queries" = Table.NestedJoin(#"Unpivoted Columns", {"Value"}, #"Unpivoted Columns", {"Value"}, "Unpivoted Columns", JoinKind.Inner),
    #"Expanded Unpivoted Columns" = Table.ExpandTableColumn(#"Merged Queries", "Unpivoted Columns", {"Index"}, {"Index.1"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Unpivoted Columns", each ([Index.1] <> [Index])),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Index", "Index.1"}, {{"Count", each "Dup " & Number.ToText( List.Count(_) ) & " Count", type text }}),
    #"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Count", Order.Descending}}),
    #"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Sorted Rows", {{"Count", type text}}, "en-GB"), List.Distinct(Table.TransformColumnTypes(#"Sorted Rows", {{"Count", type text}}, "en-GB")[Count]), "Count", "Index.1", List.Count)
in
    #"Pivoted Column"

 

// mergeOutput
let
    Source = Table.NestedJoin(yourSampleTable, {"Index"}, duplicateCountTable, {"Index"}, "duplicateCountTable", JoinKind.LeftOuter),
    #"Expanded duplicateCountTable" = Table.ExpandTableColumn(Source, "duplicateCountTable", {"Dup 5 Count", "Dup 4 Count", "Dup 3 Count", "Dup 2 Count", "Dup 1 Count"}, {"Dup 5 Count", "Dup 4 Count", "Dup 3 Count", "Dup 2 Count", "Dup 1 Count"})
in
    #"Expanded duplicateCountTable"

 

 

Best Regards,
Mariusz

If this post helps, then please consider Accepting it as the solution.

Please feel free to connect with me.
Mariusz Repczynski

 

Hi @Mariusz Thanks.  Will try this one.   That is also my concern for using power query, it may not be able to work on the 12m rows.  So Im thibking if measure maybe a better alternative.

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.