MattReckoner
Frequent Visitor

Find duplicates by multiple columns and groups of rows

Hi everyone,

I'm having trouble finding a proper DAX measure for a specific type of duplicate in a fact table.

 

I'm attaching a snippet with some dummy data and an image where the duplicate batches are highlighted with a formatted background.

 

My goal:

I need to find duplicate import batches for Semi Octa, but the duplicate condition is only met when all [Lia] values for each [Freq] in one batch are the same as in another batch.

 

For instance, serial number 190092PL has two pairs of duplicate batches:

ImportbatchID 190092PL_1 and 190092PL_91: the same [Lia] values for all frequencies (50, 100, 200) in the batch.

190092PL_13 and 190092PL_15: the same [Lia] for freq 50 and 100; there is no 200 freq in those import batches.

 

Serial_Number | ImportbatchID | Test | SelectedMeasurement | freq_Band | Freq | Lia
190092PL | 190092PL_1  | 1 | wert1  | Semi Octa | 50  | 44,9888
190092PL | 190092PL_1  | 1 | wert1  | Semi Octa | 100 | 99,0903
190092PL | 190092PL_1  | 1 | wert1  | Semi Octa | 200 | 34,9983
190092PL | 190092PL_8  | 1 | Qa89   | Semi Octa | 50  | 43,90432
190092PL | 190092PL_9  | 2 | OP09e  | Semi Octa | 50  | 21,434
190092PL | 190092PL_9  | 2 | OP09e  | Semi Octa | 100 | 45,9034
190092PL | 190092PL_9  | 2 | OP09e  | Semi Octa | 200 | 7483,8933
190092PL | 190092PL_13 | 2 | Xt56   | Semi Octa | 50  | 21,73773
190092PL | 190092PL_13 | 2 | Xt56   | Semi Octa | 100 | 56,98322
190092PL | 190092PL_91 | 2 | tr452w | Semi Octa | 200 | 34,9983
190092PL | 190092PL_17 | 1 |        | Semi Octa | 100 | 21,32
190092PL | 190092PL_17 | 1 |        | Semi Octa | 200 | 31,123
190092PL | 190092PL_17 | 1 |        | Semi Octa | 50  | 44,9888
190092PL | 190092PL_91 | 1 | tr452w | Semi Octa | 100 | 99,0903
190092PL | 190092PL_92 | 1 | wsx73  | Semi Octa | 200 | 59,09815
190092PL | 190092PL_91 | 1 | tr452w | Semi Octa | 50  | 44,9888
190092PL | 190092PL_15 | 2 | cj9    | Semi Octa | 50  | 21,73773
190092PL | 190092PL_15 | 2 | cj9    | Semi Octa | 100 | 56,98322
9930PL32 | 9930PL32_34 | 1 | wrr23  | Semi Octa | 50  | 44,93
9930PL32 | 9930PL32_34 | 1 | wrr23  | Semi Octa | 100 | 45,932
9930PL32 | 9930PL32_34 | 1 | wrr23  | Semi Octa | 200 | 71,093
9930PL32 | 9930PL32_44 | 1 | fd42   | Semi Octa | 100 | 45,932
9930PL32 | 9930PL32_34 | 2 |        | 1 Octa    | 100 | 54,32
9930PL32 | 9930PL32_34 | 2 |        | 1 Octa    | 50  | 22,23
9930PL32 | 9930PL32_63 | 3 |        | Semi Octa | 50  | 45,65
9930PL32 | 9930PL32_63 | 3 |        | Semi Octa | 100 | 32,43
9930PL32 | 9930PL32_49 | 1 |        | Semi Octa | 200 | 59,09815
9930PL32 | 9930PL32_20 | 4 |        | Semi Octa | 50  | 44,9821103
9930PL32 | 9930PL32_20 | 4 |        | Semi Octa | 100 | 40,09921
9930PL32 | 9930PL32_20 | 4 |        | Semi Octa | 200 | 22,01101
9930PL32 | 9930PL32_19 | 5 |        | Semi Octa | 50  | 32,42332
9930PL32 | 9930PL32_19 | 5 |        | Semi Octa | 100 | 2,90019013
9930PL32 | 9930PL32_44 | 1 | fd42   | Semi Octa | 50  | 44,93
9930PL32 | 9930PL32_44 | 1 | fd42   | Semi Octa | 200 | 71,093

 

(screenshot MattReckoner_1-1688572029573.png: the sample data with the duplicate batches highlighted)

 

I used a measure with DISTINCTCOUNT([Lia]) - COUNT([Lia]) to compare the values, but I don't know how to apply it to a group of frequencies.
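For reference, a minimal sketch of the kind of check I mean, assuming the fact table is simply named 'Table' (written here as COUNT minus DISTINCTCOUNT so that duplicates come out as a positive number):

Repeated Lia values =
// > 0 when the same [Lia] value occurs more than once in the current
// filter context, but it looks at every frequency on its own
COUNT ( 'Table'[Lia] ) - DISTINCTCOUNT ( 'Table'[Lia] )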

Does anyone have any idea?

 

2 REPLIES
Martin_D
Super User

Hi @MattReckoner,

Is this the correct understanding:
If Serial_Number, Freq, and Lia are the same and there is any second ImportbatchID for this combination, then this is a duplicate, right?
Any third, fourth, fifth, ... ImportbatchID is another duplicate?
Then this is what you need:
- Group by Serial_Number, Freq, Lia
- For each distinct combination of the above, count occurrences
- If there is more than one occurrence, these are duplicates

 

Count of duplicate batches = 
VAR _DuplicatesPerSerialNumberFrequencyLia =
    ADDCOLUMNS (
        // Group by Serial_Number, Freq, Lia
        SUMMARIZECOLUMNS (
            'Table'[Serial_Number],
            'Table'[Freq],
            'Table'[Lia]
        ),
        "@CountOfDuplicates",
        // For each distinct combination of the above, count occurrences
        VAR _CountOfBatch = CALCULATE ( COUNTROWS ( 'Table' ) )
        RETURN
        // If there is more than one occurrence, these are duplicates
        IF (
            _CountOfBatch > 1,
            _CountOfBatch - 1 // each first instance is not a duplicate
        )
    )
RETURN
// Return total number of duplicates in current context
SUMX ( _DuplicatesPerSerialNumberFrequencyLia, [@CountOfDuplicates] )

 

This measure produces this result from your sample data:

(screenshot: Duplicate batches)

BR

Martin


Hi Martin_D,

 

Thanks for the reply! It's not exactly what I wanted, as your DAX finds duplicates for each frequency separately.

I've created a measure with a virtual table where I pivoted the Freq column into freq_50, freq_100, and freq_200 columns and built a uniqueKey as a combination of serial number & freq_50 & freq_100 & freq_200 & Lia; the rest is similar, COUNTROWS and SUMX.
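For anyone finding this thread later, a minimal sketch of that pivoted-virtual-table idea could look like the measure below. It assumes the same 'Table' and column names as in Martin_D's reply, hard-codes the three Semi Octa frequencies (50, 100, 200), takes MAX ( [Lia] ) as the value per batch and frequency, and counts batches per key with GROUPBY instead of the exact COUNTROWS/SUMX combination described above, so treat it as an illustration of the approach rather than the original measure:

Count of duplicate batches (grouped frequencies) =
VAR _BatchesPivoted =
    ADDCOLUMNS (
        // one row per import batch
        SUMMARIZE ( 'Table', 'Table'[Serial_Number], 'Table'[ImportbatchID] ),
        // build one key per batch from the Semi Octa [Lia] values at freq 50 / 100 / 200
        "@uniqueKey",
            'Table'[Serial_Number] & "|"
                & CALCULATE ( MAX ( 'Table'[Lia] ), 'Table'[Freq] = 50, 'Table'[freq_Band] = "Semi Octa" ) & "|"
                & CALCULATE ( MAX ( 'Table'[Lia] ), 'Table'[Freq] = 100, 'Table'[freq_Band] = "Semi Octa" ) & "|"
                & CALCULATE ( MAX ( 'Table'[Lia] ), 'Table'[Freq] = 200, 'Table'[freq_Band] = "Semi Octa" )
    )
VAR _BatchesPerKey =
    // count how many batches share each key
    GROUPBY (
        _BatchesPivoted,
        [@uniqueKey],
        "@CountOfBatches", COUNTX ( CURRENTGROUP (), 1 )
    )
RETURN
    // every batch beyond the first per key counts as a duplicate
    SUMX (
        FILTER ( _BatchesPerKey, [@CountOfBatches] > 1 ),
        [@CountOfBatches] - 1
    )

A missing frequency simply contributes a blank segment to the key, so two batches that both lack the same frequency can still match.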

 
