Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The ultimate Microsoft Fabric, Power BI, Azure AI & SQL learning event! Join us in Las Vegas from March 26-28, 2024. Use code MSCUST for a $100 discount. Register Now

Reply
MattReckoner
Frequent Visitor

Find a duplicate by multiple columns and group of rows

Hi everyone,

I have a problem finding a proper dax measure for a specific type of duplicates in a fact table

 

Uploading a snippet with few dummy data and the image with formatted background for duplicates

 

What is my goal:

I need to find duplicate Import batches for Semi Octa - but duplicate condition is met only when all [Lia] values for each [Freq] in a batch are the same with another batch

 

for instance - Serial number 190092PL has 2 duplicate batches :

ImportbatchID - 190092PL_1 and 190092PL_91 - same [lia] values for all frequencies  (50,100,200) in a batch

190092PL_13 and 190092PL_1 - same lia for freq 50,100, there is no 200 freq in that import batch

 

Serial_NumberImportbatchIDTestSelectedMeasurementfreq_BandFreqLia
190092PL190092PL_11wert1Semi Octa5044,9888
190092PL190092PL_11wert1Semi Octa10099,0903
190092PL190092PL_11wert1Semi Octa20034,9983
190092PL190092PL_81Qa89Semi Octa5043,90432
190092PL190092PL_92OP09eSemi Octa5021,434
190092PL190092PL_92OP09eSemi Octa10045,9034
190092PL190092PL_92OP09eSemi Octa2007483,8933
190092PL190092PL_132Xt56Semi Octa5021,73773
190092PL190092PL_132Xt56Semi Octa10056,98322
190092PL190092PL_912tr452wSemi Octa20034,9983
190092PL190092PL_171 Semi Octa10021,32
190092PL190092PL_171 Semi Octa20031,123
190092PL190092PL_171 Semi Octa5044,9888
190092PL190092PL_911tr452wSemi Octa10099,0903
190092PL190092PL_921wsx73Semi Octa20059,09815
190092PL190092PL_911tr452wSemi Octa5044,9888
190092PL190092PL_152cj9Semi Octa5021,73773
190092PL190092PL_152cj9Semi Octa10056,98322
9930PL329930PL32_341wrr23Semi Octa5044,93
9930PL329930PL32_341wrr23Semi Octa10045,932
9930PL329930PL32_341wrr23Semi Octa20071,093
9930PL329930PL32_441fd42Semi Octa10045,932
9930PL329930PL32_342 1 Octa10054,32
9930PL329930PL32_342 1 Octa5022,23
9930PL329930PL32_633 Semi Octa5045,65
9930PL329930PL32_633 Semi Octa10032,43
9930PL329930PL32_491 Semi Octa20059,09815
9930PL329930PL32_204 Semi Octa5044,9821103
9930PL329930PL32_204 Semi Octa10040,09921
9930PL329930PL32_204 Semi Octa20022,01101
9930PL329930PL32_195 Semi Octa5032,42332
9930PL329930PL32_195 Semi Octa1002,90019013
9930PL329930PL32_441fd42Semi Octa5044,93
9930PL329930PL32_441fd42Semi Octa20071,093

 

MattReckoner_1-1688572029573.png

 

i used a measure with distinctcount[lia]  - count[lia] to compare the values, but i dont know how to use it for group of frequencies

Does anyone have any idea?

 

2 REPLIES 2
Martin_D
Super User
Super User

Hi @MattReckoner ,

Is this correct understanding:
If Serial_Number and Frequency and Lia is the same and there is any second ImportbatchID for this combination, then this is a duplicate, right?
Any third, fourth, fifth, ... ImportbatchID is another duplicate?
Then this is what you need:
- Group by Serial_Number, Freq, Lia
- For each distinct combination of above, count occurences
- If there is more than one occurence, these are duplicates

 

Count of duplicate batches = 
VAR _DuplicatesPerSerialNumberFrequencyLia =
    ADDCOLUMNS (
        // Group by Serial_Number, Freq, Lia
        SUMMARIZECOLUMNS (
            'Table'[Serial_Number],
            'Table'[Freq],
            'Table'[Lia]
        ),
        "@CountOfDuplicates",
        // For each distinct combination of above, count occurences
        VAR _CountOfBatch = CALCULATE ( COUNTROWS ( 'Table' ) )
        RETURN
        // If there is more than one occurence, these are duplicates
        IF (
            _CountOfBatch > 1,
            _CountOfBatch - 1 // each first instance is not a duplicate
        )
    )
RETURN
// Return total number of duplicates in current context
SUMX ( _DuplicatesPerSerialNumberFrequencyLia, [@CountOfDuplicates] )

 

This measure produces this result from your sample data:

Duplicate batchesDuplicate batches

BR

Martin

github.pnglinkedin.png

Hi Martin_D

 

Thanks for reply! its not exactly what i wanted

As your dax finds duplicates for each frequency separately. 

I've created a measure with virtual table where i pivoted freq column. Made columns freq_50, freq_100, freq_200 and created uniqueKey as a combination of serial numer & freq_50 & freq_100 & freq_200 & lia rest is similar, countows and sumx. 

 

Helpful resources

Announcements
Fabric Community Conference

Microsoft Fabric Community Conference

Join us at our first-ever Microsoft Fabric Community Conference, March 26-28, 2024 in Las Vegas with 100+ sessions by community experts and Microsoft engineering.

February 2024 Update Carousel

Power BI Monthly Update - February 2024

Check out the February 2024 Power BI update to learn about new features.

Fabric Career Hub

Microsoft Fabric Career Hub

Explore career paths and learn resources in Fabric.

Fabric Partner Community

Microsoft Fabric Partner Community

Engage with the Fabric engineering team, hear of product updates, business opportunities, and resources in the Fabric Partner Community.

Top Solution Authors