The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi everyone,
I have a problem finding a proper dax measure for a specific type of duplicates in a fact table
Uploading a snippet with few dummy data and the image with formatted background for duplicates
What is my goal:
I need to find duplicate Import batches for Semi Octa - but duplicate condition is met only when all [Lia] values for each [Freq] in a batch are the same with another batch
for instance - Serial number 190092PL has 2 duplicate batches :
ImportbatchID - 190092PL_1 and 190092PL_91 - same [lia] values for all frequencies (50,100,200) in a batch
190092PL_13 and 190092PL_1 - same lia for freq 50,100, there is no 200 freq in that import batch
Serial_Number | ImportbatchID | Test | SelectedMeasurement | freq_Band | Freq | Lia |
190092PL | 190092PL_1 | 1 | wert1 | Semi Octa | 50 | 44,9888 |
190092PL | 190092PL_1 | 1 | wert1 | Semi Octa | 100 | 99,0903 |
190092PL | 190092PL_1 | 1 | wert1 | Semi Octa | 200 | 34,9983 |
190092PL | 190092PL_8 | 1 | Qa89 | Semi Octa | 50 | 43,90432 |
190092PL | 190092PL_9 | 2 | OP09e | Semi Octa | 50 | 21,434 |
190092PL | 190092PL_9 | 2 | OP09e | Semi Octa | 100 | 45,9034 |
190092PL | 190092PL_9 | 2 | OP09e | Semi Octa | 200 | 7483,8933 |
190092PL | 190092PL_13 | 2 | Xt56 | Semi Octa | 50 | 21,73773 |
190092PL | 190092PL_13 | 2 | Xt56 | Semi Octa | 100 | 56,98322 |
190092PL | 190092PL_91 | 2 | tr452w | Semi Octa | 200 | 34,9983 |
190092PL | 190092PL_17 | 1 | Semi Octa | 100 | 21,32 | |
190092PL | 190092PL_17 | 1 | Semi Octa | 200 | 31,123 | |
190092PL | 190092PL_17 | 1 | Semi Octa | 50 | 44,9888 | |
190092PL | 190092PL_91 | 1 | tr452w | Semi Octa | 100 | 99,0903 |
190092PL | 190092PL_92 | 1 | wsx73 | Semi Octa | 200 | 59,09815 |
190092PL | 190092PL_91 | 1 | tr452w | Semi Octa | 50 | 44,9888 |
190092PL | 190092PL_15 | 2 | cj9 | Semi Octa | 50 | 21,73773 |
190092PL | 190092PL_15 | 2 | cj9 | Semi Octa | 100 | 56,98322 |
9930PL32 | 9930PL32_34 | 1 | wrr23 | Semi Octa | 50 | 44,93 |
9930PL32 | 9930PL32_34 | 1 | wrr23 | Semi Octa | 100 | 45,932 |
9930PL32 | 9930PL32_34 | 1 | wrr23 | Semi Octa | 200 | 71,093 |
9930PL32 | 9930PL32_44 | 1 | fd42 | Semi Octa | 100 | 45,932 |
9930PL32 | 9930PL32_34 | 2 | 1 Octa | 100 | 54,32 | |
9930PL32 | 9930PL32_34 | 2 | 1 Octa | 50 | 22,23 | |
9930PL32 | 9930PL32_63 | 3 | Semi Octa | 50 | 45,65 | |
9930PL32 | 9930PL32_63 | 3 | Semi Octa | 100 | 32,43 | |
9930PL32 | 9930PL32_49 | 1 | Semi Octa | 200 | 59,09815 | |
9930PL32 | 9930PL32_20 | 4 | Semi Octa | 50 | 44,9821103 | |
9930PL32 | 9930PL32_20 | 4 | Semi Octa | 100 | 40,09921 | |
9930PL32 | 9930PL32_20 | 4 | Semi Octa | 200 | 22,01101 | |
9930PL32 | 9930PL32_19 | 5 | Semi Octa | 50 | 32,42332 | |
9930PL32 | 9930PL32_19 | 5 | Semi Octa | 100 | 2,90019013 | |
9930PL32 | 9930PL32_44 | 1 | fd42 | Semi Octa | 50 | 44,93 |
9930PL32 | 9930PL32_44 | 1 | fd42 | Semi Octa | 200 | 71,093 |
i used a measure with distinctcount[lia] - count[lia] to compare the values, but i dont know how to use it for group of frequencies
Does anyone have any idea?
Hi @MattReckoner ,
Is this correct understanding:
If Serial_Number and Frequency and Lia is the same and there is any second ImportbatchID for this combination, then this is a duplicate, right?
Any third, fourth, fifth, ... ImportbatchID is another duplicate?
Then this is what you need:
- Group by Serial_Number, Freq, Lia
- For each distinct combination of above, count occurences
- If there is more than one occurence, these are duplicates
Count of duplicate batches =
VAR _DuplicatesPerSerialNumberFrequencyLia =
ADDCOLUMNS (
// Group by Serial_Number, Freq, Lia
SUMMARIZECOLUMNS (
'Table'[Serial_Number],
'Table'[Freq],
'Table'[Lia]
),
"@CountOfDuplicates",
// For each distinct combination of above, count occurences
VAR _CountOfBatch = CALCULATE ( COUNTROWS ( 'Table' ) )
RETURN
// If there is more than one occurence, these are duplicates
IF (
_CountOfBatch > 1,
_CountOfBatch - 1 // each first instance is not a duplicate
)
)
RETURN
// Return total number of duplicates in current context
SUMX ( _DuplicatesPerSerialNumberFrequencyLia, [@CountOfDuplicates] )
This measure produces this result from your sample data:
BR
Martin
Hi Martin_D
Thanks for reply! its not exactly what i wanted
As your dax finds duplicates for each frequency separately.
I've created a measure with virtual table where i pivoted freq column. Made columns freq_50, freq_100, freq_200 and created uniqueKey as a combination of serial numer & freq_50 & freq_100 & freq_200 & lia rest is similar, countows and sumx.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
20 | |
18 | |
17 | |
14 | |
13 |
User | Count |
---|---|
36 | |
35 | |
21 | |
19 | |
18 |