Hi,
I've been searching the forum but haven't found a solution to my problem. Some threads come close, but not all the way.
What I want to achieve:
Find and list duplicates based on 3 different columns
The columns that should be analyzed for duplicates:
If two or all three of these values are the same, the records should be listed as duplicates.
I have created a unique ID per row in the query.
This is for a company with many divisions, and all of their purchasing units can buy from the same supplier. However, each division sometimes creates its own supplier record for the same supplier, which creates a duplicate.
Alternatively, or in addition, Division A may categorize the supplier as “Phone retailer” while Division B categorizes the same supplier as “Computer manufacturer”. Again, that gives two records for the same supplier.
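A minimal sketch of the stated rule, written in Python rather than DAX so it can run on its own: two records count as duplicates when at least two of the three compared columns match. The column names (`supplier`, `category`, `purchasing_unit`) and the sample rows are hypothetical placeholders, since the exact columns are not listed above; `id` stands in for the unique row ID.

```python
from itertools import combinations

# Hypothetical column names; the thread does not list the exact columns.
COLUMNS = ("supplier", "category", "purchasing_unit")


def find_duplicates(rows, columns=COLUMNS, min_matches=2):
    """Return the sorted IDs of rows that match at least one other row
    on `min_matches` or more of the given columns."""
    flagged = set()
    # Compare every pair of rows once (O(n^2) pairs).
    for left, right in combinations(rows, 2):
        matches = sum(left[col] == right[col] for col in columns)
        if matches >= min_matches:
            flagged.update((left["id"], right["id"]))
    return sorted(flagged)


# Hypothetical sample data.
rows = [
    {"id": 1, "supplier": "Acme", "category": "Phone retailer", "purchasing_unit": "Division A"},
    {"id": 2, "supplier": "Acme", "category": "Phone retailer", "purchasing_unit": "Division B"},
    {"id": 3, "supplier": "Globex", "category": "Computer manufacturer", "purchasing_unit": "Division A"},
]

print(find_duplicates(rows))  # [1, 2]
```

Rows 1 and 2 share two of the three values, so both are flagged; row 3 shares only one value with row 1. Comparing every pair is fine for a sketch but grows quadratically with the number of rows.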
Hi @tonijj ,
Please have a try.
Create a measure.
Measure =
VAR _countsupplier =
    CALCULATE (
        COUNTROWS ( 'Table' ),
        FILTER (
            ALL ( 'Table' ),
            'Table'[supplier number] = SELECTEDVALUE ( 'Table'[supplier number] )
                && 'Table'[Supplier] = SELECTEDVALUE ( 'Table'[Supplier] )
        )
    )
VAR _countcategory =
    CALCULATE (
        COUNTROWS ( 'Table' ),
        FILTER (
            ALL ( 'Table' ),
            'Table'[supplier number] = SELECTEDVALUE ( 'Table'[supplier number] )
                && 'Table'[Category] = SELECTEDVALUE ( 'Table'[Category] )
        )
    )
VAR _purchasingubit =
    CALCULATE (
        COUNTROWS ( 'Table' ),
        FILTER (
            ALL ( 'Table' ),
            'Table'[supplier number] = SELECTEDVALUE ( 'Table'[supplier number] )
                && 'Table'[Purchasing Ubit] = SELECTEDVALUE ( 'Table'[Purchasing Ubit] )
        )
    )
RETURN
    IF (
        ( _countsupplier >= 2 && _countcategory >= 2 )
            || ( _countsupplier >= 2 && _purchasingubit >= 2 )
            || ( _countsupplier >= 2 && _purchasingubit >= 2 ),
        "Duplicate",
        "No"
    )
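For readers less familiar with the `CALCULATE`/`FILTER(ALL(...))` pattern: roughly, each `VAR` removes the filters on 'Table' and counts the rows whose `supplier number` and one other column equal the values of the row being evaluated. `SELECTEDVALUE` supplies those values when the visual shows one row per unique ID. The Python sketch below only mimics the three counts for illustration; the sample data is hypothetical.

```python
def duplicate_counts(rows, current):
    """For the row `current`, count the rows in the whole table that share its
    supplier number and, respectively, its Supplier, Category and
    Purchasing Ubit value, like the three VARs in the measure."""

    def count(column):
        return sum(
            1
            for row in rows
            if row["supplier number"] == current["supplier number"]
            and row[column] == current[column]
        )

    return {
        "_countsupplier": count("Supplier"),
        "_countcategory": count("Category"),
        "_purchasingubit": count("Purchasing Ubit"),
    }


# Hypothetical sample table.
rows = [
    {"id": 1, "supplier number": 100, "Supplier": "Acme", "Category": "Phone retailer", "Purchasing Ubit": "Division A"},
    {"id": 2, "supplier number": 100, "Supplier": "Acme", "Category": "Computer manufacturer", "Purchasing Ubit": "Division A"},
    {"id": 3, "supplier number": 200, "Supplier": "Globex", "Category": "Phone retailer", "Purchasing Ubit": "Division B"},
]

print(duplicate_counts(rows, rows[0]))
# {'_countsupplier': 2, '_countcategory': 1, '_purchasingubit': 2}
```

The current row always matches itself, so a count of 1 means "no other matching row". That is why the measure tests `>= 2` rather than `>= 1`.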
If I have misunderstood what you need, please describe your desired output in more detail and share a sample .pbix file with any private information removed.
Best Regards
Community Support Team _ Polly
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
Hi @Anonymous
First of all, a big thanks for this!
I have just a few quick follow-up questions:
1. Looking at the bottom part of the formula, isn't the repeated part below redundant?
( _countsupplier >= 2 && _purchasingubit >= 2 )
|| ( _countsupplier >= 2 && _purchasingubit >= 2 ),
2. Can I use more parameters to identify duplicates? In other words, can I include more columns simply by following the logic in the code you provided?
Sincerely
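Question 1 can be checked mechanically. The Python sketch below (not DAX) treats each `>= 2` test as a boolean. It compares the RETURN condition of the first measure, whose last pair repeats `_countsupplier`/`_purchasingubit`, with a version whose last pair is `_countcategory`/`_purchasingubit`. The function names are hypothetical.

```python
from itertools import product


def first_version(supplier, category, purchasing):
    # RETURN condition of the first measure: the last pair repeats
    # (supplier, purchasing), so it adds nothing.
    return (supplier and category) or (supplier and purchasing) or (supplier and purchasing)


def corrected_version(supplier, category, purchasing):
    # Each distinct pair of conditions appears exactly once.
    return (supplier and category) or (supplier and purchasing) or (category and purchasing)


def differing_cases():
    """Return the (supplier, category, purchasing) combinations where the two versions disagree."""
    return [
        combo
        for combo in product([False, True], repeat=3)
        if first_version(*combo) != corrected_version(*combo)
    ]


print(differing_cases())  # [(False, True, True)]
```

The repeated pair is redundant (`X or X` is just `X`). As a result, the first version misses the one case where category and purchasing unit both repeat but the supplier test does not.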
Hi @tonijj ,
Measure =
VAR _countsupplier =
    CALCULATE (
        COUNTROWS ( 'Table' ),
        FILTER (
            ALL ( 'Table' ),
            'Table'[supplier number] = SELECTEDVALUE ( 'Table'[supplier number] )
                && 'Table'[Supplier] = SELECTEDVALUE ( 'Table'[Supplier] )
        )
    )
VAR _countcategory =
    CALCULATE (
        COUNTROWS ( 'Table' ),
        FILTER (
            ALL ( 'Table' ),
            'Table'[supplier number] = SELECTEDVALUE ( 'Table'[supplier number] )
                && 'Table'[Category] = SELECTEDVALUE ( 'Table'[Category] )
        )
    )
VAR _purchasingubit =
    CALCULATE (
        COUNTROWS ( 'Table' ),
        FILTER (
            ALL ( 'Table' ),
            'Table'[supplier number] = SELECTEDVALUE ( 'Table'[supplier number] )
                && 'Table'[Purchasing Ubit] = SELECTEDVALUE ( 'Table'[Purchasing Ubit] )
        )
    )
RETURN
    IF (
        ( _countsupplier >= 2 && _countcategory >= 2 )
            || ( _countsupplier >= 2 && _purchasingubit >= 2 )
            || ( _countcategory >= 2 && _purchasingubit >= 2 ),
        "Duplicate",
        "No"
    )
Because you are looking for rows where two or three parameters are the same, it is enough to check each pair of conditions: if all three match, then every pair matches as well.
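Checking every pair of conditions amounts to "at least two conditions are true". The Python sketch below (not DAX) confirms this equivalence exhaustively. It also shows how quickly the list of explicit pairs grows as columns are added, which bears on question 2 above. The helper names are hypothetical.

```python
from itertools import combinations, product


def any_pair_true(flags):
    """OR over every pair of conditions, the pattern used in the measure's RETURN."""
    return any(a and b for a, b in combinations(flags, 2))


def at_least_two_true(flags):
    """Count-based form: true when two or more conditions hold."""
    return sum(flags) >= 2


def forms_agree(n):
    """Exhaustively compare both forms for n conditions."""
    return all(
        any_pair_true(flags) == at_least_two_true(flags)
        for flags in product([False, True], repeat=n)
    )


def pair_count(n):
    """Number of explicit pair conditions needed for n columns."""
    return len(list(combinations(range(n), 2)))


print([forms_agree(n) for n in (3, 4, 5)])  # [True, True, True]
print([pair_count(n) for n in (3, 4, 5)])   # [3, 6, 10]
```

With more columns, a count-based test avoids listing every pair. An untested DAX analogue of that idea would be a condition such as `IF ( _countsupplier >= 2, 1, 0 ) + IF ( _countcategory >= 2, 1, 0 ) + IF ( _purchasingubit >= 2, 1, 0 ) >= 2` inside the final `IF`. Each extra column would get its own `VAR` built like the existing ones.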
Best Regards
Community Support Team _ Polly