Microsoft is giving away 50,000 FREE Microsoft Certification exam vouchers. Get Fabric certified for FREE! Learn more
Hi,
I am trying to identify close duplicates in a list of payments: say 2 payments were made to the same vendor, for the same amount and at a similar date (within a 5 days range for example). The table looks like below, and I would like to be able to extract the first 2 rows only in this case.
Vendor A $1000 01/01/2021
Vendor A $1000 05/01/2021
Vendor A $1000 20/01/2021
Vendor A $500 01/01/2021
Vendor B $1000 01/01/2021
I was able to do this in M with PowerQuery with nested table, but it creates performance problem once you add many such queries (as it reload several times the original very large payment csv file). Would there be a way to implement such functionalities easily in DAX?
Thanks for your help!
Solved! Go to Solution.
Hi @MarcBlanc
Through your description , if the vendors are same and amounts are same, the date diff is less than or equal to 5 ,then display the first 2 rows only in this case . Is the screenshot below the result you want ?
I create a sample , maybe you can refer to it .
(1)Create a column to return the rank number .
Rank =RANKX(FILTER('Table','Table'[Vendor]=EARLIER('Table'[Vendor])&&'Table'[Amount]=EARLIER('Table'[Amount])),'Table'[Date],,ASC)
(2)Create a column to judge whether the diff is in a 5 days range ,if yes ,return 1.
judge =
var _date=MAXX(FILTER('Table','Table'[Vendor]=EARLIER('Table'[Vendor]) && 'Table'[Date]<EARLIER('Table'[Date]) ),'Table'[Date])
var _diff=DATEDIFF(_date,'Table'[Date],DAY)
return IF(_diff<=5 && _diff<>BLANK(),1,0)
(3)Put the column [judge] in card chart ,if it is greater than 0 , set [Rank] is less than or equal to 2 to return the first 2 rows . If the value for [judge] is 0, then there is no time interval within 5 days, and there is no need to filter .
I have attached my pbix file ,you can refer to it .
Best Regards
Community Support Team _ Ailsa Tao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @MarcBlanc
I am so glad to hear that the problem has been solved . Then please consider Accept it as the solution to help the other members find it more quickly.
Best Regards
Community Support Team _ Ailsa Tao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @MarcBlanc
Through your description , if the vendors are same and amounts are same, the date diff is less than or equal to 5 ,then display the first 2 rows only in this case . Is the screenshot below the result you want ?
I create a sample , maybe you can refer to it .
(1)Create a column to return the rank number .
Rank =RANKX(FILTER('Table','Table'[Vendor]=EARLIER('Table'[Vendor])&&'Table'[Amount]=EARLIER('Table'[Amount])),'Table'[Date],,ASC)
(2)Create a column to judge whether the diff is in a 5 days range ,if yes ,return 1.
judge =
var _date=MAXX(FILTER('Table','Table'[Vendor]=EARLIER('Table'[Vendor]) && 'Table'[Date]<EARLIER('Table'[Date]) ),'Table'[Date])
var _diff=DATEDIFF(_date,'Table'[Date],DAY)
return IF(_diff<=5 && _diff<>BLANK(),1,0)
(3)Put the column [judge] in card chart ,if it is greater than 0 , set [Rank] is less than or equal to 2 to return the first 2 rows . If the value for [judge] is 0, then there is no time interval within 5 days, and there is no need to filter .
I have attached my pbix file ,you can refer to it .
Best Regards
Community Support Team _ Ailsa Tao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @Anonymous ,
Thanks for the solution. It works well. I only had to add the following condition in the judge so that it brings the correct result.
&& 'Table'[Amount]=EARLIER('Table'[Amount]) &&
Have a good day
@MarcBlanc , You can have column like this in DAX
New column =
var _max - maxx(filter(Table, [vendor] = earlier([vendor]) && [Date] <earlier([Date])),[Date])
return
if([Amount] =maxx(filter(Table, [vendor] = earlier([vendor]) && [Date] <_date),[Amount]) , 1, 0)
Thanks @amitchandak for the very quick answer.
I see this is the way to go, although it flags the 3rd line instead of the 1st/2nd ones (which are the close duplicate within 5 days range) as it is based on maax function and no thresholds. But that s a good start and I will work around with it.
Note: I add to slightly modify the code for the var (_max instead of _date) for it to work. For reference:
Check out the April 2025 Power BI update to learn about new features.
Explore and share Fabric Notebooks to boost Power BI insights in the new community notebooks gallery.
User | Count |
---|---|
103 | |
68 | |
47 | |
39 | |
37 |