MarcBlanc
Frequent Visitor

Find close duplicates (same vendor, same amount, different dates within a range)

Hi,

I am trying to identify close duplicates in a list of payments: for example, 2 payments made to the same vendor, for the same amount, on similar dates (within a 5-day range, say). The table looks like the one below, and in this case I would like to extract the first 2 rows only.

 

Vendor       Amount   Date
Vendor A     $1000    01/01/2021
Vendor A     $1000    05/01/2021
Vendor A     $1000    20/01/2021
Vendor A     $500     01/01/2021
Vendor B     $1000    01/01/2021

 

I was able to do this in M (Power Query) with a nested table, but it creates performance problems once you add many such queries, because the original, very large payments CSV file is reloaded several times. Is there a way to implement this easily in DAX?


Thanks for your help!

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @MarcBlanc 

From your description: if the vendors are the same, the amounts are the same, and the date difference is less than or equal to 5 days, then only the first 2 rows should be displayed in this case. Is the screenshot below the result you want?

Ailsamsft_0-1630308023570.png

I created a sample that you can refer to.

(1) Create a column that returns the rank number.

Rank =
RANKX (
    FILTER ( 'Table', 'Table'[Vendor] = EARLIER ( 'Table'[Vendor] ) && 'Table'[Amount] = EARLIER ( 'Table'[Amount] ) ),
    'Table'[Date], , ASC
)

(2) Create a column that checks whether the difference from the previous payment date is within 5 days. If it is, return 1.

judge =
VAR _date = MAXX ( FILTER ( 'Table', 'Table'[Vendor] = EARLIER ( 'Table'[Vendor] ) && 'Table'[Date] < EARLIER ( 'Table'[Date] ) ), 'Table'[Date] )
VAR _diff = DATEDIFF ( _date, 'Table'[Date], DAY )
RETURN IF ( _diff <= 5 && _diff <> BLANK (), 1, 0 )

(3) Put the [judge] column in a card visual. If its value is greater than 0, filter [Rank] to less than or equal to 2 to return the first 2 rows. If the value of [judge] is 0, no payments fall within 5 days of each other, and there is no need to filter.

Ailsamsft_1-1630308023572.png

Ailsamsft_2-1630308023574.png

I have attached my pbix file for reference.

 

Best Regards

Community Support Team _ Ailsa Tao

If this post helps, please consider accepting it as the solution to help other members find it more quickly.


5 REPLIES
Anonymous
Not applicable

Hi @MarcBlanc 

I am glad to hear that the problem has been solved. Please consider accepting the reply as the solution to help other members find it more quickly.

 

Best Regards

Community Support Team _ Ailsa Tao


Hi @Anonymous ,

 

Thanks for the solution. It works well. I only had to add the following condition to the FILTER conditions in the judge column so that it returns the correct result:

&& 'Table'[Amount] = EARLIER ( 'Table'[Amount] )

 

Have a good day
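
For readers who want a single flag that marks both payments of a close pair, the idea discussed above can be sketched as one calculated column. This is only a sketch under stated assumptions: the column name IsCloseDuplicate and the variable _windowDays are hypothetical, 'Table' with [Vendor], [Amount] and [Date] follows the sample in this thread, and [Date] is assumed to be a Date column. Instead of looking only at the previous payment, it counts every row for the same vendor and amount whose date lies within the window on either side of the current row.

```dax
IsCloseDuplicate =
-- Capture the current row's values (row context of the calculated column)
VAR _vendor = 'Table'[Vendor]
VAR _amount = 'Table'[Amount]
VAR _date = 'Table'[Date]
-- Assumed threshold, taken from the "5 days" mentioned in the question
VAR _windowDays = 5
-- Count rows with the same vendor and amount dated within the window,
-- before or after the current row; the current row itself is included
VAR _rowsInWindow =
    COUNTROWS (
        FILTER (
            'Table',
            'Table'[Vendor] = _vendor
                && 'Table'[Amount] = _amount
                && 'Table'[Date] >= _date - _windowDays
                && 'Table'[Date] <= _date + _windowDays
        )
    )
RETURN
    -- More than one row in the window means at least one other matching payment
    IF ( _rowsInWindow > 1, 1, 0 )
```

Because the count includes the current row, exact same-day duplicates are flagged as well. On a very large table this scans the table once per row, so it may be worth testing the refresh time before relying on it.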

amitchandak
Super User

@MarcBlanc, you can create a column like this in DAX:

 


New column =
var _max = maxx(filter(Table, [vendor] = earlier([vendor]) && [Date] < earlier([Date])), [Date])
return
if([Amount] = maxx(filter(Table, [vendor] = earlier([vendor]) && [Date] < _date), [Amount]), 1, 0)


Thanks @amitchandak for the very quick answer.

 

I see this is the way to go, although it flags the 3rd line instead of the 1st and 2nd ones (which are the close duplicates within the 5-day range), because it is based on the MAXX function and has no threshold. But that's a good start and I will work from it.

 

Note: I had to slightly modify the code for the variable (_max instead of _date) for it to work. For reference:

 

New column =
var _max = maxx(filter(Payments, [Vendor] = earlier([Vendor]) && [Date] < earlier([Date])), [Date])
return
if([Amount] = maxx(filter(Payments, [Vendor] = earlier([Vendor]) && [Date] < _max), [Amount]), 1, 0)
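
Since the original goal was to extract only the close-duplicate rows, one possible variation is a calculated table that adds the missing threshold and keeps just those rows. This is a hedged sketch, not the method from either reply: the table name CloseDuplicates is hypothetical, Payments with [Vendor], [Amount] and [Date] follows the code above, [Date] is assumed to be a Date column, and the 5-day window comes from the question.

```dax
CloseDuplicates =
FILTER (
    Payments,
    -- Values of the row currently being tested by the outer FILTER
    VAR _vendor = Payments[Vendor]
    VAR _amount = Payments[Amount]
    VAR _date = Payments[Date]
    RETURN
        -- Keep the row when at least one other payment shares its vendor
        -- and amount and is dated within 5 days before or after it
        COUNTROWS (
            FILTER (
                Payments,
                Payments[Vendor] = _vendor
                    && Payments[Amount] = _amount
                    && Payments[Date] >= _date - 5
                    && Payments[Date] <= _date + 5
            )
        ) > 1
)
```

With the sample data from the question, and the dates read as day/month/year, this should keep the 01/01/2021 and 05/01/2021 payments of $1000 to Vendor A and drop the 20/01/2021 one, which is 15 days after its nearest match.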
