Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Calling all Data Engineers! Fabric Data Engineer (Exam DP-700) live sessions are back! Starting October 16th. Sign up.

Reply
MarcBlanc
Frequent Visitor

Find Close duplicate (same vendor, same amount, different date within a range)

Hi,

I am trying to identify close duplicates in a list of payments: say 2 payments were made to the same vendor, for the same amount and at a similar date (within a 5 days range for example). The table looks like below, and I would like to be able to extract the first 2 rows only in this case.

 

Vendor A     $1000    01/01/2021

Vendor A     $1000    05/01/2021

Vendor A     $1000    20/01/2021

Vendor A     $500      01/01/2021

Vendor B     $1000    01/01/2021

 

I was able to do this in M with PowerQuery with nested table, but it creates performance problem once you add many such queries (as it reload several times the original very large payment csv file). Would there be a way to implement such functionalities easily in DAX?


Thanks for your help!

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @MarcBlanc 

Through your description , if the vendors are same and amounts are same, the date diff is less than or equal to 5 ,then display the first 2 rows only in this case . Is the screenshot below the result you want ?

Ailsamsft_0-1630308023570.png

I create a sample , maybe you can refer to it .

(1)Create a column to return the rank number .

Rank =RANKX(FILTER('Table','Table'[Vendor]=EARLIER('Table'[Vendor])&&'Table'[Amount]=EARLIER('Table'[Amount])),'Table'[Date],,ASC)

(2)Create a column to judge whether the diff is in a 5 days range ,if yes ,return 1.

judge =
var _date=MAXX(FILTER('Table','Table'[Vendor]=EARLIER('Table'[Vendor]) && 'Table'[Date]<EARLIER('Table'[Date]) ),'Table'[Date])
var _diff=DATEDIFF(_date,'Table'[Date],DAY)
return IF(_diff<=5 && _diff<>BLANK(),1,0)

(3)Put the column [judge] in card chart ,if it is greater than 0 , set [Rank] is less than or equal to 2 to return the first 2 rows . If the value for [judge] is 0, then there is no time interval within 5 days, and there is no need to filter .

Ailsamsft_1-1630308023572.png

Ailsamsft_2-1630308023574.png

I have attached my pbix file ,you can refer to it .

 

Best Regards

Community Support Team _ Ailsa Tao

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

5 REPLIES 5
Anonymous
Not applicable

Hi @MarcBlanc 

I am so glad to hear that the problem has been solved . Then please consider Accept it as the solution to help the other members find it more quickly.

 

Best Regards

Community Support Team _ Ailsa Tao

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous
Not applicable

Hi @MarcBlanc 

Through your description , if the vendors are same and amounts are same, the date diff is less than or equal to 5 ,then display the first 2 rows only in this case . Is the screenshot below the result you want ?

Ailsamsft_0-1630308023570.png

I create a sample , maybe you can refer to it .

(1)Create a column to return the rank number .

Rank =RANKX(FILTER('Table','Table'[Vendor]=EARLIER('Table'[Vendor])&&'Table'[Amount]=EARLIER('Table'[Amount])),'Table'[Date],,ASC)

(2)Create a column to judge whether the diff is in a 5 days range ,if yes ,return 1.

judge =
var _date=MAXX(FILTER('Table','Table'[Vendor]=EARLIER('Table'[Vendor]) && 'Table'[Date]<EARLIER('Table'[Date]) ),'Table'[Date])
var _diff=DATEDIFF(_date,'Table'[Date],DAY)
return IF(_diff<=5 && _diff<>BLANK(),1,0)

(3)Put the column [judge] in card chart ,if it is greater than 0 , set [Rank] is less than or equal to 2 to return the first 2 rows . If the value for [judge] is 0, then there is no time interval within 5 days, and there is no need to filter .

Ailsamsft_1-1630308023572.png

Ailsamsft_2-1630308023574.png

I have attached my pbix file ,you can refer to it .

 

Best Regards

Community Support Team _ Ailsa Tao

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi @Anonymous ,

 

Thanks for the solution. It works well. I only had to add the following condition in the judge so that it brings the correct result. 

&& 'Table'[Amount]=EARLIER('Table'[Amount]) && 

 

Have a good day

amitchandak
Super User
Super User

@MarcBlanc , You can have column like this in DAX

 


New column =
var _max - maxx(filter(Table, [vendor] = earlier([vendor]) && [Date] <earlier([Date])),[Date])
return
if([Amount] =maxx(filter(Table, [vendor] = earlier([vendor]) && [Date] <_date),[Amount]) , 1, 0)

Share with Power BI Enthusiasts: Full Power BI Video (20 Hours) YouTube
Microsoft Fabric Series 60+ Videos YouTube
Microsoft Fabric Hindi End to End YouTube

Thanks @amitchandak for the very quick answer.

 

I see this is the way to go, although it flags the 3rd line instead of the 1st/2nd ones (which are the close duplicate within 5 days range) as it is based on maax function and no thresholds. But that s a good start and I will work around with it.

 

Note: I add to slightly modify the code for the var (_max instead of _date) for it to work. For reference:

 

New column =
var _max = maxx(filter(Payments, [Vendor] = earlier([Vendor]) && [Date] <earlier([Date])),[Date])
return
if([Amount] = maxx(filter(Payments, [Vendor] = earlier([Vendor]) && [Date] <_max),[Amount]) , 1, 0)

Helpful resources

Announcements
FabCon Global Hackathon Carousel

FabCon Global Hackathon

Join the Fabric FabCon Global Hackathon—running virtually through Nov 3. Open to all skill levels. $10,000 in prizes!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.