jsondervorst1
New Member

Incorrect similarity score calculation for text string comparison

I'm comparing two text strings via a fuzzy join function to get the similarity score.

However, I'm getting incorrect results for the similarity score.

If I understood the solution I found ("Fuzzy Match calculation explained") correctly, the score should be based on Jaccard similarity. However, the results don't match:

Example: 'INDEMANS CHR SRL' vs 'INDEMANS CHRISTIAN'

Power Query: result 0,37


Online tool: result 0,55



How is this possible?

 

Thank you in advance.

 

Update: I've added a step that removes the spaces in both columns, which gives a better result for these specific strings (0,89). I then return the maximum score of the two comparisons. However, I would have expected the 'IgnoreSpace' option to cover this, which is clearly not the case.
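
For reference, here is a minimal Python sketch of Jaccard similarity applied to the two example strings. The tokenization is an assumption on my part: I tried character bigrams (spaces kept) and whole words, and I have not confirmed that either Power Query or the online tool works this way. The helper names `char_bigrams`, `words` and `jaccard` are made up for this illustration.

```python
from __future__ import annotations


def char_bigrams(text: str) -> set[str]:
    """Return the set of overlapping two-character substrings (spaces kept)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def words(text: str) -> set[str]:
    """Return the set of whitespace-separated words."""
    return set(text.split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


left = "INDEMANS CHR SRL"
right = "INDEMANS CHRISTIAN"

# Assumption: tokens are character bigrams. 11 shared bigrams out of 20 distinct ones.
print(jaccard(char_bigrams(left), char_bigrams(right)))  # 0.55

# Assumption: tokens are whole words. 1 shared word out of 4 distinct ones.
print(jaccard(words(left), words(right)))  # 0.25
```

With character bigrams the pair scores 11/20 = 0.55, the same figure the online tool reports. With whole words it scores 1/4 = 0.25. Neither variant gives 0,37, so Power Query presumably tokenizes or weights the strings differently from a plain Jaccard score.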
