Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
Matthieu_R
Helper I
Helper I

Tips for a very inefficient fuzzy join

Hello guys,

 

I need to do a fuzzy join between two tables and it results in very bad performance when refreshing my dataset. Let me explain the context.

 

I have my left table that contains today around 1.5 million of lines and it will keep growing. In this table there is around 35 columns.

And there is one column which contain a city name (text type). The content of this column is provided by the user of the app so for the same city, i can have multiple variations with uppercase / lowercase / mistakes and so on.

 

On the right side, i have a table (called cities) that contains around 40K lines. There is 7 column in this table. And there is one column that contains all unique city names available in my country. Column is in capital, no mistake, clean etc...

 

So i started to do a fuzzy join between both table to try to match the city name from left table with the clean city name of the right table. Settings of the fuzzy join are :

 

- Similarity level : 0.5

- Ignore Case sensitivity

- Only 1 match

 

When the match is done, in the left table, i only keep the city name column from the right table, not the entire table.

 

This result in around 99.8% of match, which is ok for me.

 

But then, when i refresh my dataset, all tables refresh pretty quickly, around 1 to 2 minutes (i have around 20 tables), except the left table that contains the new column from the fuzzy join (the one with 1.5 millions of lines). This table takes so long to refresh (around 30 to 45 min). Which is really not acceptable in everyday work.

 

I try to removed that fuzzy join to make sure that it was the cause of the problem. And then the refresh time went back to normal so i'm pretty sure that very long refresh time is in relation with the fuzzy join.

 

So dou you have any tips to make this operation faster ?

 

Thanks in advance for your help,

 

Regards

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi, @Matthieu_R 

Before performing a fuzzy join, consider preprocessing the data to reduce the complexity of the matching process. For city names in the left table, you can apply transformations to standardize the text formatting. This helps reduce data variability and may improve the efficiency of fuzzy joins.

If your dataset in Power BI allows incremental refresh, consider implementing an incremental refresh strategy. This allows you to limit the amount of data processed during each refresh to only the amount of data that is added or changed, rather than reprocessing the entire dataset. This can significantly reduce refresh times. You can see more information about incremental refresh:

Incremental refresh for semantic models and real-time data in Power BI - Power BI | Microsoft Learn

You can also take a look at the documentation on performance optimization:

Optimization guide for Power BI - Power BI | Microsoft Learn

 

How to Get Your Question Answered Quickly 

Best Regards

Yongkang Hua

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

2 REPLIES 2
Anonymous
Not applicable

Hi, @Matthieu_R 

Before performing a fuzzy join, consider preprocessing the data to reduce the complexity of the matching process. For city names in the left table, you can apply transformations to standardize the text formatting. This helps reduce data variability and may improve the efficiency of fuzzy joins.

If your dataset in Power BI allows incremental refresh, consider implementing an incremental refresh strategy. This allows you to limit the amount of data processed during each refresh to only the amount of data that is added or changed, rather than reprocessing the entire dataset. This can significantly reduce refresh times. You can see more information about incremental refresh:

Incremental refresh for semantic models and real-time data in Power BI - Power BI | Microsoft Learn

You can also take a look at the documentation on performance optimization:

Optimization guide for Power BI - Power BI | Microsoft Learn

 

How to Get Your Question Answered Quickly 

Best Regards

Yongkang Hua

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thanks i will apply your advices.

 

I didn't know about incremental refresh in Power Bi.

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.