Compare and Bulk Replace values based on two diffe...

Anonymous · ‎09-09-2021

Hi,

I have a table with multiple columns and in one column (it is not formatted as table) I have more than 62000 rows. this column called Resource_column is full of errors, typos, nomenclatures, etc. I also another column called Reference_Column which is formated as table. I have to compare my Resource_column with Reference_column and replace all the errors with good ones based on the Resource_column. I have put the link to the file here:

https://docs.google.com/spreadsheets/d/1RT9d8tJjvM_c23y54Yg3ASj9svxOut5Z/edit?usp=sharing&ouid=10542...

and added an screenshots to show the difference:

Notice: the blank rows in the Resource_Column should not be filled at the end of replacing.

Thanks in advance!

Edit: I really appreciate your support as it looks pretty complicated issue. The main problem is that I am looking for a way to not go into Bulk Find and Replace alley as I have a lot of values to be fixed (it is not possible to make a complete find column and use it with replace)

@Greg_Deckler , @amitchandak , @parry2k , @jdbuchanan71 , @ryan_mayu , @MFelix , @selimovd

v-shex-msft · ‎09-14-2021

HI @Anonymous,

According to your description, I think your requirement should more suitable to use AI training of text analysis instead of power query/Dax.

For this scenario, I'd like to suggest your or do data processing on the source data out of power bi side or try to use a web connector to invoke the external API of AI analysis.

Regards,
Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

Anonymous · ‎09-15-2021

Thank you for the reply!

Would you suggest the best API or AI app/software to take advantage as this is not a recurring job but an ad hoc report!

bestmbaman

v-shex-msft · ‎09-17-2021

Hi @Anonymous,

If you are not familiar with these parts, I'd like to suggest you use the third-party API(e.g MS partner that provides the text analysis service) for these types of AI text analysis services.
Then you only need to use raw records to invoke the API to get the fixed results without complex training and learning steps.

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

Greg_Deckler · ‎09-10-2021

@Anonymous I would try the fuzzy matching in Merge queries first. If that fails then you can look at my very flexible and tunable DAX Fuzzy matching algorithm here: Fuzzy - Microsoft Power BI Community.

Either way though, I agree with @lbendlin that you won't be able to identify true errors unless there is some other data you have that can identify that.

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

Anonymous · ‎09-11-2021

@Greg_Deckler Thank you Greg!

by other Data, what exactly you mean?! I already shared a reference table (with only one column) that contains all the correct names. The challange is how I can figure multiple type of errors and at the same time how to match two different columns with different number of rows?

Assuming that I figured the errors and fixed them , how can I replace the good values in a column with 406 rows to a column with +62000 rows?!

After all, what programming language other than DAX can be used here if DAX falls short here?

Please don't Archive this question and let other ones to try their opinion, I believe this challange comming from the real world business is not just mine.

Dear:

@amitchandak , @parry2k , @jdbuchanan71 , @ryan_mayu , @MFelix , @selimovd

The real challange is here, I would kindly appreciate your opinion

Thank you @Greg_Deckler , @lbendlin very much

Greg_Deckler · ‎09-12-2021

@Anonymous By the data I mean I don't want to have to type all of your data from a screen shot but rather just copy and paste it. In theory you have access to Power Query and could use that to solve this problem, particularly the fuzzy matching options when performing a Merge query. Outside of that you could also possibly do this in R or Python within Power Query as well. But, once you get to the Desktop itselft then yes it is DAX for the most part. And thus you could potentially use my very flexible and tunable DAX Fuzzy matching algorithm here: Fuzzy - Microsoft Power BI Community.

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

lbendlin · ‎09-10-2021

I think you have bigger problems than that. How is the script supposed to be able to tell if it is a typo or an error? Which similarity algorithm are you planning to employ? Soundex? Levenshtein ? some other one?

String similarity — the basic know your algorithms guide! | by Mohit Mayank | ITNEXT

Anonymous · ‎09-10-2021

Thanks you @lbendlin ,

I am totally confused with this challange!

In this case what is the best solution you can think of?

Also please note that this is just a sample of the real data set manipulated and transformed only for this community to show the real problem and seek solution!

Looking forward to hearing from you

Thanks,

Also calling on all great heros:

@Greg_Deckler , @amitchandak , @parry2k , @jdbuchanan71 , @ryan_mayu , @MFelix , @selimovd

lbendlin · ‎09-10-2021

There is no magic to this - you and your requestor need to agree on which algorithm to use, and what the cutoffs should be. The shorter the words the more risky/useless the algorithms will become.

Compare and Bulk Replace values based on two different tables (text errors, typos, nomenclatures)

Helpful resources

Join us at the Microsoft Fabric Community Conference

Power BI Monthly Update - January 2025

Fabric Community Update - January 2025

How to Get Your Question Answered Quickly

New Offer! Become a Certified Fabric Data Engineer