MihirK
New Member

Power BI - Clean up misspelled city names in dataset

Hi all. I have a dataset in which the names of cities are misspelled. There are 1 lakh (100,000) rows in the dataset. How can I clean the data efficiently?


1 ACCEPTED SOLUTION
v-heq-msft
Community Support

Hi @MihirK,
Thanks for @lbendlin's reply. Power BI doesn't have a built-in option for detecting and correcting spelling errors, but you can try the following.
Here are some steps I'd like to share; check whether they suit your requirements.
Here is my test data (it is embedded in the Source step of the query below, as a single Name column).

In Power Query, use the Table.AddFuzzyClusterColumn function:

let
    // Test data created with "Enter Data": a single text column named Name
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("VY/BCsJADER/pfTsT4hFtIgIHqSUHmIbuqExKbtb9fNddrcFL2EmGYaXti2v+CkatVPZ7dryoq7Yy4iMLvqDoR5Gjfqki/MqUd+MotA3a2IYkGdDEBd3kFASoqSbrwhzTQXMkNpr6Cen8iZm3JK1umSOan3xCMMkFOXl9VxWLrCs3qfkWQYCgVmZ0jn/FPUfdSBdQbsf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t]),
    // Make sure Name is typed as text
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}}),
    // Group similar spellings and write each group's representative value to a new City column
    #"City" = Table.AddFuzzyClusterColumn(#"Changed Type","Name","City")
in
    #"City"

This fuzzy-matching approach assumes that the correct name comes before the misspelled ones, i.e. the correct spelling is the first value encountered in the column.
Final output: [screenshot of the result, showing the added City column]
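If the clusters come out too loose or too strict, Table.AddFuzzyClusterColumn also accepts an optional options record. The sketch below is illustrative only: it assumes a hypothetical query named Cities with a text column Name, the Threshold of 0.85 is a guess you would tune against your own data, and the Bombay/Mumbai row is a made-up example of two names that are not similar as text but should still be grouped together via a TransformationTable (which must have "From" and "To" columns).

```powerquery
let
    // Hypothetical input query: Cities, with a text column Name
    Source = Cities,
    // Hypothetical custom mapping for names that fuzzy matching alone would not group
    Overrides = #table(
        type table [From = text, To = text],
        {{"Bombay", "Mumbai"}}
    ),
    Clustered = Table.AddFuzzyClusterColumn(
        Source,
        "Name",
        "City",
        [
            IgnoreCase = true,              // compare case-insensitively (the default)
            IgnoreSpace = true,             // ignore whitespace differences (the default)
            Threshold = 0.85,               // similarity score from 0.00 to 1.00; the default is 0.80
            TransformationTable = Overrides // group "From" values with their "To" values
        ]
    )
in
    Clustered
```

Raising Threshold makes the grouping stricter (fewer false merges of genuinely different cities); lowering it catches more distant misspellings at the cost of more false merges.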

Alternatively, you can create a separate table that pairs each incorrect name with its correct name, and then use Merge Queries in Power Query to replace the incorrect values.
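A minimal sketch of that merge approach, assuming two hypothetical queries: Cities (the data, with a text column Name) and CityMap, a hand-maintained correction table with columns Misspelled and Correct and exactly one row per misspelling (duplicate keys would duplicate rows in the merge). This is an exact, case-sensitive match; the Merge dialog's fuzzy-matching option (which generates Table.FuzzyNestedJoin) is an alternative when the list of misspellings is incomplete.

```powerquery
let
    // Left outer join keeps every data row, matched or not
    Merged = Table.NestedJoin(Cities, {"Name"}, CityMap, {"Misspelled"}, "Map", JoinKind.LeftOuter),
    // Pull the corrected spelling out of the nested table column
    Expanded = Table.ExpandTableColumn(Merged, "Map", {"Correct"}, {"Correct"}),
    // Use the correction where one exists, otherwise keep the original name
    Fixed = Table.AddColumn(Expanded, "City", each if [Correct] = null then [Name] else [Correct], type text),
    Result = Table.RemoveColumns(Fixed, {"Correct"})
in
    Result
```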

Best regards,

Albert He

lbendlin
Super User

You must do that further upstream (for example, by maintaining a manual reference table of misspellings).

 

You can use Power BI to report on the gaps (likely misspellings), but you need to maintain the reference table outside of Power BI. Power BI's data write-back features are still pretty much non-existent.
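A minimal sketch of such a gap report in Power Query, assuming a hypothetical reference table maintained outside Power BI and loaded as a query named CityMap (columns Misspelled and Correct), plus the data query Cities with a text column Name. It lists the distinct names that the reference table does not cover yet, so they can be reviewed and added upstream.

```powerquery
let
    // Every spelling the reference table already knows about, valid or misspelled.
    // List.Buffer keeps the list in memory so it is not re-evaluated for each row.
    Known = List.Buffer(List.Distinct(CityMap[Misspelled] & CityMap[Correct])),
    // Check each distinct name once instead of every one of the 100,000 rows
    Names = Table.Distinct(Table.SelectColumns(Cities, {"Name"})),
    // Names missing from the reference table are likely new misspellings
    Unmapped = Table.SelectRows(Names, each not List.Contains(Known, [Name]))
in
    Unmapped
```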

 

If this is important to you, please consider voting for an existing idea or raising a new one at https://ideas.fabric.microsoft.com/?forum=2d80fd4a-16cb-4189-896b-e0dac5e08b41
