Hi all. I have a dataset in which the names of cities are misspelled. There are 1 lakh (100,000) rows in the dataset. How can I clean the data efficiently?
Hi @MihirK,
Thanks to @lbendlin for the reply. Power BI doesn't have a built-in option for automatically correcting spelling errors, but you can try the following.
Here are some steps I want to share; please check whether they suit your requirement.
Here is my test data:
In Power Query, we need to use the Table.AddFuzzyClusterColumn function:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("VY/BCsJADER/pfTsT4hFtIgIHqSUHmIbuqExKbtb9fNddrcFL2EmGYaXti2v+CkatVPZ7dryoq7Yy4iMLvqDoR5Gjfqki/MqUd+MotA3a2IYkGdDEBd3kFASoqSbrwhzTQXMkNpr6Cen8iZm3JK1umSOan3xCMMkFOXl9VxWLrCs3qfkWQYCgVmZ0jn/FPUfdSBdQbsf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}}),
#"City" = Table.AddFuzzyClusterColumn(#"Changed Type","Name","City")
in
#"City"
The premise of using this function for fuzzy matching is that the correct name comes before the incorrect ones, i.e. the correct spelling is the first one encountered in the column.
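If the default clustering is too strict or too loose for your city names, the function's optional fourth argument can be used to tune it. The following is only a sketch: the sample rows and the threshold of 0.7 are assumptions for illustration, not values taken from the original data.

```powerquery
let
    // Hypothetical sample data; the correct spelling is listed first in each group
    Source = Table.FromRows(
        {{"Mumbai"}, {"Mumbay"}, {"Mumbia"}, {"Delhi"}, {"Dehli"}, {"Pune"}},
        type table [Name = text]
    ),
    // Threshold (0 to 1) sets how similar two values must be to share a cluster.
    // 0.7 is an illustrative value; tune it against your own data.
    // SimilarityColumnName adds a column with the similarity score for review.
    Clustered = Table.AddFuzzyClusterColumn(
        Source,
        "Name",
        "City",
        [IgnoreCase = true, IgnoreSpace = true, Threshold = 0.7, SimilarityColumnName = "Similarity"]
    )
in
    Clustered
```

Checking the Similarity column for low scores is a quick way to spot clusters that may have been merged incorrectly before you trust the result on all 100,000 rows.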
Final output
Of course, you can also create a new table with the correct names and then use the Merge function in Power Query to replace the incorrect data.
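A minimal sketch of that merge approach, assuming a fuzzy merge is acceptable. The table contents, the column names (Name, City, CleanName), and the threshold are all hypothetical and only there to make the example self-contained.

```powerquery
let
    // Hypothetical input with misspelled city names
    Cities = Table.FromRows({{"Mumbay"}, {"Dehli"}, {"Pune"}}, type table [Name = text]),
    // Hypothetical reference table holding the correct names
    Reference = Table.FromRows({{"Mumbai"}, {"Delhi"}, {"Pune"}}, type table [City = text]),
    // Fuzzy left outer join: keep every input row and attach the best match, if any
    Joined = Table.FuzzyNestedJoin(
        Cities, {"Name"},
        Reference, {"City"},
        "Match",
        JoinKind.LeftOuter,
        [IgnoreCase = true, NumberOfMatches = 1, Threshold = 0.7]
    ),
    Expanded = Table.ExpandTableColumn(Joined, "Match", {"City"}, {"City"}),
    // Fall back to the original value when no reference name was close enough
    Result = Table.AddColumn(
        Expanded,
        "CleanName",
        each if [City] = null then [Name] else [City],
        type text
    )
in
    Result
```

Unlike clustering, this does not depend on the order of the rows, because the correct spellings come from the reference table rather than from the data itself.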
Best regards,
Albert He
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
You must do that further upstream (for example, by maintaining a manual reference table of misspellings).
You can use Power BI to report on the gaps (likely misspellings) but you need to do the reference table maintenance outside of Power BI. The Data Write Back features of Power BI are still pretty much non-existent.
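As a rough sketch of the gap report, a left anti join lists the names that are not yet covered by the reference table. The Mapping table, its column names (Misspelling, Correct), and the sample rows are hypothetical; in practice the mapping would be maintained outside Power BI and loaded as a normal source.

```powerquery
let
    // Hypothetical input data
    Cities = Table.FromRows({{"Mumbai"}, {"Mumbay"}, {"Dehli"}}, type table [Name = text]),
    // Hypothetical manually maintained mapping of known spellings to correct names
    Mapping = Table.FromRows(
        {{"Mumbay", "Mumbai"}, {"Mumbai", "Mumbai"}},
        type table [Misspelling = text, Correct = text]
    ),
    // Left anti join: keep only the input rows with no entry in the mapping
    Gaps = Table.NestedJoin(Cities, {"Name"}, Mapping, {"Misspelling"}, "Map", JoinKind.LeftAnti),
    // One row per uncovered spelling, ready to show in a report
    Result = Table.Distinct(Table.SelectColumns(Gaps, {"Name"}))
in
    Result
```

With this sample data only "Dehli" is left over, which is the value someone would then need to add to the reference table.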
If this is important to you please consider voting for an existing idea or raising a new one at https://ideas.fabric.microsoft.com/?forum=2d80fd4a-16cb-4189-896b-e0dac5e08b41