We've captured the moments from FabCon & SQLCon that everyone is talking about, and we are bringing them to the community, live and on-demand. Starts on April 14th. Register now
Hi there everyone,
I am trying to process a set of data that contains the gross weight of shipments, and its driving me nuts, due to how bad and inconsistent the data is. I get values such as below:
1,234.50 kg
1.234,5Kg
12.o3 Kg
approx. 1,5 kg
0, 12 kg
1.234
1.23
1.234 + 4,56
I manage to trim all the spaces and text just fine. What I can't fix is how to deal with the different decimal and thousands separators. The only consistent way to tell from the data is whether there are 2 or 3 digits after the separator.
I'd appreciate any ideas on how to deal with this.
Thanks!
Solved! Go to Solution.
This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Removed Blank Rows" = Table.SelectRows(#"Changed Type",
each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
//Clean the data
#"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
{"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
#"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
#"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),
//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
#"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each
let
#"Split Plus" = Text.Split([Column1],"+"),
#"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
#"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),
#"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
else null,
#"Convert" = if #"Decimal" = "."
then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
in
List.Sum(#"Convert"), type number)
in
#"Add Converted"
See if the transform is what you expect
Original
Transformed
Hi @Neg, different approach here. Jut keep in mind that it works with sample data - but in real world there can be other cases...
Result
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitUB8vWAfB1TbyjPSC/fWAHKSSwoKMqv0FMw1DGFqTbQUTA0QtEKZyGEFLQVgCaaKcXGAgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
Ad_CorrectValue = Table.AddColumn(Source, "CorrectValue", each
[ chars = Text.ToList("+-,.oO"),
a = Text.Select([Column1], {"0".."9"} & chars),
b = Text.ToList(Text.Trim(a, chars)),
c = List.ReplaceMatchingItems(b, {{"o", "0"}, {"O", "0"}}),
d = if List.Contains(c, {"+", "-"}) then c else if List.PositionOf(c, ",") < List.PositionOf(c, ".") then List.RemoveMatchingItems(c, {","}) else
[ d1 = List.ReplaceMatchingItems(c, {{",", "."}}),
d2 = if List.Count(List.Select(d1, (x)=> x = ".")) > 1 then List.Transform(List.Difference(List.Positions(d1), {List.PositionOf(d1, ".", Occurrence.First)}), (y)=> d1{y}) else d1
][d2],
e = Text.Combine(d),
f = Expression.Evaluate(e)
][f], type number)
in
Ad_CorrectValue
Thanks, I'll try this solution next time!
This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Removed Blank Rows" = Table.SelectRows(#"Changed Type",
each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
//Clean the data
#"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
{"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
#"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
#"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),
//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
#"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each
let
#"Split Plus" = Text.Split([Column1],"+"),
#"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
#"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),
#"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
else null,
#"Convert" = if #"Decimal" = "."
then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
in
List.Sum(#"Convert"), type number)
in
#"Add Converted"
See if the transform is what you expect
Original
Transformed
Wow! That was fast and helped enormously. Thanks!
If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.
A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.
Share feedback directly with Fabric product managers, participate in targeted research studies and influence the Fabric roadmap.
| User | Count |
|---|---|
| 6 | |
| 4 | |
| 3 | |
| 2 | |
| 2 |
| User | Count |
|---|---|
| 11 | |
| 10 | |
| 7 | |
| 7 | |
| 6 |