The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi there everyone,
I am trying to process a set of data that contains the gross weight of shipments, and its driving me nuts, due to how bad and inconsistent the data is. I get values such as below:
1,234.50 kg
1.234,5Kg
12.o3 Kg
approx. 1,5 kg
0, 12 kg
1.234
1.23
1.234 + 4,56
I manage to trim all the spaces and text just fine. What I can't fix is how to deal with the different decimal and thousands separators. The only consistent way to tell from the data is whether there are 2 or 3 digits after the separator.
I'd appreciate any ideas on how to deal with this.
Thanks!
Solved! Go to Solution.
This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Removed Blank Rows" = Table.SelectRows(#"Changed Type",
each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
//Clean the data
#"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
{"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
#"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
#"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),
//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
#"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each
let
#"Split Plus" = Text.Split([Column1],"+"),
#"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
#"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),
#"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
else null,
#"Convert" = if #"Decimal" = "."
then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
in
List.Sum(#"Convert"), type number)
in
#"Add Converted"
See if the transform is what you expect
Original
Transformed
Hi @Neg, different approach here. Jut keep in mind that it works with sample data - but in real world there can be other cases...
Result
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitUB8vWAfB1TbyjPSC/fWAHKSSwoKMqv0FMw1DGFqTbQUTA0QtEKZyGEFLQVgCaaKcXGAgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
Ad_CorrectValue = Table.AddColumn(Source, "CorrectValue", each
[ chars = Text.ToList("+-,.oO"),
a = Text.Select([Column1], {"0".."9"} & chars),
b = Text.ToList(Text.Trim(a, chars)),
c = List.ReplaceMatchingItems(b, {{"o", "0"}, {"O", "0"}}),
d = if List.Contains(c, {"+", "-"}) then c else if List.PositionOf(c, ",") < List.PositionOf(c, ".") then List.RemoveMatchingItems(c, {","}) else
[ d1 = List.ReplaceMatchingItems(c, {{",", "."}}),
d2 = if List.Count(List.Select(d1, (x)=> x = ".")) > 1 then List.Transform(List.Difference(List.Positions(d1), {List.PositionOf(d1, ".", Occurrence.First)}), (y)=> d1{y}) else d1
][d2],
e = Text.Combine(d),
f = Expression.Evaluate(e)
][f], type number)
in
Ad_CorrectValue
Thanks, I'll try this solution next time!
This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Removed Blank Rows" = Table.SelectRows(#"Changed Type",
each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
//Clean the data
#"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
{"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
#"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
#"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),
//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
#"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each
let
#"Split Plus" = Text.Split([Column1],"+"),
#"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
#"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),
#"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
else null,
#"Convert" = if #"Decimal" = "."
then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
in
List.Sum(#"Convert"), type number)
in
#"Add Converted"
See if the transform is what you expect
Original
Transformed
Wow! That was fast and helped enormously. Thanks!
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.