Check your eligibility for this 50% exam voucher offer and join us for free live learning sessions to get prepared for Exam DP-700.
Get StartedDon't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.
Hi there everyone,
I am trying to process a set of data that contains the gross weight of shipments, and its driving me nuts, due to how bad and inconsistent the data is. I get values such as below:
1,234.50 kg
1.234,5Kg
12.o3 Kg
approx. 1,5 kg
0, 12 kg
1.234
1.23
1.234 + 4,56
I manage to trim all the spaces and text just fine. What I can't fix is how to deal with the different decimal and thousands separators. The only consistent way to tell from the data is whether there are 2 or 3 digits after the separator.
I'd appreciate any ideas on how to deal with this.
Thanks!
Solved! Go to Solution.
This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Removed Blank Rows" = Table.SelectRows(#"Changed Type",
each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
//Clean the data
#"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
{"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
#"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
#"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),
//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
#"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each
let
#"Split Plus" = Text.Split([Column1],"+"),
#"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
#"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),
#"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
else null,
#"Convert" = if #"Decimal" = "."
then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
in
List.Sum(#"Convert"), type number)
in
#"Add Converted"
See if the transform is what you expect
Original
Transformed
Hi @Neg, different approach here. Jut keep in mind that it works with sample data - but in real world there can be other cases...
Result
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitUB8vWAfB1TbyjPSC/fWAHKSSwoKMqv0FMw1DGFqTbQUTA0QtEKZyGEFLQVgCaaKcXGAgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
Ad_CorrectValue = Table.AddColumn(Source, "CorrectValue", each
[ chars = Text.ToList("+-,.oO"),
a = Text.Select([Column1], {"0".."9"} & chars),
b = Text.ToList(Text.Trim(a, chars)),
c = List.ReplaceMatchingItems(b, {{"o", "0"}, {"O", "0"}}),
d = if List.Contains(c, {"+", "-"}) then c else if List.PositionOf(c, ",") < List.PositionOf(c, ".") then List.RemoveMatchingItems(c, {","}) else
[ d1 = List.ReplaceMatchingItems(c, {{",", "."}}),
d2 = if List.Count(List.Select(d1, (x)=> x = ".")) > 1 then List.Transform(List.Difference(List.Positions(d1), {List.PositionOf(d1, ".", Occurrence.First)}), (y)=> d1{y}) else d1
][d2],
e = Text.Combine(d),
f = Expression.Evaluate(e)
][f], type number)
in
Ad_CorrectValue
Thanks, I'll try this solution next time!
This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Removed Blank Rows" = Table.SelectRows(#"Changed Type",
each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
//Clean the data
#"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
{"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
#"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
#"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),
//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
#"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each
let
#"Split Plus" = Text.Split([Column1],"+"),
#"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
#"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),
#"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
else null,
#"Convert" = if #"Decimal" = "."
then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
in
List.Sum(#"Convert"), type number)
in
#"Add Converted"
See if the transform is what you expect
Original
Transformed
Wow! That was fast and helped enormously. Thanks!
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Prices go up Feb. 11th.
Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.
User | Count |
---|---|
14 | |
13 | |
12 | |
12 | |
12 |