- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Cleaning Data with inconsistent decimal separators
Hi there everyone,
I am trying to process a set of data that contains the gross weight of shipments, and its driving me nuts, due to how bad and inconsistent the data is. I get values such as below:
1,234.50 kg
1.234,5Kg
12.o3 Kg
approx. 1,5 kg
0, 12 kg
1.234
1.23
1.234 + 4,56
I manage to trim all the spaces and text just fine. What I can't fix is how to deal with the different decimal and thousands separators. The only consistent way to tell from the data is whether there are 2 or 3 digits after the separator.
I'd appreciate any ideas on how to deal with this.
Thanks!
Solved! Go to Solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Removed Blank Rows" = Table.SelectRows(#"Changed Type",
each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
//Clean the data
#"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
{"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
#"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
#"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),
//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
#"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each
let
#"Split Plus" = Text.Split([Column1],"+"),
#"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
#"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),
#"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
else null,
#"Convert" = if #"Decimal" = "."
then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
in
List.Sum(#"Convert"), type number)
in
#"Add Converted"
See if the transform is what you expect
Original
Transformed
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Neg, different approach here. Jut keep in mind that it works with sample data - but in real world there can be other cases...
Result
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitUB8vWAfB1TbyjPSC/fWAHKSSwoKMqv0FMw1DGFqTbQUTA0QtEKZyGEFLQVgCaaKcXGAgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
Ad_CorrectValue = Table.AddColumn(Source, "CorrectValue", each
[ chars = Text.ToList("+-,.oO"),
a = Text.Select([Column1], {"0".."9"} & chars),
b = Text.ToList(Text.Trim(a, chars)),
c = List.ReplaceMatchingItems(b, {{"o", "0"}, {"O", "0"}}),
d = if List.Contains(c, {"+", "-"}) then c else if List.PositionOf(c, ",") < List.PositionOf(c, ".") then List.RemoveMatchingItems(c, {","}) else
[ d1 = List.ReplaceMatchingItems(c, {{",", "."}}),
d2 = if List.Count(List.Select(d1, (x)=> x = ".")) > 1 then List.Transform(List.Difference(List.Positions(d1), {List.PositionOf(d1, ".", Occurrence.First)}), (y)=> d1{y}) else d1
][d2],
e = Text.Combine(d),
f = Expression.Evaluate(e)
][f], type number)
in
Ad_CorrectValue
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, I'll try this solution next time!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Removed Blank Rows" = Table.SelectRows(#"Changed Type",
each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
//Clean the data
#"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
{"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
#"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
#"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),
//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
#"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each
let
#"Split Plus" = Text.Split([Column1],"+"),
#"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
#"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),
#"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
else null,
#"Convert" = if #"Decimal" = "."
then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
in
List.Sum(#"Convert"), type number)
in
#"Add Converted"
See if the transform is what you expect
Original
Transformed
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Wow! That was fast and helped enormously. Thanks!
Helpful resources
Join us at the Microsoft Fabric Community Conference
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Microsoft Fabric Community Conference 2025
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
Subject | Author | Posted | |
---|---|---|---|
06-17-2024 06:37 AM | |||
07-26-2023 11:06 PM | |||
07-08-2024 10:27 PM | |||
10-24-2023 08:54 PM | |||
08-03-2020 06:02 PM |