Reply
Neg
New Member
Partially syndicated - Outbound

Cleaning Data with inconsistent decimal separators

Hi there everyone,

I am trying to process a set of data that contains the gross weight of shipments, and its driving me nuts, due to how bad and inconsistent the data is. I get values such as below:

 

1,234.50 kg

1.234,5Kg

12.o3 Kg

approx. 1,5 kg

0, 12 kg

1.234

1.23

1.234 + 4,56

 

I manage to trim all the spaces and text just fine. What I can't fix is how to deal with the different decimal and thousands separators. The only consistent way to tell from the data is whether there are 2 or 3 digits after the separator.

 

I'd appreciate any ideas on how to deal with this.

Thanks!

1 ACCEPTED SOLUTION
ronrsnfld
Super User
Super User

Syndicated - Outbound

This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Removed Blank Rows" = Table.SelectRows(#"Changed Type", 
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),

//Clean the data
    #"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
        {"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
    #"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
    #"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),

//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
    #"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each 
      let 
        #"Split Plus" = Text.Split([Column1],"+"),
        #"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
        #"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),

        #"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
            else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
            else null,
       
        #"Convert" = if #"Decimal" = "."
          then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
          else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
      in 
        List.Sum(#"Convert"), type number)
in
    #"Add Converted"

 

See if the transform is what you expect

 

Original

ronrsnfld_3-1718888560762.png

Transformed

ronrsnfld_4-1718888615010.png

 

 

View solution in original post

4 REPLIES 4
dufoq3
Super User
Super User

Syndicated - Outbound

Hi @Neg, different approach here. Jut keep in mind that it works with sample data - but in real world there can be other cases...

 

Result

dufoq3_0-1718891681882.png

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitUB8vWAfB1TbyjPSC/fWAHKSSwoKMqv0FMw1DGFqTbQUTA0QtEKZyGEFLQVgCaaKcXGAgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    Ad_CorrectValue = Table.AddColumn(Source, "CorrectValue", each 
        [ chars = Text.ToList("+-,.oO"),
          a = Text.Select([Column1], {"0".."9"} & chars),
          b = Text.ToList(Text.Trim(a, chars)),
          c = List.ReplaceMatchingItems(b, {{"o", "0"}, {"O", "0"}}),
          d = if List.Contains(c, {"+", "-"}) then c else if List.PositionOf(c, ",") < List.PositionOf(c, ".") then List.RemoveMatchingItems(c, {","}) else 
                [ d1 = List.ReplaceMatchingItems(c, {{",", "."}}),
                  d2 = if List.Count(List.Select(d1, (x)=> x = ".")) > 1 then List.Transform(List.Difference(List.Positions(d1), {List.PositionOf(d1, ".", Occurrence.First)}), (y)=> d1{y}) else d1
                ][d2],
          e = Text.Combine(d),
          f = Expression.Evaluate(e)
        ][f], type number)
in
    Ad_CorrectValue

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Syndicated - Outbound

Thanks, I'll try this solution next time!

ronrsnfld
Super User
Super User

Syndicated - Outbound

This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Removed Blank Rows" = Table.SelectRows(#"Changed Type", 
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),

//Clean the data
    #"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
        {"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
    #"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
    #"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),

//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
    #"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each 
      let 
        #"Split Plus" = Text.Split([Column1],"+"),
        #"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
        #"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),

        #"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
            else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
            else null,
       
        #"Convert" = if #"Decimal" = "."
          then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
          else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
      in 
        List.Sum(#"Convert"), type number)
in
    #"Add Converted"

 

See if the transform is what you expect

 

Original

ronrsnfld_3-1718888560762.png

Transformed

ronrsnfld_4-1718888615010.png

 

 

Syndicated - Outbound

Wow! That was fast and helped enormously. Thanks!

avatar user

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.

Top Solution Authors (Last Month)
Top Kudoed Authors (Last Month)