Solved: Re: Cleaning Data with inconsistent decimal separa...

Neg · ‎06-20-2024

Hi there everyone,

I am trying to process a set of data that contains the gross weight of shipments, and its driving me nuts, due to how bad and inconsistent the data is. I get values such as below:

1,234.50 kg

1.234,5Kg

12.o3 Kg

approx. 1,5 kg

0, 12 kg

1.234

1.23

1.234 + 4,56

I manage to trim all the spaces and text just fine. What I can't fix is how to deal with the different decimal and thousands separators. The only consistent way to tell from the data is whether there are 2 or 3 digits after the separator.

I'd appreciate any ideas on how to deal with this.

Thanks!

ronrsnfld · ‎06-20-2024

This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Removed Blank Rows" = Table.SelectRows(#"Changed Type", 
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),

//Clean the data
    #"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
        {"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
    #"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
    #"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),

//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
    #"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each 
      let 
        #"Split Plus" = Text.Split([Column1],"+"),
        #"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
        #"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),

        #"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
            else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
            else null,
       
        #"Convert" = if #"Decimal" = "."
          then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
          else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
      in 
        List.Sum(#"Convert"), type number)
in
    #"Add Converted"

See if the transform is what you expect

Original

Transformed

View solution in original post

dufoq3 · ‎06-20-2024

Hi @Neg, different approach here. Jut keep in mind that it works with sample data - but in real world there can be other cases...

Result

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitUB8vWAfB1TbyjPSC/fWAHKSSwoKMqv0FMw1DGFqTbQUTA0QtEKZyGEFLQVgCaaKcXGAgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    Ad_CorrectValue = Table.AddColumn(Source, "CorrectValue", each 
        [ chars = Text.ToList("+-,.oO"),
          a = Text.Select([Column1], {"0".."9"} & chars),
          b = Text.ToList(Text.Trim(a, chars)),
          c = List.ReplaceMatchingItems(b, {{"o", "0"}, {"O", "0"}}),
          d = if List.Contains(c, {"+", "-"}) then c else if List.PositionOf(c, ",") < List.PositionOf(c, ".") then List.RemoveMatchingItems(c, {","}) else 
                [ d1 = List.ReplaceMatchingItems(c, {{",", "."}}),
                  d2 = if List.Count(List.Select(d1, (x)=> x = ".")) > 1 then List.Transform(List.Difference(List.Positions(d1), {List.PositionOf(d1, ".", Occurrence.First)}), (y)=> d1{y}) else d1
                ][d2],
          e = Text.Combine(d),
          f = Expression.Evaluate(e)
        ][f], type number)
in
    Ad_CorrectValue

Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Neg · ‎06-20-2024

Thanks, I'll try this solution next time!

ronrsnfld · ‎06-20-2024

This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Removed Blank Rows" = Table.SelectRows(#"Changed Type", 
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),

//Clean the data
    #"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
        {"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
    #"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
    #"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),

//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
    #"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each 
      let 
        #"Split Plus" = Text.Split([Column1],"+"),
        #"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
        #"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),

        #"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
            else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
            else null,
       
        #"Convert" = if #"Decimal" = "."
          then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
          else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
      in 
        List.Sum(#"Convert"), type number)
in
    #"Add Converted"

See if the transform is what you expect

Original

Transformed

Neg · ‎06-20-2024

Wow! That was fast and helped enormously. Thanks!

Cleaning Data with inconsistent decimal separators

Helpful resources

Fabric Community Update - July 2025

Power BI Monthly Update - July 2025

Party with Power BI’s own Guy in a Cube

Cleaning Data with inconsistent decimal separators

Helpful resources

Fabric Community Update - July 2025

Power BI Monthly Update - July 2025