Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
Neg
New Member

Cleaning Data with inconsistent decimal separators

Hi there everyone,

I am trying to process a set of data that contains the gross weight of shipments, and its driving me nuts, due to how bad and inconsistent the data is. I get values such as below:

 

1,234.50 kg

1.234,5Kg

12.o3 Kg

approx. 1,5 kg

0, 12 kg

1.234

1.23

1.234 + 4,56

 

I manage to trim all the spaces and text just fine. What I can't fix is how to deal with the different decimal and thousands separators. The only consistent way to tell from the data is whether there are 2 or 3 digits after the separator.

 

I'd appreciate any ideas on how to deal with this.

Thanks!

1 ACCEPTED SOLUTION
ronrsnfld
Super User
Super User

This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Removed Blank Rows" = Table.SelectRows(#"Changed Type", 
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),

//Clean the data
    #"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
        {"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
    #"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
    #"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),

//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
    #"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each 
      let 
        #"Split Plus" = Text.Split([Column1],"+"),
        #"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
        #"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),

        #"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
            else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
            else null,
       
        #"Convert" = if #"Decimal" = "."
          then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
          else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
      in 
        List.Sum(#"Convert"), type number)
in
    #"Add Converted"

 

See if the transform is what you expect

 

Original

ronrsnfld_3-1718888560762.png

Transformed

ronrsnfld_4-1718888615010.png

 

 

View solution in original post

4 REPLIES 4
dufoq3
Super User
Super User

Hi @Neg, different approach here. Jut keep in mind that it works with sample data - but in real world there can be other cases...

 

Result

dufoq3_0-1718891681882.png

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitUB8vWAfB1TbyjPSC/fWAHKSSwoKMqv0FMw1DGFqTbQUTA0QtEKZyGEFLQVgCaaKcXGAgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    Ad_CorrectValue = Table.AddColumn(Source, "CorrectValue", each 
        [ chars = Text.ToList("+-,.oO"),
          a = Text.Select([Column1], {"0".."9"} & chars),
          b = Text.ToList(Text.Trim(a, chars)),
          c = List.ReplaceMatchingItems(b, {{"o", "0"}, {"O", "0"}}),
          d = if List.Contains(c, {"+", "-"}) then c else if List.PositionOf(c, ",") < List.PositionOf(c, ".") then List.RemoveMatchingItems(c, {","}) else 
                [ d1 = List.ReplaceMatchingItems(c, {{",", "."}}),
                  d2 = if List.Count(List.Select(d1, (x)=> x = ".")) > 1 then List.Transform(List.Difference(List.Positions(d1), {List.PositionOf(d1, ".", Occurrence.First)}), (y)=> d1{y}) else d1
                ][d2],
          e = Text.Combine(d),
          f = Expression.Evaluate(e)
        ][f], type number)
in
    Ad_CorrectValue

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Thanks, I'll try this solution next time!

ronrsnfld
Super User
Super User

This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Removed Blank Rows" = Table.SelectRows(#"Changed Type", 
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),

//Clean the data
    #"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
        {"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
    #"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
    #"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),

//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
    #"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each 
      let 
        #"Split Plus" = Text.Split([Column1],"+"),
        #"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
        #"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),

        #"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
            else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
            else null,
       
        #"Convert" = if #"Decimal" = "."
          then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
          else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
      in 
        List.Sum(#"Convert"), type number)
in
    #"Add Converted"

 

See if the transform is what you expect

 

Original

ronrsnfld_3-1718888560762.png

Transformed

ronrsnfld_4-1718888615010.png

 

 

Wow! That was fast and helped enormously. Thanks!

Helpful resources

Announcements
July 2024 Power BI Update

Power BI Monthly Update - July 2024

Check out the July 2024 Power BI update to learn about new features.

July Newsletter

Fabric Community Update - July 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors