Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Be one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now

Reply
Neg
New Member

Cleaning Data with inconsistent decimal separators

Hi there everyone,

I am trying to process a set of data that contains the gross weight of shipments, and its driving me nuts, due to how bad and inconsistent the data is. I get values such as below:

 

1,234.50 kg

1.234,5Kg

12.o3 Kg

approx. 1,5 kg

0, 12 kg

1.234

1.23

1.234 + 4,56

 

I manage to trim all the spaces and text just fine. What I can't fix is how to deal with the different decimal and thousands separators. The only consistent way to tell from the data is whether there are 2 or 3 digits after the separator.

 

I'd appreciate any ideas on how to deal with this.

Thanks!

1 ACCEPTED SOLUTION
ronrsnfld
Super User
Super User

This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Removed Blank Rows" = Table.SelectRows(#"Changed Type", 
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),

//Clean the data
    #"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
        {"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
    #"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
    #"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),

//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
    #"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each 
      let 
        #"Split Plus" = Text.Split([Column1],"+"),
        #"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
        #"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),

        #"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
            else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
            else null,
       
        #"Convert" = if #"Decimal" = "."
          then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
          else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
      in 
        List.Sum(#"Convert"), type number)
in
    #"Add Converted"

 

See if the transform is what you expect

 

Original

ronrsnfld_3-1718888560762.png

Transformed

ronrsnfld_4-1718888615010.png

 

 

View solution in original post

4 REPLIES 4
dufoq3
Super User
Super User

Hi @Neg, different approach here. Jut keep in mind that it works with sample data - but in real world there can be other cases...

 

Result

dufoq3_0-1718891681882.png

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitUB8vWAfB1TbyjPSC/fWAHKSSwoKMqv0FMw1DGFqTbQUTA0QtEKZyGEFLQVgCaaKcXGAgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    Ad_CorrectValue = Table.AddColumn(Source, "CorrectValue", each 
        [ chars = Text.ToList("+-,.oO"),
          a = Text.Select([Column1], {"0".."9"} & chars),
          b = Text.ToList(Text.Trim(a, chars)),
          c = List.ReplaceMatchingItems(b, {{"o", "0"}, {"O", "0"}}),
          d = if List.Contains(c, {"+", "-"}) then c else if List.PositionOf(c, ",") < List.PositionOf(c, ".") then List.RemoveMatchingItems(c, {","}) else 
                [ d1 = List.ReplaceMatchingItems(c, {{",", "."}}),
                  d2 = if List.Count(List.Select(d1, (x)=> x = ".")) > 1 then List.Transform(List.Difference(List.Positions(d1), {List.PositionOf(d1, ".", Occurrence.First)}), (y)=> d1{y}) else d1
                ][d2],
          e = Text.Combine(d),
          f = Expression.Evaluate(e)
        ][f], type number)
in
    Ad_CorrectValue

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Thanks, I'll try this solution next time!

ronrsnfld
Super User
Super User

This may work, if your differentiation is that a token followed by two (or one) digits is the decimal token:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMtQxMjbRMzVQyE5XitWJVgIThnpAQR1Tb2QhI718YwVkkcSCgqL8Cj0FQx1TFM0GOgqGRpjGQU2BMYBCUIYhQo2CtgLQWjOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Removed Blank Rows" = Table.SelectRows(#"Changed Type", 
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),

//Clean the data
    #"Remove Leading/Trailing text" = Table.TransformColumns(#"Removed Blank Rows",
        {"Column1", each Text.Trim(_,{" ",".","a".."z","A".."Z"}), type text}),
    #"'o' to '0'"=Table.ReplaceValue(#"Remove Leading/Trailing text","o","0",Replacer.ReplaceText,{"Column1"}),
    #"Remove Spaces" =Table.ReplaceValue(#"'o' to '0'"," ","",Replacer.ReplaceText,{"Column1"}),

//Convert the number based on algorithm that
// if there are one or two digits after the `dot` or `comma` then that token = decimal token
    #"Add Converted" = Table.AddColumn(#"Remove Spaces", "Converted Number", each 
      let 
        #"Split Plus" = Text.Split([Column1],"+"),
        #"Split Dot" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,"."))),
        #"Split Comma" = List.Combine(List.Transform(#"Split Plus", each Text.Split(_,","))),

        #"Decimal" = if (Text.Length(#"Split Dot"{1}?) <= 2)??false then "."
            else if ((Text.Length(#"Split Comma"{1}?)??0) <= 2)??false then ","
            else null,
       
        #"Convert" = if #"Decimal" = "."
          then List.Transform(#"Split Plus", each Number.From(_, "en-US"))
          else List.Transform(#"Split Plus", each Number.FromText(_,"en-DE"))
      in 
        List.Sum(#"Convert"), type number)
in
    #"Add Converted"

 

See if the transform is what you expect

 

Original

ronrsnfld_3-1718888560762.png

Transformed

ronrsnfld_4-1718888615010.png

 

 

Wow! That was fast and helped enormously. Thanks!

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.

Top Solution Authors
Top Kudoed Authors