Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.

Reply
ngct1112
Post Patron
Post Patron

Regex para restar textos por Power Query/ Dax

Hola, me gustaría extraer algunos datos de una columna que la entrada de los datos no son consosos.

Puedo saber si hay alguna manera de limpiar los datos por PowerQuery o DAX. Gracias

ColumnaResultado
1kg1kg
abc10kg10kg
10kg10kg
18kg hhh418kg
Abc:10kg10kg
5aaab3kg5kg
abc - 15kg15kg
3kg3kg
9kg9kg
VV - fg 8kg8kg
kk 5kg 00g5kg
001....abc.. 5kg...5kg
2 ACCEPTED SOLUTIONS

Hola @ngct1112 ,

por favor pegue este código en el editor avanzado y siga los pasos:

let
    Source = Table.FromRows(
        Json.Document(
            Binary.Decompress(
                Binary.FromText(
                    "i45Wcs7PKc3NU9JRCkotLs0pUYrViVYyzE4HCoBIEC8xKdnQACJiABXC4FtkpytkZGSYgAQtoIKOSclW6ApNE7PTExOTjMGipggLFHQVDE0hSmGiEDXGUJ4lmGcJ5YWFATWkpStYgEVhFmZnKwA1KxgYIBtuYGD4qGGZHtAOPT2gGJANk4wFAA==", 
                    BinaryEncoding.Base64
                ), 
                Compression.Deflate
            )
        ), 
        let
            _t = ((type nullable text) meta [Serialized.Text = true])
        in
            type table[Column1 = _t, Column2 = _t]
    ),
    #"Duplicated Column" = Table.DuplicateColumn(Source, "Column1", "Column1 - Copy"),
    #"Changed Type" = Table.TransformColumnTypes(
        #"Duplicated Column", 
        {{"Column1", type text}, {"Column2", type text}}
    ),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars = true]),
    #"Added Index" = Table.AddIndexColumn(#"Promoted Headers", "Index", 0, 1),
    #"Changed Type1" = Table.TransformColumnTypes(
        #"Added Index", 
        {{"Column", type text}, {"Result", type text}}
    ),
    #"Split Column by Character Transition" = Table.SplitColumn(
        #"Changed Type1", 
        "Column", 
        Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), 
        {"Column.1", "Column.2", "Column.3"}
    ),
    GetSplittedValues = Table.AddColumn(
        #"Split Column by Character Transition", 
        "SplittedValues", 
        each Record.FieldValues(
            Record.SelectFields(
                _, 
                List.Difference(Record.FieldNames(_), Table.ColumnNames(#"Changed Type1"))
            )
        )
    ),
    #"Expanded SplittedValues" = Table.ExpandListColumn(GetSplittedValues, "SplittedValues"),
    #"Extracted Text Before Delimiter" = Table.TransformColumns(
        #"Expanded SplittedValues", 
        {{"SplittedValues", each Text.BeforeDelimiter(_, " "), type text}}
    ),
    FilterOnlyRowsWithNumbers = Table.SelectRows(
        #"Extracted Text Before Delimiter", 
        each (List.Contains({"1".."9"}, Text.Start([SplittedValues], 1)))
    ),
    FilterOnlyRowsWithKg = Table.SelectRows(
        FilterOnlyRowsWithNumbers, 
        each Text.Contains([SplittedValues], "kg", Comparer.OrdinalIgnoreCase)
    ),
    #"Added Custom1" = Table.AddColumn(
        FilterOnlyRowsWithKg, 
        "Custom", 
        each try Number.From(Text.BeforeDelimiter([SplittedValues], "kg")) otherwise null
    ),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom] <> null)),
    #"Added Suffix" = Table.TransformColumns(
        #"Filtered Rows", 
        {{"Custom", each Text.From(_, "en-GB") & "kg", type text}}
    ),
    #"Removed Other Columns" = Table.SelectColumns(
        #"Added Suffix", 
        {"Column_1", "Result", "Custom"}
    )
in
    #"Removed Other Columns"

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

Syndicate_Admin
Administrator
Administrator

el poder real de PQ + RegEx

let
    RE = (regex as text, str as text) =>
    let
        html =
            "<script>var regex = " & regex & "; var str = """ & str & """; var res = str.match(regex); document.write(res)</script>",
        res = Web.Page(html)[Data]{0}[Children]{0}[Children]{1}[Text]{0}
    in res,

    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMsxOV4rViVZKTEo2NICyDQ28oQyL7HSFjIwMEzDPMSnZCq7GNDE7PTExyRihXUFXwdAUyjX2dgfTltkQOiwMKJuWrmABlc/OVgAqVTAwgHANDAwfNSzTAxqipwcUB7KVYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Custom", each RE("/\d+kg/gi", [Column]))
in
    #"Added Custom"

Screenshot 2021-07-04 221030.png

View solution in original post

10 REPLIES 10
Syndicate_Admin
Administrator
Administrator

el poder real de PQ + RegEx

let
    RE = (regex as text, str as text) =>
    let
        html =
            "<script>var regex = " & regex & "; var str = """ & str & """; var res = str.match(regex); document.write(res)</script>",
        res = Web.Page(html)[Data]{0}[Children]{0}[Children]{1}[Text]{0}
    in res,

    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMsxOV4rViVZKTEo2NICyDQ28oQyL7HSFjIwMEzDPMSnZCq7GNDE7PTExyRihXUFXwdAUyjX2dgfTltkQOiwMKJuWrmABlc/OVgAqVTAwgHANDAwfNSzTAxqipwcUB7KVYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Custom", each RE("/\d+kg/gi", [Column]))
in
    #"Added Custom"

Screenshot 2021-07-04 221030.png

@CNENFRNL funciona pero muy súper mega lento, y nunca completo para filas de 1 mil

Recibo el siguiente error mientras intento editar la fórmula regex

Estoy tratando de comprender lo que has hecho aquí. Pero fallando.

Exporté el código, edité la tabla de origen y cambié la fórmula regex para que coincida con mi requisito. pero obteniendo un error:

= Table.AddColumn(#"Renamed Columns", "Custom", each RE("/\w+", [Idealink]))

Expression.Error: No había suficientes elementos en la enumeración para completar la operación.

No @CNENFRNL,

esto es realmente awesomesauce !!

Ahora solo nos queda esperar y esperar que esto también se pueda refrescar en el servicio algún día.
Parece que también está prohibido a través de una puerta de enlace, por desgracia.

Saludos, Imke

Gracias por su amable recordatorio. De hecho, debo tener en cuenta tales restricciones, ya que ahora soy adicto a una forma tan complicada de manejar regex en PQ en lugar de incrustar scripts de R o Python. 😂

Por cierto, te tengo un gran agradecimiento, ya que estoy iluminado por muchos de sus blogs en PQ! 👍

@CNENFRNL Este método se ve tan brillante.

¿Puedo preguntar cómo funciona? utilizando el parámetro para crear regrex?

Es un poco complicado. Usé Web.Page para analizar html que contiene script js; como puede ver, el objeto regex incrustado se puede usar en dichos fragmentos de código js. Funciona en su scenairo, pero no es una panacea. Pls se refieren a las restricciones en la respuesta de ImkeF.

amitchandak
Super User
Super User

@ImkeF , puede ayudar en esto ?

Full Power BI Video 20 Hours YouTube
Microsoft Fabric Series 60+ Videos YouTube
Microsoft Fabric Hindi End to End YouTube

Hola @ngct1112 ,

por favor pegue este código en el editor avanzado y siga los pasos:

let
    Source = Table.FromRows(
        Json.Document(
            Binary.Decompress(
                Binary.FromText(
                    "i45Wcs7PKc3NU9JRCkotLs0pUYrViVYyzE4HCoBIEC8xKdnQACJiABXC4FtkpytkZGSYgAQtoIKOSclW6ApNE7PTExOTjMGipggLFHQVDE0hSmGiEDXGUJ4lmGcJ5YWFATWkpStYgEVhFmZnKwA1KxgYIBtuYGD4qGGZHtAOPT2gGJANk4wFAA==", 
                    BinaryEncoding.Base64
                ), 
                Compression.Deflate
            )
        ), 
        let
            _t = ((type nullable text) meta [Serialized.Text = true])
        in
            type table[Column1 = _t, Column2 = _t]
    ),
    #"Duplicated Column" = Table.DuplicateColumn(Source, "Column1", "Column1 - Copy"),
    #"Changed Type" = Table.TransformColumnTypes(
        #"Duplicated Column", 
        {{"Column1", type text}, {"Column2", type text}}
    ),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars = true]),
    #"Added Index" = Table.AddIndexColumn(#"Promoted Headers", "Index", 0, 1),
    #"Changed Type1" = Table.TransformColumnTypes(
        #"Added Index", 
        {{"Column", type text}, {"Result", type text}}
    ),
    #"Split Column by Character Transition" = Table.SplitColumn(
        #"Changed Type1", 
        "Column", 
        Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), 
        {"Column.1", "Column.2", "Column.3"}
    ),
    GetSplittedValues = Table.AddColumn(
        #"Split Column by Character Transition", 
        "SplittedValues", 
        each Record.FieldValues(
            Record.SelectFields(
                _, 
                List.Difference(Record.FieldNames(_), Table.ColumnNames(#"Changed Type1"))
            )
        )
    ),
    #"Expanded SplittedValues" = Table.ExpandListColumn(GetSplittedValues, "SplittedValues"),
    #"Extracted Text Before Delimiter" = Table.TransformColumns(
        #"Expanded SplittedValues", 
        {{"SplittedValues", each Text.BeforeDelimiter(_, " "), type text}}
    ),
    FilterOnlyRowsWithNumbers = Table.SelectRows(
        #"Extracted Text Before Delimiter", 
        each (List.Contains({"1".."9"}, Text.Start([SplittedValues], 1)))
    ),
    FilterOnlyRowsWithKg = Table.SelectRows(
        FilterOnlyRowsWithNumbers, 
        each Text.Contains([SplittedValues], "kg", Comparer.OrdinalIgnoreCase)
    ),
    #"Added Custom1" = Table.AddColumn(
        FilterOnlyRowsWithKg, 
        "Custom", 
        each try Number.From(Text.BeforeDelimiter([SplittedValues], "kg")) otherwise null
    ),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom] <> null)),
    #"Added Suffix" = Table.TransformColumns(
        #"Filtered Rows", 
        {{"Custom", each Text.From(_, "en-GB") & "kg", type text}}
    ),
    #"Removed Other Columns" = Table.SelectColumns(
        #"Added Suffix", 
        {"Column_1", "Result", "Custom"}
    )
in
    #"Removed Other Columns"

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

@ImkeF la forma en que haces esto es increíble. Apreciado con su ayuda.

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.