Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, the Microsoft Fabric Community team will be offering free DP-600 exam vouchers. Prepare now

Reply
BZimber
New Member

Power Query Replace using Wildcards ie Replace in Excel <*> with space

I find plenty of posts on how to replace things in a certain way but not what I need to do. I have a column that is an extract from devops which, when I bring it into Excel, it has all the html commands in it, all of which start and end in '<' and '>' like <div> or </div><div><div style="box-sizing:border-box;"> etc.. all splattered throughout the cell. In excel it is easy, find <*> and replace with a space. How can I do this in power query. I am not an m coder so would need it dumbed down.

1 ACCEPTED SOLUTION

this 

let
    fx = (txt) => Text.Combine(
        List.Alternate(
            List.Select(Text.SplitAny(txt, "<>"), (x) => x <> ""), 
            1, 
            1, 
            Number.From(not Text.StartsWith(txt, "<"))
        ),
        " "
    ),
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content], 
    result = Table.TransformColumns(Source, {"Column1", fx})
in
    result

replaces <, > and everything in between with space. If you don't need space - simply remove " " in Text.Combine call.  

View solution in original post

11 REPLIES 11
v-xinruzhu-msft
Community Support
Community Support

Hi,

Thanks for the solutions everyone offered, and i want to offer some more information for user to refer to.

hello @BZimber , you can create a blank query, and put the following code to advanced editor in power query.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wiik1MDBOTsksAzNSE5OSISL6cKGU1DSlWB2YSoQ4mk5DI2MTUzO4oEJxSWVOqm2MUlJ+hW5xZlVmXrpVUn5RSmqRLlDEOgZiXKpSbCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each let a=Text.Split([Column1],">"),
b=List.Transform(a,each if Text.Contains(_,"<") then Text.BeforeDelimiter(_,"<") else _),
c=Text.Combine(b," ")
in c),
    #"Trimmed Text" = Table.TransformColumns(#"Added Custom",{{"Custom", Text.Trim, type text}})
in
    #"Trimmed Text"

Output

vxinruzhumsft_0-1730872045778.png

Best Regards!

Yolo Zhu

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Omid_Motamedise
Memorable Member
Memorable Member

The Text.BetweenDelimiters might be helpfull 

It might very well be helpful. However I am a new user and I don't have the first clue on what to do or how to use that function as I am not to the level of coding. I am learning via videos using the built in function buttons that create code. I am not to the level yet to speak 'm' code.

imagine your data are on a coluumn namely X, then add a new custom column namely X2 by the next formula to extract the text between <>.

 

=Text.BetweenDelimiter([X],"<",">")

then replace it by your desired text by using Text.Replace

ronrsnfld
Super User
Super User

You could use regular expressions.

 

Create a blank query

Paste the code below into the Advanced Editor.

 

Then create your custom column using "invoke custom function"

ronrsnfld_0-1730424500506.png

 

In the Advanced Editor, the code would look like:

 

#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each fnRegexReplace([Column1],"<[^>]*>"," "))

I made several attempts but failed each time. I am new to power query so when providing concepts like that I fall flat on my face becuase I don't have the experience to intuitively fill int he blanks. I went to the query editor then New Source>Other Sources>Blank Query. Then to Advanced Editor which gave me: 

let
Source = ""
in
Source

 

Then I pasted in your string after 'Source = ""' and had to put a comma at the end of 'Source = ""' to resolve a token error thing. It then looked like:

let
Source = "",
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each fnRegexReplace([Column1],"<[^>]*>"," "))
in
Source

Then I went to Add Column but all the buttons were greyed out.

 

So by now you probably can tell that I like the idea, but I am in over my head. But the question becomes, even if I go the function to work in the blank query, how do I get it to work with my data in the original query?

Not your fault at all.

I neglected to provide the code you needed to paste into the blank query. 

 

🤦

//Rename Query:  fnRegexReplace

(text as nullable text,pattern as nullable text,replace as nullable text, optional flags as nullable text) =>
    let
        f=if flags = null or flags ="" then "" else flags,
        l1 = List.Transform({text, pattern, replace}, each Text.Replace(_, "\", "\")),
        l2 = List.Transform(l1, each Text.Replace(_, "'", "\'")),
        t = Text.Format("<script>var txt='#{0}';document.write(txt.replace(new RegExp('#{1}','#{3}'),'#{2}'));</script>", List.Combine({l2,{f}})),
        r=Web.Page(t)[Data]{0}[Children]{0}[Children],
        Output=if List.Count(r)>1 then r{1}[Text]{0} else ""
    in Output

 

 

 

After you do this, you should be able to add that column in your main query.

 

Sorry about that.

This solution mostly worked. It seems like so long as the text in the field started with '<' then it worked. But if there was text before the first occurance, it just wiped out all text.

tackytechtom
Super User
Super User

Hi @BZimber ,


Have you already looked through this post by @Rickmaurinus, yet? 

https://gorilla.bi/power-query/wildcards/

 

Also,  the following approach might be helping you out here as well:
https://www.tackytech.blog/how-to-replace-several-characters-in-a-column-in-just-one-power-query-ste...

 

You would still need to specify each of those funky html tags, though... 

 

Hope this helps 🙂

 

/Tom
https://www.tackytech.blog/
https://www.instagram.com/tackytechtom/

 

 



Did I answer your question➡️ Please, mark my post as a solution ✔️

Also happily accepting Kudos 🙂

Feel free to connect with me on LinkedIn! linkedIn

#proudtobeasuperuser 

Yes I saw the first one and couldn't figure out how to adapt it. I am not trying to clean 'pre-determined' values. I am trying to turn everything between '<' and '>' to a blank. For example <what ever html code here> to a space. There will be multiple of those html instructions in each cell. I am a very new to power query and everyone giving ideas is operating from a much higher experience level than myself which is why I cannot just apply general concepts to my case.

this 

let
    fx = (txt) => Text.Combine(
        List.Alternate(
            List.Select(Text.SplitAny(txt, "<>"), (x) => x <> ""), 
            1, 
            1, 
            Number.From(not Text.StartsWith(txt, "<"))
        ),
        " "
    ),
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content], 
    result = Table.TransformColumns(Source, {"Column1", fx})
in
    result

replaces <, > and everything in between with space. If you don't need space - simply remove " " in Text.Combine call.  

Helpful resources

Announcements
OCT PBI Update Carousel

Power BI Monthly Update - October 2024

Check out the October 2024 Power BI update to learn about new features.

September Hackathon Carousel

Microsoft Fabric & AI Learning Hackathon

Learn from experts, get hands-on experience, and win awesome prizes.

October NL Carousel

Fabric Community Update - October 2024

Find out what's new and trending in the Fabric Community.

Users online (2,325)