Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Learn more

Reply
ryangoderre
Regular Visitor

Looking for help to remove text between html tags, like this <*>

in microsoft excel you can do a wild card find and replace for <*> and it will remove all the html text and tags.  but i would like to have this step be a part of my power query steps.  

 

I have looked at multiple posts on this fourm, and i guess i am not advanced enough to understand how to implement the solution correctly.  

 

I have 2 columns i want to cleanse of the html tagging called "Description" and "Comments". 

 

I believe i need to create a blank query and call it when creating a new column? but i dont seem to understand what query i need to to use and how to call it when createing the new column. 

 

my data looks like this - 

<html><body> <div align=&quot;left&quot; style=&quot;min-height:9pt&quot;><font face=&quot;Arial&quot; color=&quot;#ff0000&quot;><span style=&quot;font-size:8pt&quot;><b><i><u>Part of CST original bug package that was discovered during initial testing of OOTB

 

And i want to clean it so it looks like - 

Part of CST original bug package that was discovered during initial testing of OOTB

 

any ideas and suggestions is greatly appreciated.  i have really enjoyed learning power query.  i just wish it allowed a wild card in the find and replace function lol

1 ACCEPTED SOLUTION

Hi @ryangoderre,

 

If this is what you need,

Try the attached pbix.

It is not the most effecient way, maybe others can improve it but you can try.

2021_07_27_10_34_41_Untitled_Power_Query_Editor.png

Hope this helps.

 

 

View solution in original post

9 REPLIES 9
mussaenda
Super User
Super User

hi @ryangoderre ,

 

the strings you need are always at the end of the tags?

if yes, please follow @Jakinta suggestion to get your desired result

2021_07_25_14_50_11_Text_After_Delimiter.png 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dY/BbsIwDIZfxerOlSomIWDisO0BijRujIObOKm1kHSJ0wmenna9ZJrwyZ/1+//t06n6zE3zrHq5uN+OFu6Cvi4My0DzCOjY+v3Mq/V3DvLiyEiBkOTqqBRc2Nc9se1ltx1KaZllghcwqP5svkZGV3qr4EIsFU/GNFM9ME0D+v/3zFF14hvtNg/P6UrgEvICB4wCwcD7xxFCZMseHXTZwoDqCy2B9Cjwgwk0JxVGiqRB58jeAnuW6TEQSjLzZNO2x7fqfL4D", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Extracted Text After Delimiter" = Table.TransformColumns(#"Changed Type", {{"Column1", each Text.AfterDelimiter(_, ">", {0, RelativePosition.FromEnd}), type text}})
in
    #"Extracted Text After Delimiter"

 

Try this as sample.

If you have different scenarios, please include on your sample to help you.

 

Thank you.

So my text strings are scattered.  See sample data below.  I tried the extract but that only extracted the first HTML tag.  which isnt what i want.  

 

I bolded the text i want to extract.  

 

Is the code you pasted something that would work to extract all this text? How do i implement it if so? 

 

<html><body> <div align=&quot;left&quot; style=&quot;min-height:9pt&quot;><font face=&quot;Arial&quot;><span style=&quot;font-size:8pt&quot;><b>Defect Description:</b></span><span style=&quot;font-size:8pt&quot;>
When running the Cube reports-

</span><span style=&quot;font-size:8pt&quot;><b>Steps to Reproduce:</b></span></font></div> <ol style=&quot;margin-top:0mm;margin-bottom:0mm&quot;> <li style=&quot;margin-right:0pt;padding-left:0pt;text-indent:5pt;font-family:'arial';color:#010101&quot;><font face=&quot;arial&quot;><span style=&quot;font-size:8pt;font-family:'arial';color:#010101;font-weight:normal;font-style:normal&quot;>Login to the application as Level 9 or Level 6 user.</span></font></li> <li style=&quot;margin-right:0pt;padding-left:0pt;text-indent:5pt;font-family:'arial';color:#010101&quot;><font face=&quot;arial&quot;><span style=&quot;font-size:8pt;font-family:'arial';color:#010101;font-weight:normal;font-style:normal&quot;>Navigate to the Public

 

 

Hi @ryangoderre,

 

If this is what you need,

Try the attached pbix.

It is not the most effecient way, maybe others can improve it but you can try.

2021_07_27_10_34_41_Untitled_Power_Query_Editor.png

Hope this helps.

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7VNNT8JAEP0rE7xaqRhBSjwYORJj9OABOWzbabvJfrk7RfHX222LKakYCRxNL/tmZ+fNzHtdLgevZRheJQVJUZ+wwbFONw2GJpDyNTDBc3Xr8Wj8VmqaCcyoA8HRRmA3QXIVFMjzgqKp6aZ2uTKtCDKW7Ly8s5yJPQ+cYarP5csEjn9idLOXKm7AHDNMCOboEssNca2i5n4Yd7OHnuhoYvVSoAJbKsVVDlQg3JcxgkWjLblgsDrfinAavu6gz4TGAWl4QmN1Wib4t0GHnmMnUOm/4wctflCb2bwSnLSJQim3MNZEWvpIr9e2luB7a9naO6GhmWFpWi0w8J6rA4QfFHCVoqLo2rRryZjkYlMPOZqwbw9NZlXD2kZn4aX/DjEiO86Ih7TV5L43/4vSVjLRFvMUbaTXykJXi/Iae2sxYwRPmLc0MAcLXKOAKWjbHsdQOrQXv5mur73g/3KdTK4HtuY5I9wq9ljGlWKD1eoL", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    TextSplit = Table.AddColumn(Source, "Custom", each List.Select(List.Transform(Text.Split([Column1],">"),Text.Trim), each not Text.StartsWith( _,  "<") )),
    Expanded = Table.ExpandListColumn(TextSplit, "Custom"),
    FINAL = Table.ReplaceValue(Expanded,":</b","",Replacer.ReplaceText,{"Custom"})
in
    FINAL

 

Start:

Jakinta_1-1627308700563.png

 

Result:

Jakinta_0-1627308674769.png

 

thanks for posting your code and the screenshots it is really helpful to learn and understand.  

 

My sample data is all inside of 1 cell.  you broke it out into multiple cells in column 1, and then removed the html tags in column 2/custom.  

 

will this code work if all my text inside a single cell? also not all my text is bolded, i was just doing that in the forum to highlight the text i wanted to extract out.  

Sure, it will work. Just give it a try.

sorry for all the back and forth.  i really do appreciate the help. 

ok so how do i implement the code?  do i need to go to custom column, and paste in the code you provided?  I just tried that and it doesnt work in extracting the info in my cells, it extracted similar to the screenshot you sent. The yellow is the text i want extracted.  the red is names i didnt want to post to the forum.  you can see in the custom column it has defect description, but that isnt existent my cells, it was in your cells.  

 

Sorry for all the back and forth.  

 

ryangoderre_0-1627320359766.png

 

Jakinta
Solution Sage
Solution Sage

You can go on Transform tab > Exctract and use Text.AfterDelimiter function.

Delimiter: ">".

In the Advanced options pick "From the end of the input".

Helpful resources

Announcements
Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Kudoed Authors