Advance your Data & AI career with 50 days of live learning, dataviz contests, hands-on challenges, study groups & certifications and more!
Get registeredGet Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Learn more
in microsoft excel you can do a wild card find and replace for <*> and it will remove all the html text and tags. but i would like to have this step be a part of my power query steps.
I have looked at multiple posts on this fourm, and i guess i am not advanced enough to understand how to implement the solution correctly.
I have 2 columns i want to cleanse of the html tagging called "Description" and "Comments".
I believe i need to create a blank query and call it when creating a new column? but i dont seem to understand what query i need to to use and how to call it when createing the new column.
my data looks like this -
<html><body> <div align="left" style="min-height:9pt"><font face="Arial" color="#ff0000"><span style="font-size:8pt"><b><i><u>Part of CST original bug package that was discovered during initial testing of OOTB
And i want to clean it so it looks like -
Part of CST original bug package that was discovered during initial testing of OOTB
any ideas and suggestions is greatly appreciated. i have really enjoyed learning power query. i just wish it allowed a wild card in the find and replace function lol
Solved! Go to Solution.
Hi @ryangoderre,
If this is what you need,
Try the attached pbix.
It is not the most effecient way, maybe others can improve it but you can try.
Hope this helps.
hi @ryangoderre ,
the strings you need are always at the end of the tags?
if yes, please follow @Jakinta suggestion to get your desired result
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dY/BbsIwDIZfxerOlSomIWDisO0BijRujIObOKm1kHSJ0wmenna9ZJrwyZ/1+//t06n6zE3zrHq5uN+OFu6Cvi4My0DzCOjY+v3Mq/V3DvLiyEiBkOTqqBRc2Nc9se1ltx1KaZllghcwqP5svkZGV3qr4EIsFU/GNFM9ME0D+v/3zFF14hvtNg/P6UrgEvICB4wCwcD7xxFCZMseHXTZwoDqCy2B9Cjwgwk0JxVGiqRB58jeAnuW6TEQSjLzZNO2x7fqfL4D", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Extracted Text After Delimiter" = Table.TransformColumns(#"Changed Type", {{"Column1", each Text.AfterDelimiter(_, ">", {0, RelativePosition.FromEnd}), type text}})
in
#"Extracted Text After Delimiter"
Try this as sample.
If you have different scenarios, please include on your sample to help you.
Thank you.
So my text strings are scattered. See sample data below. I tried the extract but that only extracted the first HTML tag. which isnt what i want.
I bolded the text i want to extract.
Is the code you pasted something that would work to extract all this text? How do i implement it if so?
<html><body> <div align="left" style="min-height:9pt"><font face="Arial"><span style="font-size:8pt"><b>Defect Description:</b></span><span style="font-size:8pt">
When running the Cube reports-
</span><span style="font-size:8pt"><b>Steps to Reproduce:</b></span></font></div> <ol style="margin-top:0mm;margin-bottom:0mm"> <li style="margin-right:0pt;padding-left:0pt;text-indent:5pt;font-family:'arial';color:#010101"><font face="arial"><span style="font-size:8pt;font-family:'arial';color:#010101;font-weight:normal;font-style:normal">Login to the application as Level 9 or Level 6 user.</span></font></li> <li style="margin-right:0pt;padding-left:0pt;text-indent:5pt;font-family:'arial';color:#010101"><font face="arial"><span style="font-size:8pt;font-family:'arial';color:#010101;font-weight:normal;font-style:normal">Navigate to the Public
Hi @ryangoderre,
If this is what you need,
Try the attached pbix.
It is not the most effecient way, maybe others can improve it but you can try.
Hope this helps.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7VNNT8JAEP0rE7xaqRhBSjwYORJj9OABOWzbabvJfrk7RfHX222LKakYCRxNL/tmZ+fNzHtdLgevZRheJQVJUZ+wwbFONw2GJpDyNTDBc3Xr8Wj8VmqaCcyoA8HRRmA3QXIVFMjzgqKp6aZ2uTKtCDKW7Ly8s5yJPQ+cYarP5csEjn9idLOXKm7AHDNMCOboEssNca2i5n4Yd7OHnuhoYvVSoAJbKsVVDlQg3JcxgkWjLblgsDrfinAavu6gz4TGAWl4QmN1Wib4t0GHnmMnUOm/4wctflCb2bwSnLSJQim3MNZEWvpIr9e2luB7a9naO6GhmWFpWi0w8J6rA4QfFHCVoqLo2rRryZjkYlMPOZqwbw9NZlXD2kZn4aX/DjEiO86Ih7TV5L43/4vSVjLRFvMUbaTXykJXi/Iae2sxYwRPmLc0MAcLXKOAKWjbHsdQOrQXv5mur73g/3KdTK4HtuY5I9wq9ljGlWKD1eoL", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
TextSplit = Table.AddColumn(Source, "Custom", each List.Select(List.Transform(Text.Split([Column1],">"),Text.Trim), each not Text.StartsWith( _, "<") )),
Expanded = Table.ExpandListColumn(TextSplit, "Custom"),
FINAL = Table.ReplaceValue(Expanded,":</b","",Replacer.ReplaceText,{"Custom"})
in
FINAL
Start:
Result:
thanks for posting your code and the screenshots it is really helpful to learn and understand.
My sample data is all inside of 1 cell. you broke it out into multiple cells in column 1, and then removed the html tags in column 2/custom.
will this code work if all my text inside a single cell? also not all my text is bolded, i was just doing that in the forum to highlight the text i wanted to extract out.
Sure, it will work. Just give it a try.
sorry for all the back and forth. i really do appreciate the help.
ok so how do i implement the code? do i need to go to custom column, and paste in the code you provided? I just tried that and it doesnt work in extracting the info in my cells, it extracted similar to the screenshot you sent. The yellow is the text i want extracted. the red is names i didnt want to post to the forum. you can see in the custom column it has defect description, but that isnt existent my cells, it was in your cells.
Sorry for all the back and forth.
You can go on Transform tab > Exctract and use Text.AfterDelimiter function.
Delimiter: ">".
In the Advanced options pick "From the end of the input".
Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!
Check out the October 2025 Power BI update to learn about new features.