Anonymous
Not applicable

Removing HTML Tags and extracting text

Hey guys,

I need help figuring out how to remove/clean HTML data so that only the text is left. I've copied a sample below for reference.

Thanks in advance.

<div class="ExternalClass5796EA3CBFB24C2DA402D911F488833D"></p><p>Document Agresso Pipeline 2019-Q2 PID.&#160;</p><p><br>&#160;</p><p>Sue scheduled to work Monday next week and perhaps a couple of other days (tbc).?</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;"><strong style="margin&#58;0px;line-height&#58;20.8px;">AP127 - LATCo</strong>&#160;&#160;<br style="margin&#58;0px;line-height&#58;20.8px;"></p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#5...
<div class="ExternalClassC964E2629C2C4CD7B480E1C98C6DE34C"><p></p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;">Prepare Project Board slides in readiness for meeting to be held on 20/03/2019.<br></p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;">Arrange Sue's work schedule.</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line...
<div class="ExternalClassE55DE38454154EFEB55A531968AB8AC7"><p></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">ICT Team will be conducting Year-End activities next week so the time available to be spent on Agresso Pipeline activities will be limited.&#160;<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">Session arranged with Phil to handover all BA related&#160;Agresso Pipeline activities and to obtain a status update.<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><strong style="margin&#58;0px;line-height&#58;20.8px;"><br></strong></p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><strong style="margin&#58;0px;line-height&#58;20.8px;">AP127 - LATCo</strong>&#160;&#160;<br style="margin&#58;0px;line-height&#58;20.8px;"></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">Meeting arranged for&#160;26/03/2019.<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">Review a...
<div class="ExternalClassDB3FBF1C1A9F4B8CA432B1895A7EA234"><p>Prepare to start writing an EOP report and associated slides.</p>?????<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><br></p>?<br></p></div>
<div class="ExternalClassF8A3A37CEA9346639AA50BB961BDB29E">Once CJI3 available complete EOP slides.</p><p><span style="font&#58;400 13px/20.8px &quot;segoe ui&quot;,&quot;segoe&quot;,tahoma,helvetica,arial,sans-serif;text-align&#58;left;color&#58;#444444;text-transform&#58;none;text-indent&#58;0px;letter-spacing&#58;normal;text-decoration&#58;none;word-spacing&#58;0px;display&#58;inline !important;white-space&#58;normal;orphans&#58;2;font-size-adjust&#58;none;font-stretch&#58;normal;float&#58;none;background-color&#58;transparent;">Request CJI3 report once all known costs have been applied.</span><span style="font&#58;400 13px/20.8px &quot;segoe ui&quot;,&quot;segoe&quot;,tahoma,helvetica,arial,sans-serif;text-align&#58;left;color&#58;#444444;text-transform&#58;none;text-indent&#58;0px;letter-spacing&#58;normal;text-decoration&#58;none;word-spacing&#58;0px;display&#58;inline !important;white-space&#58;normal;orphans&#58;2;font-size-adjust&#58;none;font-stretch&#58;normal;float&#58;none;background-color&#58...
<div class="ExternalClass53ABD8C10790435FA20EA6B3C0658E1B"><p></p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;">Once CJI3 available complete EOP slides.</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;"><span style="text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#...
<div class="ExternalClass18D86FBC3818425FB569E604DBA97FA9"><p></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">Once CJI3 available complete EOP slides.</p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><span style="line-height&#58;20.8px;background-color&#58;transparent;">Request CJI3 report once all known costs have been applied?</span></p><br></p></div>
<div class="ExternalClass088C5894BE554D8BBFEC053C5FE81D55"><p></p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;">Once CJI3 available complete EOP slides.</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;"><span style="line-height&#58;20.8px;background-color&#58;transparent;">Request CJI3 report once all known costs have been applied??</span></p><br>&#160;</p></div>
<div class="ExternalClass9E6533746EAB4AF2BF3FB94E6A65AFED"><p></p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;">Once CJI3 available complete EOP slides.</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;"><span style="line-height&#58;20.8px;background-color&#58;transparent;">Request CJI3 report once all known costs have been applied???</span></p><br>&#160;</p></div>
<div class="ExternalClassCA561E6DB1F5409D857A1841B962611B">CR3 documented and submitted for approval. This CR requests that the implementation of CA client is restarted following the same deployment method as for LATCo.</p><p>CR2 will need to be revised as the licensing costs for CA will change due to a reduced number of client licences being required (6 as opposed to 20).</p><p>Sue not worked in PCC this week due to other commitments.?</p><p>First PCC/Lincs Agresso Forum meeting held on 05/03/2019, at which Agresso experiences were shared between PCC and LIns.&#160;<br></p><p><br>&#160;</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;"><strong style="margin&#58;0px;line-height&#58;20.8px;">...
<div class="ExternalClassDBE9541976CE4649B188607B0788AE75"><p> <span style="font&#58;400 13px/20.8px &quot;segoe ui&quot;, segoe, tahoma, helvetica, arial, sans-serif;text-align&#58;left;color&#58;#444444;text-transform&#58;none;text-indent&#58;0px;letter-spacing&#58;normal;text-decoration&#58;none;word-spacing&#58;0px;display&#58;inline;white-space&#58;normal;orphans&#58;2;float&#58;none;background-color&#58;transparent;">Agresso Pipeline 2019-Q2 PID submitted and approved.&#160;</span> </p><p>Awaiting approval of CR3.</p><p>Awaiting license costs for CR2, unable to revise CR until license costs provided.</p><p>Sue has had problems logging onto PCC network remotely. She has continued to undertake contract related pipeline work for Richard McCarthy.<br></p><p><br>&#160;</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-we...
<div class="ExternalClass7B68D75BF86F4A86837989FCAFCF3799"><p><span style="font-family&#58;&quot;segoe ui&quot;, segoe, tahoma, helvetica, arial, sans-serif;">Agresso Pipeline 2019-Q2 PID revised,&#160;submitted and approved.&#160;</span><br></p><p>CR3 approved.<br></p><p>Amended costs for CA licences added to CR2 and revised CR submitted to PMO.<br></p><p>Financial&#160;Year-End activities commence next week so the time spent on the&#160;Agresso Pipeline by the ICT Team will be limited.<br></p><p>Project Board meeting held on 20/03/2019. <br></p><p>Timesheet for Sue's time on spent on&#160;28/02 approved.<br></p><p></p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><strong style="margin&#58;0px;line-height&#58;20.8px;">AP99 - Combined Authority&#160;</strong><br style="margin&#58;0px;line-height&#58;20.8px;"></p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><span style="margin&#58;0px;line-height&#58;20.8px;">Completed enough of&#160;the configuration of the&#160...
<div class="ExternalClass217D5FF855074455A8242C61FC5C1EB8"><p><span style="font-family&#58;&quot;segoe ui&quot;, segoe, tahoma, helvetica, arial, sans-serif;">Agresso Pipeline 2019-Q2 PID</span> approved&#160;by PCC. <br></p><p>Agresso ICT Team undertook financial year-end activities which have restricted the amount of time available to spend on the project.<br></p><p>BA handover meeting held with Phil.<br></p><p>Amended&#160;CR2 presented and approved at Gate meeting on 29/03/2019.<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><strong style="margin&#58;0px;line-height&#58;20.8px;">AP99 - Combined Authority&#160;</strong> <br style="margin&#58;0px;line-height&#58;20.8px;">CR 2 approved at gate&#160;meeting.<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">Initial testing of system&#160;taking place.<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">Meeting arranged on 01/04/2019 to discuss testing approach and monitoring requir...
2 ACCEPTED SOLUTIONS
Fowmy
Super User

@Anonymous 

I pasted your sample HTML data into Power Query using the Enter Data table option; you can also import from your HTML file as a Web source.
The data looks like this:

Fowmy_0-1597128347221.png


Then I added a custom column with the following code:

 

=Html.Table([Column1], {{"ExtractedText",":root"}})

 


Then I expanded the new column, which leaves only the text:

Fowmy_1-1597128474182.png
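For anyone who prefers to see the steps in the Advanced Editor, here is a minimal sketch of the add-column and expand steps as one query. The inline sample table is a hypothetical stand-in for the real data, and the step names are simply what the UI would typically generate; the column names Column1 and ExtractedText follow the formula above.

```powerquery
let
    // Hypothetical sample: two rows of HTML in a text column named Column1
    Source = #table(
        type table [Column1 = text],
        {
            {"<div><p>Agresso Pipeline 2019-Q2 PID approved.&#160;</p><p>Awaiting approval of CR3.</p></div>"},
            {"<div><p><strong>AP127 - LATCo</strong></p></div>"}
        }
    ),
    // Parse each cell as HTML; the ":root" selector targets the root element,
    // so the ExtractedText column holds its text content without the tags
    #"Added Custom" = Table.AddColumn(
        Source,
        "Custom",
        each Html.Table([Column1], {{"ExtractedText", ":root"}})
    ),
    // Expand the nested table so the plain text sits alongside the original HTML
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"ExtractedText"})
in
    #"Expanded Custom"
```

Rows with null or empty HTML may need separate handling before the Html.Table call; that case is not covered in this sketch.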


If you want to combine all of the above lines into one cell, add the following line:

=Text.Combine(#"Expanded Custom"[ExtractedText],"#(lf)")

Fowmy_0-1597128910739.png
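As a rough illustration of the combine step on its own, the sketch below uses a small hypothetical table in place of the expanded step. It only assumes a text column named ExtractedText, as in the formula above.

```powerquery
let
    // Hypothetical stand-in for the "Expanded Custom" step
    #"Expanded Custom" = #table(
        type table [ExtractedText = text],
        {
            {"Awaiting approval of CR3."},
            {"CR3 approved."}
        }
    ),
    // Take the column as a list and join its values with a line feed,
    // which gives a single text value instead of a table
    Combined = Text.Combine(#"Expanded Custom"[ExtractedText], "#(lf)")
in
    Combined
```

Note that the result is a single text value and no longer a table, so this works best as the last step of the query or in a separate query that references the expanded one.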

 







Anonymous
Not applicable

@Fowmy That's awesome, thanks for the advice. 

 

I've also discovered the HTML Content custom visual, which displays HTML rich text in its intended form. It also works really nicely.

Karlos_0-1597139333548.png

 


3 REPLIES 3

edhans
Super User

Take a look at Chris's post on this: Removing HTML tags.


