Hey guys,
I need help figuring out how to remove/clean HTML data. I need to just extract the text. I'll copy a sample below for your information.
Thanks in advance.
p |
<div class="ExternalClass5796EA3CBFB24C2DA402D911F488833D"></p><p>Document Agresso Pipeline 2019-Q2 PID. </p><p><br> </p><p>Sue scheduled to work Monday next week and perhaps a couple of other days (tbc).?</p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style:normal;font-variant:normal;font-weight:400;text-decoration:none;word-spacing:0px;white-space:normal;orphans:2;background-color:transparent;"><strong style="margin:0px;line-height:20.8px;">AP127 - LATCo</strong>  <br style="margin:0px;line-height:20.8px;"></p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style:normal;font-variant:normal;font-weight:400;text-decoration:none;word-spacing... |
<div class="ExternalClassC964E2629C2C4CD7B480E1C98C6DE34C"><p></p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style:normal;font-variant:normal;font-weight:400;text-decoration:none;word-spacing:0px;white-space:normal;orphans:2;background-color:transparent;">Prepare Project Board slides in readiness for meeting to be held on 20/03/2019.<br></p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style:normal;font-variant:normal;font-weight:400;text-decoration:none;word-spacing:0px;white-space:normal;orphans:2;background-color:transparent;">Arrange Sue's work schedule.</p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line... |
<div class="ExternalClassE55DE38454154EFEB55A531968AB8AC7"><p></p><p style="line-height:20.8px;background-color:transparent;">ICT Team will be conducting Year-End activities next week so the time available to be spent on Agresso Pipeline activities will be limited. <br></p><p style="line-height:20.8px;background-color:transparent;">Session arranged with Phil to handover all BA related Agresso Pipeline activities and to obtain a status update.<br></p><p style="line-height:20.8px;background-color:transparent;"><strong style="margin:0px;line-height:20.8px;"><br></strong></p><p style="line-height:20.8px;background-color:transparent;"><strong style="margin:0px;line-height:20.8px;">AP127 - LATCo</strong>  <br style="margin:0px;line-height:20.8px;"></p><p style="line-height:20.8px;background-color:transparent;">Meeting arranged for 26/03/2019.<br></p><p style="line-height:20.8px;background-color:transparent;">Review a... |
<div class="ExternalClassDB3FBF1C1A9F4B8CA432B1895A7EA234"><p>Prepare to start writing an EOP report and associated slides.</p>?????<br></p><p style="line-height:20.8px;background-color:transparent;"><br></p>?<br></p></div> |
<div class="ExternalClassF8A3A37CEA9346639AA50BB961BDB29E">Once CJI3 available complete EOP slides.</p><p><span style="font:400 13px/20.8px "segoe ui","segoe",tahoma,helvetica,arial,sans-serif;text-align:left;color:#444444;text-transform:none;text-indent:0px;letter-spacing:normal;text-decoration:none;word-spacing:0px;display:inline !important;white-space:normal;orphans:2;font-size-adjust:none;font-stretch:normal;float:none;background-color:transparent;">Request CJI3 report once all known costs have been applied.</span><span style="font:400 13px/20.8px "segoe ui","segoe",tahoma,helvetica,arial,sans-serif;text-align:left;color:#444444;text-transform:none;text-indent:0px;letter-spacing:normal;text-decoration:none;word-spacing:0px;display:inline !important;white-space:normal;orphans:2;font-size-adjust:none;font-stretch:normal;float:none;background-color:... |
<div class="ExternalClass53ABD8C10790435FA20EA6B3C0658E1B"><p></p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style:normal;font-variant:normal;font-weight:400;text-decoration:none;word-spacing:0px;white-space:normal;orphans:2;background-color:transparent;">Once CJI3 available complete EOP slides.</p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style:normal;font-variant:normal;font-weight:400;text-decoration:none;word-spacing:0px;white-space:normal;orphans:2;background-color:transparent;"><span style="text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style&#... |
<div class="ExternalClass18D86FBC3818425FB569E604DBA97FA9"><p></p><p style="line-height:20.8px;background-color:transparent;">Once CJI3 available complete EOP slides.</p><p style="line-height:20.8px;background-color:transparent;"><span style="line-height:20.8px;background-color:transparent;">Request CJI3 report once all known costs have been applied?</span></p><br></p></div> |
<div class="ExternalClass088C5894BE554D8BBFEC053C5FE81D55"><p></p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style:normal;font-variant:normal;font-weight:400;text-decoration:none;word-spacing:0px;white-space:normal;orphans:2;background-color:transparent;">Once CJI3 available complete EOP slides.</p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style:normal;font-variant:normal;font-weight:400;text-decoration:none;word-spacing:0px;white-space:normal;orphans:2;background-color:transparent;"><span style="line-height:20.8px;background-color:transparent;">Request CJI3 report once all known costs have been applied??</span></p><br> </p></div> |
<div class="ExternalClass9E6533746EAB4AF2BF3FB94E6A65AFED"><p></p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style:normal;font-variant:normal;font-weight:400;text-decoration:none;word-spacing:0px;white-space:normal;orphans:2;background-color:transparent;">Once CJI3 available complete EOP slides.</p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style:normal;font-variant:normal;font-weight:400;text-decoration:none;word-spacing:0px;white-space:normal;orphans:2;background-color:transparent;"><span style="line-height:20.8px;background-color:transparent;">Request CJI3 report once all known costs have been applied???</span></p><br> </p></div> |
<div class="ExternalClassCA561E6DB1F5409D857A1841B962611B">CR3 documented and submitted for approval. This CR requests that the implementation of CA client is restarted following the same deployment method as for LATCo.</p><p>CR2 will need to be revised as the licensing costs for CA will change due to a reduced number of client licences being required (6 as opposed to 20).</p><p>Sue not worked in PCC this week due to other commitments.?</p><p>First PCC/Lincs Agresso Forum meeting held on 05/03/2019, at which Agresso experiences were shared between PCC and LIns. <br></p><p><br> </p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style:normal;font-variant:normal;font-weight:400;text-decoration:none;word-spacing:0px;white-space:normal;orphans:2;background-color:transparent;"><strong style="margin:0px;line-height:20.8px;">... |
<div class="ExternalClassDBE9541976CE4649B188607B0788AE75"><p> <span style="font:400 13px/20.8px "segoe ui", segoe, tahoma, helvetica, arial, sans-serif;text-align:left;color:#444444;text-transform:none;text-indent:0px;letter-spacing:normal;text-decoration:none;word-spacing:0px;display:inline;white-space:normal;orphans:2;float:none;background-color:transparent;">Agresso Pipeline 2019-Q2 PID submitted and approved. </span> </p><p>Awaiting approval of CR3.</p><p>Awaiting license costs for CR2, unable to revise CR until license costs provided.</p><p>Sue has had problems logging onto PCC network remotely. She has continued to undertake contract related pipeline work for Richard McCarthy.<br></p><p><br> </p><p style="margin:0px 0px 10px;text-align:left;color:#444444;text-transform:none;line-height:20.8px;text-indent:0px;letter-spacing:normal;font-size:13px;font-style:normal;font-variant:normal;font-we... |
<div class="ExternalClass7B68D75BF86F4A86837989FCAFCF3799"><p><span style="font-family:"segoe ui", segoe, tahoma, helvetica, arial, sans-serif;">Agresso Pipeline 2019-Q2 PID revised, submitted and approved. </span><br></p><p>CR3 approved.<br></p><p>Amended costs for CA licences added to CR2 and revised CR submitted to PMO.<br></p><p>Financial Year-End activities commence next week so the time spent on the Agresso Pipeline by the ICT Team will be limited.<br></p><p>Project Board meeting held on 20/03/2019. <br></p><p>Timesheet for Sue's time on spent on 28/02 approved.<br></p><p></p><p style="line-height:20.8px;background-color:transparent;"><strong style="margin:0px;line-height:20.8px;">AP99 - Combined Authority </strong><br style="margin:0px;line-height:20.8px;"></p><p style="line-height:20.8px;background-color:transparent;"><span style="margin:0px;line-height:20.8px;">Completed enough of the configuration of the ... |
<div class="ExternalClass217D5FF855074455A8242C61FC5C1EB8"><p><span style="font-family:"segoe ui", segoe, tahoma, helvetica, arial, sans-serif;">Agresso Pipeline 2019-Q2 PID</span> approved by PCC. <br></p><p>Agresso ICT Team undertook financial year-end activities which have restricted the amount of time available to spend on the project.<br></p><p>BA handover meeting held with Phil.<br></p><p>Amended CR2 presented and approved at Gate meeting on 29/03/2019.<br></p><p style="line-height:20.8px;background-color:transparent;"><strong style="margin:0px;line-height:20.8px;">AP99 - Combined Authority </strong> <br style="margin:0px;line-height:20.8px;">CR 2 approved at gate meeting.<br></p><p style="line-height:20.8px;background-color:transparent;">Initial testing of system taking place.<br></p><p style="line-height:20.8px;background-color:transparent;">Meeting arranged on 01/04/2019 to discuss testing approach and monitoring requir... |
@Anonymous
I pasted your sample HTML data into Power Query using the Enter Data table option; you could also import from your HTML file as a Web source.
So the data looks like this:
Then I added a custom column with the following code:
=Html.Table([Column1], {{"ExtractedText",":root"}})
Then I expanded the new column, which leaves only the text.
If you want to combine all the above lines into one cell, add the following line:
=Text.Combine(#"Expanded Custom"[ExtractedText],"#(lf)")
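For anyone who wants to see those steps as one query, here is a minimal sketch in M. It assumes the HTML sits in a text column called Column1, as in the formula above; the two sample rows and the step name Combined are made up for illustration, so swap in your own source.

```powerquery
let
    // Hypothetical sample rows standing in for the pasted data (Enter Data would produce a similar table)
    Source = #table(
        type table [Column1 = text],
        {
            {"<div class=""ExternalClass""><p>Awaiting approval of CR3.</p></div>"},
            {"<p><strong>AP127 - LATCo</strong><br></p>"}
        }
    ),
    // Parse each cell as HTML; the ":root" selector returns the text of the whole parsed fragment
    #"Added Custom" = Table.AddColumn(
        Source,
        "Custom",
        each Html.Table([Column1], {{"ExtractedText", ":root"}})
    ),
    // Expand the nested table so each row shows only its extracted text
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"ExtractedText"}),
    // Optional: join every row into a single text value, one line per row (nulls dropped first)
    Combined = Text.Combine(List.RemoveNulls(#"Expanded Custom"[ExtractedText]), "#(lf)")
in
    Combined
```

If you want to keep one row per record instead of a single combined cell, return #"Expanded Custom" in the last line rather than Combined.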
@Fowmy That's awesome, thanks for the advice.
I've also discovered the custom visual HTML Content, which displays HTML rich text in its natural/intended form. It also works really nicely.
Take a look at Chris's post on this: Removing HTML tags.