The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
I'm looking for help to remove html tags from a string.
Example input: <div class="ExternalClass742C332E0D0340C598BC9A78413A04DE">Staff going to storage training</div>
Desired output: Staff going to storage training
I found @MarcelBeug 's post in Robust-function-to-remove-HTML-tags very helpful, but can't seem to nail the magic combo of also including other tags in the <> brackets.... like class=somethingIdon'tcaretosee
Solved! Go to Solution.
You could try using PowerQuery Text Between function and use '>' and '</' as the delimiters.
You can also do this in DAX using PATHITEM and SUBSTITUTE.
InnerHTML =
PATHITEM( // Splits the string using delimiter "|", and takes the 2nd item that is a type of Text
SUBSTITUTE( // Output <div class="Whatever"|Staff going to storage training|div>
SUBSTITUTE([Html], ">", "|"), // Output <div class="Whatever"|Staff going to storage training</div>
"</","|"
),
2,
TEXT
)
I know I am a little late to the game but after trying extract between '>' and '</' and having it not work for me I broke it down into 2 steps.
1. Extract before '</'
2. Extract after '<' (using the Scan for the delimiter from the end of the input' in advanced options)
This worked for me as my source text has an inconsistant number of HTML Tags but always more than 3. This meant Extract between was giving weird results.
#"Extracted Text Before Delimiter" = Table.TransformColumns(#"Changed Type", {{"Subject Request", each Text.BeforeDelimiter(_, "</"), type text}}),
#"Extracted Text After Delimiter" = Table.TransformColumns(#"Extracted Text Before Delimiter", {{"Subject Request", each Text.AfterDelimiter(_, ">", {0, RelativePosition.FromEnd}), type text}})
Removing HTML tasks can be a daunting task, I had to do this as well and it always needed some maintenance as new cases were coming in. As an alternative, why don't you just use a custom visual that can actually deal with HTML?
Take a look at the HTML Text Styler visual which you can get from the AppSource: https://appsource.microsoft.com/en-us/product/power-bi-visuals/wa200002071?tab=overview
Hi @ianbruckner ,
We can create a calculated column using DAX as well.
Column = VAR len = SEARCH ( ">", Table1[Column1],, BLANK () ) VAR len2 = SEARCH ( "</div>", Table1[Column1],, BLANK () ) RETURN MID ( Table1[Column1], len + 1, len2 - len - 1 )
Regards,
Frank
Hello. I tried your query it worked partially and i still have below html tags in the content which needs to be removed.
<div style="font-family:Calibri, Arial, Helvetica, sans-serif;font-size:11pt;color:rgb(0, 0, 0);">communication started with recruitment team section head in order to assign contact person,<br>i asked the BRM to follow up on that as <span style="font-family:Calibri, Arial, Helvetica, sans-serif;background-color:rgb(255, 255, 255);display:inline !important;">recruitment team section head didn't reply to the email yet</span>
The highlighted in green should be the outcome. Kindly help,
You could try using PowerQuery Text Between function and use '>' and '</' as the delimiters.
You can also do this in DAX using PATHITEM and SUBSTITUTE.
InnerHTML =
PATHITEM( // Splits the string using delimiter "|", and takes the 2nd item that is a type of Text
SUBSTITUTE( // Output <div class="Whatever"|Staff going to storage training|div>
SUBSTITUTE([Html], ">", "|"), // Output <div class="Whatever"|Staff going to storage training</div>
"</","|"
),
2,
TEXT
)
I ended up using Text.BetweenDelimiters, and it certainly stripped out the first tag. Then I found enough other tags hidden inside the input that 1, makes it impractical to state them all, and then 2, scared enough I'd lose data if I went to the lowest common denominator of > < as the delimiters alone... so I gave up. Instead, I'll probably use the HTML visualization that's filtered by a user selecting the row that contains the rest of the data. I really wish I could embed that in the table - oh well for now.
Thanks for the pointers!
Can you give a sample input and output where multiple tags are present? Maybe use 3 layers of nested tags if you have that example?
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
121 | |
86 | |
77 | |
55 | |
48 |
User | Count |
---|---|
136 | |
128 | |
78 | |
64 | |
63 |