March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hey all,
I'm creating a report, which has a SharePoint list datasource. It has like ~5 columns, and it contains memorandums about meetings. There is a description column, which contains the memos of the meetings, like "this and this happened, these are the actions to be made etc". It has paragraphs, and it causes problems. Here's a little sample data below:
So, basically, I'd like to remove all the HTML elments (paragraphs, line breaks and all other elements). They are all between the "<" and ">" delimiters, and I only need the TEXTs. I found the Text.BetweenDelimiters/AfterDelimiter/BeforeDelimiter functions in M, but I didn't know how to use them in an iterative way.
Anyone has any idea how to deal with it?
Regards,
Andris
Solved! Go to Solution.
you can use recursion in PowerQuery
crate a blank query and paste this code into editor, then call this function on your HTML column
(txt as text) => [ fnRemoveFirstTag = (HTML as text)=> let OpeningTag = Text.PositionOf(HTML,"<"), ClosingTag = Text.PositionOf(HTML,">"), Output = if OpeningTag = -1 then HTML else Text.RemoveRange(HTML,OpeningTag,ClosingTag-OpeningTag+1) in Output, fnRemoveHTMLTags = (y as text)=> if fnRemoveFirstTag(y) = y then y else @fnRemoveHTMLTags(fnRemoveFirstTag(y)), Output = @fnRemoveHTMLTags(txt) ][Output]
EDIT - typo in syntax
There is an even simpler solution.
You can create a new function called fun_ReplaceTextBetweenDelimiters, and in it add this code 👇
let fun_ReplaceTextBetweenDelimiters = (Text as text, StartDelimiter as text, EndDelimiter as text, optional ReplaceDelimiters as nullable logical, optional NewText as nullable text, optional TrimResult as nullable logical, optional FixDoubleSpaces as nullable logical) as text => let // Add Default Parameters Default_ReplaceDelimiters = if ReplaceDelimiters is null then true else ReplaceDelimiters, Default_NewText = if NewText is null then "" else NewText, Default_TrimResult = if TrimResult is null then true else TrimResult, Default_FixDoubleSpaces = if FixDoubleSpaces is null then true else FixDoubleSpaces, //Do work TextBetweenDelimiters = Text.BetweenDelimiters(Text, StartDelimiter, EndDelimiter), TextToReplace = if Default_ReplaceDelimiters then Text.Combine({StartDelimiter,TextBetweenDelimiters,EndDelimiter}) else TextBetweenDelimiters, ReplacedText = Text.Replace(Text, TextToReplace, Default_NewText), //Clean Result TrimmedText = if Default_TrimResult then Text.Trim(ReplacedText) else ReplacedText, FixedSpaces = if Default_FixDoubleSpaces then Text.Replace(TrimmedText, " ", " ") else TrimmedText in FixedSpaces in fun_ReplaceTextBetweenDelimiters
Then, we can test it like this:
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnRyVtAw1FTSAbGUYnWilVzd3BU0jEAiQBZYJCIyCqhGRwEsCOQoxcYCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [TestData = _t, TargetData = _t]), ChangeType = Table.TransformColumnTypes(Source,{{"TestData", type text}, {"TargetData", type text}}), RunFunction = Table.AddColumn(ChangeType, "NewText", each fun_ReplaceTextBetweenDelimiters([TestData], "(", ")", true), type text), TestResult = Table.AddColumn(RunFunction, "Test", each [TargetData]=[NewText], type logical) in TestResult
Input:
TestData | TargetData |
ABC (1) | ABC |
EFG (2) | EFG |
XYZ (1, 2) | XYZ |
Output:
TestData | TargetData | NewText | Test |
ABC (1) | ABC | ABC | TRUE |
EFG (2) | EFG | EFG | TRUE |
XYZ (1, 2) | XYZ | XYZ | TRUE |
New source->New blank query-> Highlight the fx (to convert it into function)
Function 1 to remove all HTML Tags
(txt as text) =>
[
fnRemoveFirstTag = (HTML as text)=>
let
TextBetween=Text.BetweenDelimiters(HTML,"<",">"),
Output=if Text.Length(TextBetween)>0 then
Text.Replace(HTML,
Text.Insert(
Text.Insert(TextBetween,0,"<"),
Text.Length(TextBetween)+1,">")
,"")
else
HTML
in
Output,
fnRemoveHTMLTags = (y as text)=>
if y<>null then
if fnRemoveFirstTag(y)<>y then
@fnRemoveHTMLTags(fnRemoveFirstTag(y))
else
y
else "",
Output =if txt <> null then @fnRemoveHTMLTags(txt) else ""
][Output]
Function 2 to remove encodings
let
Source = (txt as text) =>
[
fnRemovecodes = (y as text)=>
Text.Replace(Text.Replace(
Text.Replace(Text.Replace(Text.Replace(
Text.Replace(Text.Replace(
Text.Replace(Text.Replace(
Text.Replace(Text.Replace(
Text.Replace(
Text.Replace(y,"'","'"),
"<","<"),
">",">"),
""",""""),
"&","&"),
" "," "),
"©","©"),
"®","®"),
"¢","¢"),
"£","£"),
"¥","¥"),
"€","€"),"'","'"),
Output =if txt <> null then @fnRemovecodes(txt) else ""
][Output]
in
Source
To apply to the column of your table
- Go to the table, clock on the column
- click on Add column-> Invoke custom function
Give aname to new column. Select the function1 and for txt select the column that you need to convert from HTML to text. Then on the new column select function 2 and for txt select the new column from previous step.
Hi @andris_,
I will rather use something like Text. Start(Text.Range etc..)
You can make a try with this formula and if it does not work please provide example I can copy easily and make the change ( not printscreen).
Text.Range([Description],Text.PositionOf([Description],"=")+2,Text.PositionOf([Description],"<p>")-Text.PositionOf([Description],"=")-4
Hope it helps...
Ninter
Hi @Interkoubess,
Thank you for your reply!
I think Text.Start can be a solution as well, but my focus is on the iteration part, because as you can see in my sample dataset, there are more than one delimiters in a record. What I'd like to do is to remove all the html elements, and keep/concatenate the TEXT parts.
With the following formula, you can have the same result, as yours:
Text.BeforeDelimiter([Description],"<p>")
Tell me if I misunderstood You, but I can't see how can I do it in an iterative way.
(Here's some sample data you can use)
ID | Topic | Customer | Description | Status |
1 | Implementation of XY | Customer 1 | <div class="ExternalClass6F25214767CC48ADBB6B525F221F78B4"><p>TEXT </p><p> </p><p>TEXT</p><p>TEXT</p>TEXT<strong>TEXT</strong></li></ul></div> | Opened |
2 | Product Managament | Customer 2 | <div class="ExternalClass45BB2C5321C95J49B0F78879862B8310C"><p>TEXT</p><p>TEXT</p><strong>TEXT</strong></p></div> | Closed |
3 | Exchange Machines | Customer 1 | <div class="ExternalClassEC3F6FC6C1D64D78980B1B5FF615B9860"><p>TEXT</p></div> | Opened |
4 | No.: 104085; checking status | Customer 3 | <div class="ExternalClassDC1495BEBABE408B9855E7DDF1B1D6AD"><p>TEXT</p><p>TEXT | Opened |
5 | Backup machines | Customer 4 | <div class="ExternalClass7EB7A021D123446BCA2D83011F9CC301C"><p>TEXT</p><p><br></p><p>TEXT</p><p>TEXT</p></div> | Closed |
Regards,
Andris
Hey @Interkoubess,
This formula (with minor changes) is working, but it will only give you the first text. The problem is, that there are more between in a record, and every record is different (so, like in my sample data, the first record has 5 TEXT parts, the second one 3 TEXT parts etc).
What do you think the general solution should be?
Regards,
Andris
Hi @andris_,
Have you solved your problem?
If you have soloved, please always accept the replies making sense as solution to your question so that people who may have the same question can get the solution directly.
If you still need help, you could have a reference of this video.
Best Regards,
Cherry
you can use recursion in PowerQuery
crate a blank query and paste this code into editor, then call this function on your HTML column
(txt as text) => [ fnRemoveFirstTag = (HTML as text)=> let OpeningTag = Text.PositionOf(HTML,"<"), ClosingTag = Text.PositionOf(HTML,">"), Output = if OpeningTag = -1 then HTML else Text.RemoveRange(HTML,OpeningTag,ClosingTag-OpeningTag+1) in Output, fnRemoveHTMLTags = (y as text)=> if fnRemoveFirstTag(y) = y then y else @fnRemoveHTMLTags(fnRemoveFirstTag(y)), Output = @fnRemoveHTMLTags(txt) ][Output]
EDIT - typo in syntax
Hi @Stachu , I'm new to using power query to extract data from sharepoint.
I followed your code to remove all the HTML but there is this error msg. Appreciate your help on this.
hi @Gill
it seems you've put the code into the formula bar rather than an advanced editor:
https://learn.microsoft.com/en-us/power-query/power-query-ui#the-advanced-editor
@Stachu ,
I'm new to calling functions on columns so please bear with me.
The statement " then call this function on your HTML column" confuses me a little bit. The query solution you provided has been created in my project and is called "RemoveHTML".
The column with the HTML is called "Comments" and the table it is located in is called "Issues".
Do I create a function from the Query first? If so, the system wants me to create a parameter as it says the Query doesn't include a parameter.
Not sure what my next steps are, any assistance is appreciated.
Patrick
@pAAmckelvey in the Issues table you need to create a new column with this syntax:
RemoveHTML([Comments])
I am just getting into functions myself and have been working on a problem to remove HTML tags from within a comments column.
I tried using the Function as you outlined, but I keep getting an error that I am passing a parameter to a functions that expects 0 parameters.
(1) In my steps, I created a blank query and copied your code into it.
(2) I then created a function from the blank query.
(3) Created a Custom Column and added the syntax you outlined in your response. (Probably where I am going wrong.)
(4) It then returns the error I mention above.
Any thoughts on what I might be doing wrong?
Thanks
Joe Mays
@Anonymous can you create a new post describing specific problem that you have and share the code for all the queries that you use?
@Stachu No problem, I will create a new post and outlined the specific issue I am having. Thanks for your response.
Hey @Stachu,
Sorry for not replying until now, but I wasn't able to check it til today. But your solution is awesome, it's working! Thank you! 🙂
Best regards,
Andris
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
125 | |
85 | |
69 | |
54 | |
45 |
User | Count |
---|---|
204 | |
106 | |
98 | |
65 | |
54 |