Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
Anonymous
Not applicable

Extracting names from html in a column with variable structure

Hi there,

 

I have connected to a SharePoint List which imports data from a Microsoft Form. When the data is imported it replaces some of the text with HTML code:

For Example

Appointed Candidate(s)
<div class="ExternalClassF09666AB6C1F4582804BC8F27B5D8669"><div style="font-family&#58;Calibri, Arial, Helvetica, sans-serif;font-size&#58;11pt;color&#58;rgb(0, 0, 0);"><span style="color&#58;black;">​Jane Smith</div>

I want to wrangle the column so that only the name within the HTML code would be present:

Appointed Candidate(s)
Jane Smith

 

I originally tried to combat this by creating a Custom Column from Examples and entering the name in the column to extract the text from between a set of delimiters, however, the HTML code is variable as some cells contain 2 or 3 peoples names and the data is inconsistent as it has been entered in free format on the Microsoft Form.

For example:

Appointed Candidate(s)

"<div class="ExternalClassF09666AB6C1F4582804BC8F27B5D8669"><div style="font-family&#58;Calibri, Arial, Helvetica, sans-serif;font-size&#58;11pt;color&#58;rgb(0, 0, 0);"><span style="color&#58;black;">​Jane Smith</span><br></div></div>"

"<div class="ExternalClass13330A41C3914588B1527918A7C4DA70"><div><span style="color&#58;black;"></span></div></div><div style="color&#58;rgb(29, 29, 29);background-color&#58;rgb(255, 255, 255);"><div><span style="color&#58;black;">00</span>1&#58;&#160;<span style="background-color&#58;rgb(255, 255, 255);display&#58;inline !important;">John Smith</span></div></div><div style="color&#58;rgb(29, 29, 29);background-color&#58;rgb(255, 255, 255);">002&#58;&#160;<span style="background-color&#58;rgb(255, 255, 255);display&#58;inline !important;">Jack Jill</span></div><div style="color&#58;rgb(29, 29, 29);background-color&#58;rgb(255, 255, 255);">003&#58;&#160;Peter Rabbit</div><div>"

 

This converts the DAX formula to one which is an if statement and replaces the HTML code with the desired name - which I don't think would work when data is refreshed and new entries are added as each if statement will be entirely unique.

 

Is there a way to tackle this and to just extract the names from the HTML code in the format I described?

 

Thanks!

1 ACCEPTED SOLUTION
Vijay_A_Verma
Most Valuable Professional
Most Valuable Professional

Use this in PQ. See the working here - Open a blank query - Home - Advanced Editor - Remove everything from there and paste the below code to test (later on when you use the query on your dataset, you will have to change the source appropriately. If you have columns other than these, then delete Changed type step and do a Changed type for complete table from UI again

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZNBT8IwFMe/SsWLJCO2G+u2LB62ITGcjB6BQzcKNJZuaQtRT979ln4SO4cMYRglMVxM1mVt3/vv/359HQ5boyWETjZhK5BxotTVqHX9qKkUhCflvA8DjHEU4wT1u65v+7AbJ37f9mK352McjCoBWsso/cSpkZnmQnemZMH4U7lp43PXDxPCWSqZBSLJCLfADeUrqllGLKCIUB1FJZuGH6mKPdM6EaFCh1nOc1mvyVl6AS1QPu3wqxFVELFxspOWcpI9bOLfXl4HRFBwv2B6XiVfmjKqzdbY+ifURKjM3lZL5fasBngMUOQ4Doy6KHECZID6MXJtL0B+5CXdXuTBPaDHlXWgksPe9w+wgbYdWKAa7TA1v5vJfCkmnaZI1zVh69fu6RxdFYR7daE6dv2FMAwbhH/ld8JUwclW4zLBmemSM7YocqmJ0BtPg3wuvu2eUzM3ISdnZHTBgHH+Q0R/j8RpRHJLzU0FdyRNmT7o7POuj98B", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Appointed Candidate(s)" = _t]),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(Source, {{"Appointed Candidate(s)", Splitter.SplitTextByDelimiter("</", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Appointed Candidate(s)"),
    #"Inserted Text After Delimiter" = Table.AddColumn(#"Split Column by Delimiter", "Text After Delimiter", each Text.AfterDelimiter([#"Appointed Candidate(s)"], ">", {0, RelativePosition.FromEnd}), type text),
    #"Added Custom" = Table.AddColumn(#"Inserted Text After Delimiter", "Extracted Name", each Text.Select([Text After Delimiter],{"a".."z","A".."Z"," "})),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Text After Delimiter"}),
    #"Filtered Rows1" = Table.SelectRows(#"Removed Columns", each ([Extracted Name] <> ""))
in
    #"Filtered Rows1"

 

View solution in original post

3 REPLIES 3
Vijay_A_Verma
Most Valuable Professional
Most Valuable Professional

Use this in PQ. See the working here - Open a blank query - Home - Advanced Editor - Remove everything from there and paste the below code to test (later on when you use the query on your dataset, you will have to change the source appropriately. If you have columns other than these, then delete Changed type step and do a Changed type for complete table from UI again

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZNBT8IwFMe/SsWLJCO2G+u2LB62ITGcjB6BQzcKNJZuaQtRT979ln4SO4cMYRglMVxM1mVt3/vv/359HQ5boyWETjZhK5BxotTVqHX9qKkUhCflvA8DjHEU4wT1u65v+7AbJ37f9mK352McjCoBWsso/cSpkZnmQnemZMH4U7lp43PXDxPCWSqZBSLJCLfADeUrqllGLKCIUB1FJZuGH6mKPdM6EaFCh1nOc1mvyVl6AS1QPu3wqxFVELFxspOWcpI9bOLfXl4HRFBwv2B6XiVfmjKqzdbY+ifURKjM3lZL5fasBngMUOQ4Doy6KHECZID6MXJtL0B+5CXdXuTBPaDHlXWgksPe9w+wgbYdWKAa7TA1v5vJfCkmnaZI1zVh69fu6RxdFYR7daE6dv2FMAwbhH/ld8JUwclW4zLBmemSM7YocqmJ0BtPg3wuvu2eUzM3ISdnZHTBgHH+Q0R/j8RpRHJLzU0FdyRNmT7o7POuj98B", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Appointed Candidate(s)" = _t]),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(Source, {{"Appointed Candidate(s)", Splitter.SplitTextByDelimiter("</", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Appointed Candidate(s)"),
    #"Inserted Text After Delimiter" = Table.AddColumn(#"Split Column by Delimiter", "Text After Delimiter", each Text.AfterDelimiter([#"Appointed Candidate(s)"], ">", {0, RelativePosition.FromEnd}), type text),
    #"Added Custom" = Table.AddColumn(#"Inserted Text After Delimiter", "Extracted Name", each Text.Select([Text After Delimiter],{"a".."z","A".."Z"," "})),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Text After Delimiter"}),
    #"Filtered Rows1" = Table.SelectRows(#"Removed Columns", each ([Extracted Name] <> ""))
in
    #"Filtered Rows1"

 

Anonymous
Not applicable

Thanks so much for this! I will try it out. Would you be able to clarify what you mean by UI? Sorry I am quite new to DAX!

Vijay_A_Verma
Most Valuable Professional
Most Valuable Professional

UI means PQ user interface. But since you don't need Changed Type step, then you can ignore this.

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors