Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.

Reply
Fair-UL
Helper II
Helper II

Convert data from HTML into plain text in a table- Data from SQL table

I am bringing field from a SQL table- Azure that is saved in rich text format. When thos field is brought into a table visual, it is not presented in simple text format. I tried several workaround I searched, but nothing worked. Any solution to this? 

e.g. <p>next milestone due Nov 12</p>

 

I want it to be shown:

next milestone due Nov 12

10 REPLIES 10
Anonymous
Not applicable

Hi @Fair-UL ,

I think to replace tags not a simple way to handle this scenario, it means you need to configure a dictionary table with all HTML tags and use it in replace functions. (in addition, it also hard to find out specific text from replaced text value)

In my opinion, I'd like to you can try to add a custom column with HTML connector(Html.Table) to receive these string that contains HTML elements and extract specific characters from the result table.

Removing HTML Tags From Text In Power Query/Power BI 
Regards,

XIaoxni Sheng

Thank you for your response. I have seen this solution but I had difficulty implementing it.

let source= "[detail_text]",
removetags= Html.Table(source, {{"text", ":root"}}),
GetText= removetags [text] {0}
in
GetText

 

this is what I used in custom column, but its returning the text [detail_text] in all fields. 

detail_field is the column header of the column that contains html text coming from the database

the address of server is (xxxhub-prod.database.windows.net), and the database is xxx

 

not sure how to implement this code you advised to use

 

let
    Source =
        "<!DOCTYPE html>
        <html>
        <body>
        <h1>A Heading</h1>
        <p>A paragraph</p>
        </body>
        </html>",
    RemoveTags = Html.Table(Source, {{"text",":root"}}),
    GetText = RemoveTags[text]{0}
in
    GetText
Anonymous
Not applicable

@Fair-UL,

You can replace <p> and </p> with blank space in Query Editor using Replace Values transformation. 

 

I started with this approach, but there were too many of them. the html library is huge to be cleaned using replace function.

there is so many <li>, </li>, <p>, </p>, </ol>, etc

but thanks for the idea. might help other readers. 

Anonymous
Not applicable

Hi @Fair-UL ,

In fact, it requires you to typing the 'column selector' parameter in HTML.table to extract specific element values. 

I create a table with 'index' and 'html' strings,  I add a custom column store 'h1' element that extracts from 'body' element, below is the query table formula and snapshot of the result:

let
    Source = #table({"Index","HTML"},
    {
        {1,"<!DOCTYPE html><html><body><h1>A Heading</h1><p>A paragraph</p></body></html>"},
        {2,"<!DOCTYPE html><html><body><h1>B Heading</h1><p>B paragraph</p></body></html>"},
        {3,"<!DOCTYPE html><html><body><h1>C Heading</h1><p>C paragraph</p></body></html>"}
    }),
    #"Added Custom" = Table.AddColumn(Source, "H1", each Html.Table([HTML],{{"body","h1"}})[body]{0})
in
    #"Added Custom"

17.png

Reference links:

Web Scraping with Html.Table in Power Query 

New Web Scraping experience in Power BI / Power Query (Using CSS Selectors) 

Expressions, values, and let expression 

If you still confused about extract specific elements, you can also share some sample data and expected result for test. (do mask on sensitive data before share)

Regards,

Xiaoxin Sheng

Hi,

 

Thank you for your illustration, however, it is still confusing to me.

The field that I have is a free text field and users input string in and I am pulling this field into a table in power BI.

The way it looks in power BI depends on any spaces, enter, numbering the users have used while inputting their text

can include any of the html tags in between text

 

image.png

Anonymous
Not applicable

HI @Fair-UL ,

If your text contains randomly HTML tag, maybe you can try to use the following custom function to remove all characters that package with '<>'.

Recursive custom function:

let
    RemoveTag = (input as text) as text =>
    let    
        length = Text.Length(input),
        position = Text.PositionOf(input, "<"),
        positionEnd = Text.PositionOf(input, ">"),
        range = positionEnd-position+1,
        replaced = if position >= 0 then Text.ReplaceRange(input, position, range, "") else input,
        result= if Text.Contains(replaced,"<") then RemoveTag(replaced) else replaced
    in
        result
in
    RemoveTag

Usage:

let
    Source = RemoveTag("<!DOCTYPE html><html><body><h1>A Heading</h1><p>A paragraph</p></body></html>")
in
    Source

Reference link:

Robust function to remove HTML tags 

If above not help, can you please share some sample data as table format for test? 

Regards,

Xiaoxin Sheng

test

Hi Xiaoxin,

Can you please provide some help with this sample data. The data has random html tags that not only include <xxx> but also tags like &nbsp 

 

Column_Text
<p>Updated Quick Task Model</p>
<p>need to update some missing trainings&nbsp;</p>
<p>initial response due Nov 11</p>
<p>in process</p>
<p>video in process, plan to launch Nov 14</p>
Anonymous
Not applicable

HI @Fair-UL ,

Maybe you can refer to v-yuezhe-msft 's suggestion to do these conversions on SQL side then load the converted data to power bi:

I have a column in DQ that is stored in HTML format 
Regards,

Xiaoxin Sheng

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.