
Extract URLS from HTML code into table

Hi folks,

I'm hoping someone might be able to help. I have a bunch of SharePoint intranet pages, and I'm attempting to pull out all the URLs from the text, buttons, etc. into a table. The purpose will be to create a report that shows which pages have links in them, and where they point to.

 

I have the HTML code for each page in a report, and now I need to create some rules in a DAX measure to pull out the URLs from each page. The challenge is that different page widgets format their HTML differently, so several different rules will be needed.

 

So the rules need to extract all of the text between the start and end points below and store it in a table. The rules would be:

 

Rule | Capture string where the beginning of the text starts with | Stop capturing string when the first instance of the text below occurs
1 | href=" | "
2 | "http | "
3 | href="http | &quot
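A rule of this kind (start marker, stop marker) could be sketched as one small Power Query (M) function. This is only an illustration: `ExtractBetween` and `Sample` are hypothetical names, the sample HTML is invented, and the start/stop pair used is one reading of rule 1 (start at `href="`, stop at the next `"`).

```
let
    // Hypothetical helper: returns every piece of text that follows `start`
    // and ends at the first `stop` after it.
    ExtractBetween = (html as text, start as text, stop as text) as list =>
        let
            // Text after each occurrence of `start`; the first piece precedes any match, so skip it
            Pieces = List.Skip(Text.Split(html, start), 1),
            // Ignore pieces that never reach a stop marker
            Closed = List.Select(Pieces, each Text.Contains(_, stop)),
            // Keep only the text up to the first stop marker
            Captured = List.Transform(Closed, each Text.BeforeDelimiter(_, stop))
        in
            Captured,

    // Invented sample HTML, for illustration only
    Sample = "<a href=""https://www.google1.com"">One</a> <a href=""https://www.google2.com"">Two</a>",

    // Rule 1 as read above: start at href=" and stop at the next double quote
    Result = ExtractBetween(Sample, "href=""", """")
in
    Result
```

For this sample the result is a list holding the two URLs. When the start marker itself contains part of the URL (for example `"http`), the split consumes that part, so it would need to be prepended again afterwards.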

 

Here is some sample code:

https://1drv.ms/u/s!AvfpkO5b74akg6li3CyRiH-e-KE8Zg?e=GRcW4A 

 

This sample code is in a column in a table called 'Authoring Canvas Content'. There is also a column called 'Name' that contains the page URL. Here's a sample:

 

'Authoring Canvas Content' | Name
<code above> | https://intranet/sites/mysite/mypage.aspx

 

In terms of outputs, here's what I need to see:

Page | URL
https://intranet/sites/mysite/mypage.aspx | https://www.google1.com
https://intranet/sites/mysite/mypage.aspx | https://www.google1.5.com
https://intranet/sites/mysite/mypage.aspx | https://www.google1.6.com
https://intranet/sites/mysite/mypage.aspx | https://www.google2.com
https://intranet/sites/mysite/mypage.aspx | https://www.google3.com
https://intranet/sites/mysite/mypage.aspx | https://www.google4.com
https://intranet/sites/mysite/mypage.aspx | https://www.google4.5.com
https://intranet/sites/mysite/mypage.aspx | https://www.google4.6.com
https://intranet/sites/mysite/mypage.aspx | https://www.google5.com
https://intranet/sites/mysite/mypage.aspx | https://www.google6.com
https://intranet/sites/mysite/mypage.aspx | https://www.google6.5.com
https://intranet/sites/mysite/mypage.aspx | https://www.google6.6.com
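A Page/URL table of this shape could be produced in Power Query (M) roughly as follows. This is a minimal sketch, not a tested solution for the real pages: the inline `Source` table stands in for the actual data, its HTML is invented, and only the plain `href="..."` pattern is handled.

```
let
    // Stand-in for the real table: column names follow the post, the HTML is invented
    Source = #table(
        type table [#"Authoring Canvas Content" = text, Name = text],
        {{
            "<a href=""https://www.google1.com"">One</a><a href=""https://www.google2.com"">Two</a>",
            "https://intranet/sites/mysite/mypage.aspx"
        }}
    ),
    // One list of URLs per page: the text after each href=" up to the closing quote
    AddUrls = Table.AddColumn(
        Source,
        "URL",
        each List.Transform(
            List.Skip(Text.Split([Authoring Canvas Content], "href="""), 1),
            (piece) => Text.BeforeDelimiter(piece, """")
        ),
        type list
    ),
    // One row per URL
    Expanded = Table.ExpandListColumn(AddUrls, "URL"),
    // Keep the page address next to each link and drop repeated links
    Renamed = Table.RenameColumns(Expanded, {{"Name", "Page"}}),
    Result = Table.Distinct(Table.SelectColumns(Renamed, {"Page", "URL"}))
in
    Result
```

Because the URL list is built per row, the 'Name' value stays attached to every link found on that page. Other widget patterns would need their own start and stop markers.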

 

Hopefully that makes sense - any help would be really appreciated 🙂

1 ACCEPTED SOLUTION
lbendlin
Super User

Here is one version of an implementation

let
    // Read the sample file line by line (code page 1252) into a one-column table
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\xxx\Downloads\HTML_code_sample.txt"), null, null, 1252)}),
    // The first line becomes the column name
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    // Split the HTML at every "https" and expand the pieces into rows (the split consumes the "https" text)
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Promoted Headers", {{"Authoring Canvas Content", Splitter.SplitTextByDelimiter("https", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Authoring Canvas Content"),
    // Turn the possible URL terminators (&quot;, space, <) into a single marker, ">"
    #"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter","&quot;",">",Replacer.ReplaceText,{"Authoring Canvas Content"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value"," ",">",Replacer.ReplaceText,{"Authoring Canvas Content"}),
    #"Replaced Value4" = Table.ReplaceValue(#"Replaced Value1","<",">",Replacer.ReplaceText,{"Authoring Canvas Content"}),
    // Split once at the first ">" and keep only the part before it
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value4", "Authoring Canvas Content", Splitter.SplitTextByEachDelimiter({">"}, QuoteStyle.Csv, false), {"Authoring Canvas Content.1", "Authoring Canvas Content.2"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Split Column by Delimiter1",{"Authoring Canvas Content.1"}),
    // Decode the HTML entity for a colon
    #"Replaced Value2" = Table.ReplaceValue(#"Removed Other Columns","&#58;",":",Replacer.ReplaceText,{"Authoring Canvas Content.1"}),
    // Put back the "https" that the first split consumed
    #"Replaced Value3" = Table.ReplaceValue(#"Replaced Value2","://","https://",Replacer.ReplaceText,{"Authoring Canvas Content.1"}),
    // Drop the first row (the text before the first "https"), then remove duplicate URLs
    #"Removed Top Rows" = Table.Skip(#"Replaced Value3",1),
    #"Removed Duplicates" = Table.Distinct(#"Removed Top Rows")
in
    #"Removed Duplicates"
How to use this code: Create a new Blank Query. Click on "Advanced Editor". Replace the code in the window with the code provided here. Click "Done".

 



Apologies for the delay with my reply @lbendlin - this is great, thank you so much for your time 🙂
