Extract URLs from HTML code into a table

Hi folks,

I'm hoping someone might be able to help. I have a bunch of SharePoint intranet pages, and I'm attempting to pull out all the URLs from the text, buttons, etc. into a table. The purpose will be to create a report that shows which pages have links in them, and where they point to.

 

I have the HTML code for each page in a report; now I need to create some rules in a DAX measure to pull out the URLs from each page. The challenge is that different page widgets structure their HTML differently, so a number of different rules will be needed.

 

So the rules need to extract all of the text between the start and end points below, and store it in a table. The rules would be:

 

Rule | Capture string where the beginning of the text starts with | Stop capturing string when the first instance of the text below occurs
1 | href="" | ""
2 | "http | "
3 | href=""http | &quot
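
A start/stop rule of this kind could be sketched in Power Query M roughly as follows. This is only an illustration: `ExtractBetween` is a hypothetical helper name, the sample HTML is made up, and the marker strings are assumptions loosely modelled on the rules above rather than the exact strings in the real pages.

```powerquery
let
    // Hypothetical helper: returns every substring that follows `start`,
    // cut off at the first occurrence of `stop`.
    ExtractBetween = (html as text, start as text, stop as text) as list =>
        let
            // Splitting on the start marker yields one piece per match,
            // plus a leading piece (the text before the first match) to skip.
            Pieces = List.Skip(Text.Split(html, start), 1),
            // Keep only the text before the stop marker in each piece.
            Captured = List.Transform(Pieces, (piece) => Text.BeforeDelimiter(piece, stop))
        in
            Captured,

    // Made-up sample with one plain link and one HTML-encoded link.
    Sample = "<a href=""https://www.google1.com"">x</a> href=&quot;https://www.google2.com&quot;",

    // Assumed markers: a plain quoted href, and an href quoted with &quot;.
    PlainLinks = ExtractBetween(Sample, "href=""", """"),
    EncodedLinks = ExtractBetween(Sample, "href=&quot;", "&quot;")
in
    [PlainLinks = PlainLinks, EncodedLinks = EncodedLinks]
```

Each additional widget pattern would then be one more call to the helper with its own start and stop strings, and the resulting lists can be combined with `List.Combine`.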

 

Here is some sample code:

https://1drv.ms/u/s!AvfpkO5b74akg6li3CyRiH-e-KE8Zg?e=GRcW4A 

 

This sample code is in a column in a table called 'Authoring Canvas Content'. There is also a column called 'Name' that contains the page URL. Here's a sample:

 

'Authoring Canvas Content' | Name
<code above> | https://intranet/sites/mysite/mypage.aspx

 

In terms of outputs, here's what I need to see:

Page | URL
https://intranet/sites/mysite/mypage.aspx | https://www.google1.com
https://intranet/sites/mysite/mypage.aspx | https://www.google1.5.com
https://intranet/sites/mysite/mypage.aspx | https://www.google1.6.com
https://intranet/sites/mysite/mypage.aspx | https://www.google2.com
https://intranet/sites/mysite/mypage.aspx | https://www.google3.com
https://intranet/sites/mysite/mypage.aspx | https://www.google4.com
https://intranet/sites/mysite/mypage.aspx | https://www.google4.5.com
https://intranet/sites/mysite/mypage.aspx | https://www.google4.6.com
https://intranet/sites/mysite/mypage.aspx | https://www.google5.com
https://intranet/sites/mysite/mypage.aspx | https://www.google6.com
https://intranet/sites/mysite/mypage.aspx | https://www.google6.5.com
https://intranet/sites/mysite/mypage.aspx | https://www.google6.6.com

 

Hopefully that makes sense - any help would be really appreciated 🙂

1 ACCEPTED SOLUTION
lbendlin
Super User

Here is one version of an implementation

let
    // Read the sample file line by line (Windows-1252) into a single-column table
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\xxx\Downloads\HTML_code_sample.txt"), null, null, 1252)}),
    // Use the first line as the column header
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    // Split the HTML at every "https" and expand the pieces into rows
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Promoted Headers", {{"Authoring Canvas Content", Splitter.SplitTextByDelimiter("https", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Authoring Canvas Content"),
    // Turn the characters that can end a URL (&quot;, space, <) into ">" so a single delimiter marks the end
    #"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter","&quot;",">",Replacer.ReplaceText,{"Authoring Canvas Content"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value"," ",">",Replacer.ReplaceText,{"Authoring Canvas Content"}),
    #"Replaced Value4" = Table.ReplaceValue(#"Replaced Value1","<",">",Replacer.ReplaceText,{"Authoring Canvas Content"}),
    // Keep only the text before the first ">" in each piece
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value4", "Authoring Canvas Content", Splitter.SplitTextByEachDelimiter({">"}, QuoteStyle.Csv, false), {"Authoring Canvas Content.1", "Authoring Canvas Content.2"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Split Column by Delimiter1",{"Authoring Canvas Content.1"}),
    // Decode &#58; to ":" and restore the "https" prefix that the split removed
    #"Replaced Value2" = Table.ReplaceValue(#"Removed Other Columns","&#58;",":",Replacer.ReplaceText,{"Authoring Canvas Content.1"}),
    #"Replaced Value3" = Table.ReplaceValue(#"Replaced Value2","://","https://",Replacer.ReplaceText,{"Authoring Canvas Content.1"}),
    // Drop the first row (the text before the first "https"), then remove duplicate URLs
    #"Removed Top Rows" = Table.Skip(#"Replaced Value3",1),
    #"Removed Duplicates" = Table.Distinct(#"Removed Top Rows")
in
    #"Removed Duplicates"
How to use this code: Create a new Blank Query. Click on "Advanced Editor". Replace the code in the window with the code provided here. Click "Done".
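
If the HTML already sits in a table next to the page address, a variation along these lines might keep the page alongside each URL, matching the Page / URL output described in the question. This is a minimal sketch, not part of the accepted solution: the inline `#table` is a made-up stand-in for the real data, and it handles only the plain `href="..."` pattern, so other widget patterns would need their own rules.

```powerquery
let
    // Stand-in for the real table; the HTML value is a made-up sample.
    Source = #table(
        type table [#"Authoring Canvas Content" = text, Name = text],
        {
            {
                "<a href=""https://www.google1.com"">a</a><a href=""https://www.google2.com"">b</a>",
                "https://intranet/sites/mysite/mypage.aspx"
            }
        }
    ),
    // For each page, collect the text that follows every href=" up to the closing quote.
    AddUrls = Table.AddColumn(
        Source,
        "URL",
        each List.Transform(
            List.Skip(Text.Split([Authoring Canvas Content], "href="""), 1),
            (piece) => Text.BeforeDelimiter(piece, """")
        ),
        type list
    ),
    // One row per extracted URL; the Name value repeats on each row.
    Expanded = Table.ExpandListColumn(AddUrls, "URL"),
    Renamed = Table.RenameColumns(Expanded, {{"Name", "Page"}}),
    // Keep just the two output columns and drop repeated page/URL pairs.
    Result = Table.Distinct(Table.SelectColumns(Renamed, {"Page", "URL"}))
in
    Result
```

Because the list is built per row before expanding, pages with no matching links end up with a single row whose URL is null, which can be filtered out if only pages with links are wanted.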

 


2 REPLIES 2

Apologies for the delay with my reply @lbendlin - this is great, thank you so much for your time 🙂
