Hi folks,
I'm hoping someone might be able to help. I have a bunch of SharePoint intranet pages, and I'm attempting to pull out all the URLs from the text, buttons, etc. into a table. The purpose will be to create a report that shows which pages have links in them, and where they point to.
I have the HTML code for each page in a report. Now I need to create some rules in a DAX measure to pull out the URLs from each page. The challenge is that different page widgets format their HTML differently, so a number of different rules will be needed.
Each rule needs to extract all of the text between the start and end points below and store it in a table. The rules would be:
| Rule | Start capturing where the text begins with | Stop capturing at the first occurrence of |
|---|---|---|
| 1 | href="" | "" |
| 2 | "http | " |
| 3 | href=""http | " |
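A start/stop rule like the ones in this table can be sketched in Power Query (M) rather than DAX, since text splitting is simpler there. This is only an illustration under assumptions: `ExtractBetween` is a hypothetical helper name, the sample HTML is made up, and only rule 2 is shown.

```powerquery
let
    // Hypothetical helper: returns every piece of text that follows startText,
    // cut off at the first stopText that comes after it.
    ExtractBetween = (html as text, startText as text, stopText as text) as list =>
        List.Transform(
            // Split on the start marker and drop whatever precedes the first match
            List.Skip(Text.Split(html, startText)),
            // Keep only the text before the stop marker
            each Text.BeforeDelimiter(_, stopText)
        ),

    // Made-up sample HTML, not taken from the linked file
    Sample = "<a href=""https://example.com/a"">A</a><a href=""https://example.com/b"">B</a>",

    // Rule 2: start at "http (a double quote followed by http), stop at the next double quote.
    // In M, """http" is the text "http and """" is a single double-quote character.
    // The split consumes the "http" prefix, so it is added back afterwards.
    Rule2 = List.Transform(ExtractBetween(Sample, """http", """"), each "http" & _)
in
    Rule2
```

The other rules would reuse the same helper with different start and stop markers, and the resulting lists could then be combined with `List.Combine` and de-duplicated with `List.Distinct`.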
Here is some sample code:
https://1drv.ms/u/s!AvfpkO5b74akg6li3CyRiH-e-KE8Zg?e=GRcW4A
This sample code is in a column in a table called 'Authoring Canvas Content'. There is also a column called 'Name' that contains the page URL. Here's a sample:
| 'Authoring Canvas Content' | Name |
|---|---|
| <code above> | https://intranet/sites/mysite/mypage.aspx |
In terms of outputs, here's what I need to see:
Hopefully that makes sense - any help would be really appreciated 🙂
Here is one possible implementation in Power Query (M):
let
    // Read the sample file line by line into a single-column table
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\xxx\Downloads\HTML_code_sample.txt"), null, null, 1252)}),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    // Split the HTML on "https" so that every fragment after the first begins with the rest of a URL
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Promoted Headers", {{"Authoring Canvas Content", Splitter.SplitTextByDelimiter("https", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Authoring Canvas Content"),
    // Turn every character that can end a URL (double quote, space, "<") into ">".
    // """" is M's escape for one double-quote character; if the HTML stores quotes as an entity, replace that entity instead.
    #"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter","""",">",Replacer.ReplaceText,{"Authoring Canvas Content"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value"," ",">",Replacer.ReplaceText,{"Authoring Canvas Content"}),
    #"Replaced Value4" = Table.ReplaceValue(#"Replaced Value1","<",">",Replacer.ReplaceText,{"Authoring Canvas Content"}),
    // Cut each fragment at the first ">" and keep only the part before it
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value4", "Authoring Canvas Content", Splitter.SplitTextByEachDelimiter({">"}, QuoteStyle.Csv, false), {"Authoring Canvas Content.1", "Authoring Canvas Content.2"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Split Column by Delimiter1",{"Authoring Canvas Content.1"}),
    // As posted, this step replaces ":" with ":" and does nothing; the search text was probably an encoded colon that the forum rendered as ":"
    #"Replaced Value2" = Table.ReplaceValue(#"Removed Other Columns",":",":",Replacer.ReplaceText,{"Authoring Canvas Content.1"}),
    // Restore the "https" prefix that the first split removed
    #"Replaced Value3" = Table.ReplaceValue(#"Replaced Value2","://","https://",Replacer.ReplaceText,{"Authoring Canvas Content.1"}),
    // Drop the text that preceded the first "https", then remove duplicate URLs
    #"Removed Top Rows" = Table.Skip(#"Replaced Value3",1),
    #"Removed Duplicates" = Table.Distinct(#"Removed Top Rows")
in
    #"Removed Duplicates"
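The query above reads a single sample file, so the page URL from the 'Name' column is not carried through. A minimal sketch of the same split-and-trim idea applied per row, so that each extracted URL stays next to its page, might look like this. The `Pages` table is made-up sample data standing in for the real table, `ExtractUrls` is a hypothetical helper name, and the list of terminating characters is an assumption.

```powerquery
let
    // Made-up sample data standing in for the real table
    Pages = #table(
        {"Authoring Canvas Content", "Name"},
        {
            {
                "<a href=""https://example.com/a"">A</a> <img src=""https://example.com/b.png"">",
                "https://intranet/sites/mysite/mypage.aspx"
            }
        }
    ),

    // Hypothetical helper: split on "http", drop the text before the first match,
    // then cut each fragment at the first character that can end a URL.
    ExtractUrls = (html as nullable text) as list =>
        let
            Safe = if html = null then "" else html,
            Fragments = List.Skip(Text.Split(Safe, "http")),
            // QuoteStyle.None so that the double quote is treated as an ordinary delimiter
            CutAtTerminator = Splitter.SplitTextByAnyDelimiter({"""", " ", "<", ">"}, QuoteStyle.None),
            Urls = List.Transform(Fragments, each "http" & List.First(CutAtTerminator(_)))
        in
            List.Distinct(Urls),

    // One list of URLs per page, expanded to one row per page/URL pair
    AddUrls = Table.AddColumn(Pages, "URL", each ExtractUrls([Authoring Canvas Content]), type list),
    Expanded = Table.ExpandListColumn(AddUrls, "URL"),
    Result = Table.SelectColumns(Expanded, {"Name", "URL"})
in
    Result
```

Because the split is on "http" rather than "https", this variant also picks up plain http links. Any other occurrence of the text "http" in the page would produce a stray row, so the result may need a filter afterwards.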
Apologies for the delay with my reply @lbendlin - this is great, thank you so much for your time 🙂