Hi folks,
I'm hoping someone might be able to help. I have a bunch of SharePoint intranet pages, and I'm attempting to pull out all the URLs from the text, buttons, etc. into a table. The purpose will be to create a report that shows which pages have links in them, and where they point to.
I have the HTML code for each page in a report. Now I need to create some rules in a DAX measure to pull out the URLs from each page. The challenge is that different page widgets use different HTML conventions, so there will need to be a number of different rules.
The rules need to extract all of the text between the start and end points below and store it in a table. The rules would be:
| Rule | Start capturing where the text begins with | Stop capturing at the first occurrence of |
|---|---|---|
| 1 | href="" | "" |
| 2 | "http | " |
| 3 | href=""http | " |
Here is some sample code:
https://1drv.ms/u/s!AvfpkO5b74akg6li3CyRiH-e-KE8Zg?e=GRcW4A
This sample code is in a column in a table called 'Authoring Canvas Content'. There is also a column called 'Name' that contains the page URL. Here's a sample:
| 'Authoring Canvas Content' | Name |
|---|---|
| <code above> | https://intranet/sites/mysite/mypage.aspx |
In terms of outputs, here's what I need to see:
Hopefully that makes sense - any help would be really appreciated 🙂
Here is one possible implementation in Power Query (M):
let
    // Load the sample file as a one-column table (adjust the path to your own file)
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\xxx\Downloads\HTML_code_sample.txt"), null, null, 1252)}),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    // Split on "https" so that each URL starts its own row
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Promoted Headers", {{"Authoring Canvas Content", Splitter.SplitTextByDelimiter("https", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Authoring Canvas Content"),
    // Turn every possible end-of-URL marker into ">".
    // "&quot;" is a reconstruction: the rendered post showed a bare quote here, which is not valid M.
    #"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter","&quot;",">",Replacer.ReplaceText,{"Authoring Canvas Content"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value"," ",">",Replacer.ReplaceText,{"Authoring Canvas Content"}),
    #"Replaced Value4" = Table.ReplaceValue(#"Replaced Value1","<",">",Replacer.ReplaceText,{"Authoring Canvas Content"}),
    // Keep only the text before the first ">" (the URL itself)
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value4", "Authoring Canvas Content", Splitter.SplitTextByEachDelimiter({">"}, QuoteStyle.Csv, false), {"Authoring Canvas Content.1", "Authoring Canvas Content.2"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Split Column by Delimiter1",{"Authoring Canvas Content.1"}),
    // Decode the colon entity ("&#58;" is likewise a reconstruction; the rendered post showed ":" replaced by ":")
    #"Replaced Value2" = Table.ReplaceValue(#"Removed Other Columns","&#58;",":",Replacer.ReplaceText,{"Authoring Canvas Content.1"}),
    // Restore the "https" prefix that the first split removed
    #"Replaced Value3" = Table.ReplaceValue(#"Replaced Value2","://","https://",Replacer.ReplaceText,{"Authoring Canvas Content.1"}),
    // The first row holds the text before the first URL, so drop it, then de-duplicate
    #"Removed Top Rows" = Table.Skip(#"Replaced Value3",1),
    #"Removed Duplicates" = Table.Distinct(#"Removed Top Rows")
in
    #"Removed Duplicates"
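If the HTML is already loaded as a table with the 'Authoring Canvas Content' and 'Name' columns described in the question, the start/stop rules could also be applied per row so that each link stays attached to its page. The following is only a sketch under assumptions: the inline sample table, the example.com links, and the ExtractAll helper are all hypothetical, and the delimiters shown are literal `href="` and `"`. If the stored HTML uses entities, `href=&quot;` and `&quot;` may be needed instead.

```powerquery
let
    // Hypothetical one-row sample standing in for the real table
    Source = #table(
        type table [#"Authoring Canvas Content" = text, Name = text],
        {{
            "<a href=""https://example.com/a"">A</a> <a href=""https://example.com/b"">B</a>",
            "https://intranet/sites/mysite/mypage.aspx"
        }}
    ),
    // Hypothetical helper: every substring that follows startDelim, cut at the next endDelim
    ExtractAll = (content as nullable text, startDelim as text, endDelim as text) as list =>
        if content = null then {}
        else List.Transform(
            List.Skip(Text.Split(content, startDelim), 1),
            each Text.BeforeDelimiter(_, endDelim)
        ),
    // One list of links per page, using a single start/stop rule
    AddLinks = Table.AddColumn(
        Source,
        "Link",
        each ExtractAll([Authoring Canvas Content], "href=""", """"),
        type list
    ),
    // One row per link, keeping the page URL alongside it
    Expanded = Table.ExpandListColumn(AddLinks, "Link"),
    Result = Table.Distinct(Table.SelectColumns(Expanded, {"Name", "Link"}))
in
    Result
```

For the sample row this yields two rows, one per link, each paired with the page URL in Name. Further rules can be handled by calling ExtractAll with other delimiter pairs and combining the resulting lists with List.Combine before expanding.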
Apologies for the delay with my reply @lbendlin - this is great, thank you so much for your time 🙂