Power BI is turning 10, and we’re marking the occasion with a special community challenge. Use your creativity to tell a story, uncover trends, or highlight something unexpected.
Get startedJoin us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.
Hello,
I need help how to HTML Extrat Multiple links in multiple rows in a Web Page.
My example is:
let
Source = Web.BrowserContents("https://www.base.gov.pt/Base4/pt/detalhe/?type=contratos&id=8782061"),
#"Extracted Table From Html" = Html.Table(Source, {{"Column1", "TABLE.table.table-striped:nth-child(1) > * > TR > :nth-child(1)"}, {"Column2", "TABLE.table.table-striped:nth-child(1) > * > TR > :nth-child(2)"}}, [RowSelector="TABLE.table.table-striped:nth-child(1) > * > TR"]),
#"Replaced Value" = Table.ReplaceValue(#"Extracted Table From Html","),","),|#(cr)",Replacer.ReplaceText,{"Column2"}),
#"Duplicated Column" = Table.DuplicateColumn(#"Replaced Value", "Column2", "Column2 - Copy"),
#"Removed Columns" = Table.RemoveColumns(#"Duplicated Column",{"Column2"}),
#"Filled Down" = Table.FillDown(#"Removed Columns",{"Column2 - Copy"})
in
#"Filled Down"
I can get for the rows that have multiple links the rows, but How can I split the links in rows, for that row?
Example:
Column1 with "Entidades concorrentes" as 3 links
"BP Portugal – Comércio de Combustíveis e Lubrificantes, S.A. (500194670),|
Petróleos de Portugal - PETROGAL, S.A. (500697370),|
Repsol Portuguesa, S.A. (500246963)"
My question is how can I split that row "Column1 with "Entidades concorrentes" and the others with the same context (1->n, One row, multiple sub rows) and get the link's?
I Appreciate you help.
Thakns,
Manuel Mesquita
Solved! Go to Solution.
Hi, @manuelmesquita
I don't know what the rules are in the links because I don't understand the language.
I searched for some ways, hope it helps:
Use Power BI Web From Example over multiple URLs
Extracting Data from Complex Web Pages with Power BI - Excelerator BI
use power Query to extract hyperlinks from a web table (microsoft.com)
Best Regards,
Community Support Team _Janey
If this post helps, then please consider Accept it as the solution to help the other members find it more.
I have sent you the doc for getting the URLs, if you know the rule between the URLs you need.
As for why this method is useless, I don't know if you have tried changing it to http
If http is useless, it should be that the url is encrypted, you can see the base4...id or something behind it, I don't know html, I can only see this...binary code? Splitting the url after encryption doesn't seem to work. So it should be the security of the website, not your code.
Best Regards,
Community Support Team _Janey
If this post helps, then please consider Accept it as the solution to help the other members find it more.
Olá Janey,
I've reached to split de columns in rows from the example with same transformations. The result is what I want (Image-1). Except that I yet cannot get the urls for the line, that I need to parse to new rows/coluns to get same adicional information (The Yellow boxes). the items in black box is only text.
For Example I need to extrat for same cell's "Column2" the url link for same rows as example:
Each row "Entidades concorrentes", there is a link, I need to get the link's/url's:
"<td data-title="Entidades concorrentes"><a href="/Base4/pt/detalhe/?type=entidades&id=76208">Arquijardim, S.A. (504818384)</a>,<br><a href="/Base4/pt/detalhe/?type=entidades&id=8119">Lusosicó- Construções, S.A. (504870475)</a>,<br><a href="/Base4/pt/detalhe/?type=entidades&id=67630">Penela Terraplanagens - Desaterros e Terraplanagens, Ldª (503954233)</a>,<br><a href="/Base4/pt/detalhe/?type=entidades&id=100117">Tecnorem-Engenharia e Construções, SA (502519533)</a></td>"
Ref: Page: https://www.base.gov.pt/Base4/pt/detalhe/?type=contratos&id=8852980
My Atual Code:
-----------------
let
Source = Web.BrowserContents("https://www.base.gov.pt/Base4/pt/detalhe/?type=contratos&id=8852980"),
#"Extracted Table From Html" = Html.Table(Source, {{"Column1", ".b-first"}, {"Column2", ".table:nth-child(1) > TBODY:nth-child(1):nth-last-child(1) > TR > :nth-child(2)"}}, [RowSelector=".b-first"]),
#"Replaced Value3" = Table.ReplaceValue(#"Extracted Table From Html","),",")),",Replacer.ReplaceText,{"Column2"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value3", {{"Column2", Splitter.SplitTextByDelimiter("),", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column2"),
#"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter","Portugal",",|Portugal,",Replacer.ReplaceText,{"Column2"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",",,",",",Replacer.ReplaceText,{"Column2"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1",",|P","|P",Replacer.ReplaceText,{"Column2"}),
#"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value2", {{"Column2", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column2")
in
#"Split Column by Delimiter1"
-----------------
If possible, I appreciate how to get the links. for each colun that have link.
Best Regards and Thanks.
Manuel mesquita
Hi, @manuelmesquita
I don't know what the rules are in the links because I don't understand the language.
I searched for some ways, hope it helps:
Use Power BI Web From Example over multiple URLs
Extracting Data from Complex Web Pages with Power BI - Excelerator BI
use power Query to extract hyperlinks from a web table (microsoft.com)
Best Regards,
Community Support Team _Janey
If this post helps, then please consider Accept it as the solution to help the other members find it more.
Hi, @manuelmesquita
I don't understand your needs very well, can you explain more?
Is this what you want? ?
Janey
Yes, Is that that I want Janey, but the problem is to get the description and the URL for each column. after transpose to rows.
I only problem is for each group, example "Entidades concorrentes" the entity if different but it gives the same link. "id=76208", but the "id=" and html link’s are diferent's, the question how to get the correct html link for the correct entity.
I've reach the separation with the help of the post of "Chris Webb's BI Blog", "https://blog.crossjoin.co.uk/2018/08/30/power-bi-extract-urls-web-page/"
. I've use this code except that It comes with the same link for different entities.
This what I need.
Can you help?
Best Regards.
--------
let
Source = Web.BrowserContents("https://www.base.gov.pt/Base4/pt/detalhe/?type=contratos&id=8852980"),
#"Extracted Table From Html" = Html.Table(Source, {{"Column1", ".b-first"}, {"Column2", ".table:nth-child(1) > TBODY:nth-child(1):nth-last-child(1) > TR > :nth-child(2)"},
{ "Link",
"a[href^=""/Base4/pt/detalhe/""]",
each [Attributes][href]}
}, [RowSelector=".b-first"]),
#"Replaced Value3" = Table.ReplaceValue(#"Extracted Table From Html","),",")),",Replacer.ReplaceText,{"Column2"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value3", {{"Column2", Splitter.SplitTextByDelimiter("),", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column2"),
#"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter","Portugal",",|Portugal,",Replacer.ReplaceText,{"Column2"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",",,",",",Replacer.ReplaceText,{"Column2"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1",",|P","|P",Replacer.ReplaceText,{"Column2"})
in
#"Replaced Value2"
------------------------------
I have sent you the doc for getting the URLs, if you know the rule between the URLs you need.
As for why this method is useless, I don't know if you have tried changing it to http
If http is useless, it should be that the url is encrypted, you can see the base4...id or something behind it, I don't know html, I can only see this...binary code? Splitting the url after encryption doesn't seem to work. So it should be the security of the website, not your code.
Best Regards,
Community Support Team _Janey
If this post helps, then please consider Accept it as the solution to help the other members find it more.
This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.
Check out the June 2025 Power BI update to learn about new features.
User | Count |
---|---|
72 | |
71 | |
57 | |
38 | |
36 |
User | Count |
---|---|
81 | |
67 | |
62 | |
46 | |
45 |