Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.

Reply
manuelmesquita
Frequent Visitor

I need help how to HTML Extrat Multiple links in multiple rows in a Web Page

Hello,

I need help how to HTML Extrat Multiple links in multiple rows in a Web Page.

My example is:
let
Source = Web.BrowserContents("https://www.base.gov.pt/Base4/pt/detalhe/?type=contratos&id=8782061"),
#"Extracted Table From Html" = Html.Table(Source, {{"Column1", "TABLE.table.table-striped:nth-child(1) > * > TR > :nth-child(1)"}, {"Column2", "TABLE.table.table-striped:nth-child(1) > * > TR > :nth-child(2)"}}, [RowSelector="TABLE.table.table-striped:nth-child(1) > * > TR"]),
#"Replaced Value" = Table.ReplaceValue(#"Extracted Table From Html","),","),|#(cr)",Replacer.ReplaceText,{"Column2"}),
#"Duplicated Column" = Table.DuplicateColumn(#"Replaced Value", "Column2", "Column2 - Copy"),
#"Removed Columns" = Table.RemoveColumns(#"Duplicated Column",{"Column2"}),
#"Filled Down" = Table.FillDown(#"Removed Columns",{"Column2 - Copy"})
in
#"Filled Down"

I can get for the rows that have multiple links the rows, but How can I split the links in rows, for that row?

Example:
Column1 with "Entidades concorrentes" as 3 links
"BP Portugal – Comércio de Combustíveis e Lubrificantes, S.A. (500194670),|
Petróleos de Portugal - PETROGAL, S.A. (500697370),|
Repsol Portuguesa, S.A. (500246963)"

 

My question is how can I split that row "Column1 with "Entidades concorrentes" and the others with the same context (1->n, One row, multiple sub rows) and get the link's?

I Appreciate you help.

Thakns,
Manuel Mesquita

2 ACCEPTED SOLUTIONS

Hi, @manuelmesquita 

 

I don't know what the rules are in the links because I don't understand the language.

I searched for some ways, hope it helps:

Use Power BI Web From Example over multiple URLs 

Extracting Data from Complex Web Pages with Power BI - Excelerator BI

use power Query to extract hyperlinks from a web table (microsoft.com)

 

Best Regards,

Community Support Team _Janey

If this post helps, then please consider Accept it as the solution to help the other members find it more.

View solution in original post

@manuelmesquita 

 

I have sent you the doc for getting the URLs, if you know the rule between the URLs you need.

 

As for why this method is useless, I don't know if you have tried changing it to http

 

vjaneygmsft_0-1644803703938.png

vjaneygmsft_2-1644803951330.png

 

If http is useless, it should be that the url is encrypted, you can see the base4...id or something behind it, I don't know html, I can only see this...binary code? Splitting the url after encryption doesn't seem to work.  So it should be the security of the website, not your code.

 

Best Regards,

Community Support Team _Janey

If this post helps, then please consider Accept it as the solution to help the other members find it more.

View solution in original post

5 REPLIES 5
manuelmesquita
Frequent Visitor

Olá Janey,

I've reached to split de columns in rows from the example with same transformations. The result is what I want (Image-1). Except that I yet cannot get the urls for the line, that I need to parse to new rows/coluns to get same adicional information (The Yellow boxes). the items in black box is only text.

manuelmesquita_1-1644339160550.png

 

For Example I need to extrat for same cell's "Column2" the url link for same rows as example:

Each row  "Entidades concorrentes", there is a link, I  need to get the link's/url's:

"<td data-title="Entidades concorrentes"><a href="/Base4/pt/detalhe/?type=entidades&amp;id=76208">Arquijardim, S.A. (504818384)</a>,<br><a href="/Base4/pt/detalhe/?type=entidades&amp;id=8119">Lusosicó- Construções, S.A. (504870475)</a>,<br><a href="/Base4/pt/detalhe/?type=entidades&amp;id=67630">Penela Terraplanagens - Desaterros e Terraplanagens, Ldª (503954233)</a>,<br><a href="/Base4/pt/detalhe/?type=entidades&amp;id=100117">Tecnorem-Engenharia e Construções, SA (502519533)</a></td>"

 

Ref: Page:  https://www.base.gov.pt/Base4/pt/detalhe/?type=contratos&id=8852980

My Atual Code: 

-----------------

let
Source = Web.BrowserContents("https://www.base.gov.pt/Base4/pt/detalhe/?type=contratos&id=8852980"),
#"Extracted Table From Html" = Html.Table(Source, {{"Column1", ".b-first"}, {"Column2", ".table:nth-child(1) > TBODY:nth-child(1):nth-last-child(1) > TR > :nth-child(2)"}}, [RowSelector=".b-first"]),
#"Replaced Value3" = Table.ReplaceValue(#"Extracted Table From Html","),",")),",Replacer.ReplaceText,{"Column2"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value3", {{"Column2", Splitter.SplitTextByDelimiter("),", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column2"),
#"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter","Portugal",",|Portugal,",Replacer.ReplaceText,{"Column2"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",",,",",",Replacer.ReplaceText,{"Column2"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1",",|P","|P",Replacer.ReplaceText,{"Column2"}),
#"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value2", {{"Column2", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column2")
in
#"Split Column by Delimiter1"

-----------------

 

If possible, I appreciate how to get the links. for each colun that have link.

 

Best Regards and Thanks.

Manuel mesquita

 

Hi, @manuelmesquita 

 

I don't know what the rules are in the links because I don't understand the language.

I searched for some ways, hope it helps:

Use Power BI Web From Example over multiple URLs 

Extracting Data from Complex Web Pages with Power BI - Excelerator BI

use power Query to extract hyperlinks from a web table (microsoft.com)

 

Best Regards,

Community Support Team _Janey

If this post helps, then please consider Accept it as the solution to help the other members find it more.

v-janeyg-msft
Community Support
Community Support

Hi, @manuelmesquita 

 

vjaneygmsft_0-1644202357053.png

I don't understand your needs very well, can you explain more?

vjaneygmsft_1-1644202594774.png

Is this what you want? ?

 

Janey

Yes, Is that that I want Janey, but the problem is to get the description and the URL for each column. after transpose to rows.

I only problem is for each group, example "Entidades concorrentes" the entity if different but it gives the same link. "id=76208", but the "id=" and html link’s are diferent's, the question how to get the correct html link for the correct entity.

manuelmesquita_1-1644669102078.png

I've reach the separation with the help of the post of "Chris Webb's BI Blog", "https://blog.crossjoin.co.uk/2018/08/30/power-bi-extract-urls-web-page/"

. I've use this code except that It comes with the same link for different entities.

 

This what I need.

Can you help?

Best Regards.

--------

let
Source = Web.BrowserContents("https://www.base.gov.pt/Base4/pt/detalhe/?type=contratos&id=8852980"),
#"Extracted Table From Html" = Html.Table(Source, {{"Column1", ".b-first"}, {"Column2", ".table:nth-child(1) > TBODY:nth-child(1):nth-last-child(1) > TR > :nth-child(2)"},
{ "Link",
"a[href^=""/Base4/pt/detalhe/""]",
each [Attributes][href]}
}, [RowSelector=".b-first"]),
#"Replaced Value3" = Table.ReplaceValue(#"Extracted Table From Html","),",")),",Replacer.ReplaceText,{"Column2"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value3", {{"Column2", Splitter.SplitTextByDelimiter("),", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column2"),
#"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter","Portugal",",|Portugal,",Replacer.ReplaceText,{"Column2"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",",,",",",Replacer.ReplaceText,{"Column2"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1",",|P","|P",Replacer.ReplaceText,{"Column2"})
in
#"Replaced Value2"

------------------------------

 

 

@manuelmesquita 

 

I have sent you the doc for getting the URLs, if you know the rule between the URLs you need.

 

As for why this method is useless, I don't know if you have tried changing it to http

 

vjaneygmsft_0-1644803703938.png

vjaneygmsft_2-1644803951330.png

 

If http is useless, it should be that the url is encrypted, you can see the base4...id or something behind it, I don't know html, I can only see this...binary code? Splitting the url after encryption doesn't seem to work.  So it should be the security of the website, not your code.

 

Best Regards,

Community Support Team _Janey

If this post helps, then please consider Accept it as the solution to help the other members find it more.

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.