

rivthebest

Extracting all the URLs (href attributes) from multiple HTML tables on a web page using Power Query

Hi,

Here is the scenario where I got stuck; I would appreciate your advice.

Here is the GitHub page that lists public APIs by category:

https://github.com/public-apis/public-apis#test-data

It contains around 51 categories, listed as an index at the top. As you scroll down the page, you will find that each category is presented as an HTML table.

My objective is to fetch the URL of every API in every category, along with the other table columns, and collate everything into one single table.

To accomplish this, I chose Power Query and tried it in both Excel (Office 365) and Power BI Desktop.

These are the challenges I faced:

Step 1: Using the GitHub link above, I got as far as this:

[Screenshot: rivthebest_0-1661452051572.png]

If I expand the table, I can capture every column except the link of each API. I then tried to use the following code snippet from Chris Webb's blog as an intermediate step to fetch the URL of each API in each category.

Chris Webb's blog post: Chris Webb's BI Blog: Using Html.Table() To Extract URLs From A Web Page In Power BI/Power Query M C

After making the relevant modifications, my version of that code was:

"Added Custom" = Table.AddColumn(Html.Table(Source, {{"Links", "a[href^=""http""]", each [Attributes][href]}}))
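For comparison, here is a minimal sketch of the pattern from Chris Webb's post applied to this page. Html.Table is called directly, because Table.AddColumn expects a table, a column name and a function and cannot take selector pairs, and the RowSelector option makes it return one row per matching link instead of a single row. It assumes the README links are still ordinary `<a href="http...">` elements in the HTML that GitHub serves:

```powerquery
let
    // Download the repository page (assumption: the README is rendered into this HTML)
    Source = Web.Contents("https://github.com/public-apis/public-apis"),
    // RowSelector makes Html.Table return one row per matching <a> element;
    // the column function reads that element's href attribute (null if missing)
    Links = Html.Table(
        Source,
        {{"Links", "a[href^=""http""]", each [Attributes][href]?}},
        [RowSelector = "a[href^=""http""]"]
    )
in
    Links
```

This lists every external link on the page, including links outside the API tables, and carries none of the other table columns, so on its own it cannot produce the combined table.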

Later, following this blog post:

Power Query – how to simply get hyperlinks from webpages – Trainings, consultancy, tutorials (excelt...

I tried:

#"Added Custom" = Table.AddColumn(Source, {{"API_URL", ":nth-last-child(155) > TBODY > TR > :nth-child(1) > A[rel=""nofollow""]:nth-child(1):nth-last-child(1)", each [Attributes][href]?}}, [RowSelector="TABLE:nth-child(20) > TBODY > TR"])

Whichever of these two ways I modified the code, I could not achieve the desired result.

I also tried the Add Table Using Examples option in the Power BI Power Query Navigator. Here is a screenshot:

[Screenshot: rivthebest_1-1661452277625.png]

The problem is that it captures only one category, that is, one HTML table, rather than all of them.

As a result, I could not add the URL column to the rest of the data to complete the task.
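Both the second attempt and the query generated by Add Table Using Examples pin the rows to a single table by position (`RowSelector="TABLE:nth-child(20) > TBODY > TR"`, plus `:nth-last-child(155)` in the column selector), which is why only one category comes back. Below is a sketch that drops the positional part so one RowSelector matches the body rows of every table in the README. It assumes the README is rendered inside an `<article>` element, that every category table has the API | Description | Auth | HTTPS | CORS layout, and that the link sits in the first cell, as the generated `:nth-child(1) > A` selector suggests:

```powerquery
let
    Source = Web.Contents("https://github.com/public-apis/public-apis"),
    // Assumption: the README tables are the only tables inside <article>;
    // no :nth-child() on TABLE, so body rows of every table match
    Row = "ARTICLE TABLE > TBODY > TR",
    Apis = Html.Table(
        Source,
        {
            // Column selectors reuse the row path, in the style the Navigator generates
            {"API", Row & " > :nth-child(1)"},
            {"href", Row & " > :nth-child(1) > A", each [Attributes][href]?},
            {"Description", Row & " > :nth-child(2)"},
            {"Auth", Row & " > :nth-child(3)"},
            {"HTTPS", Row & " > :nth-child(4)"},
            {"CORS", Row & " > :nth-child(5)"}
        },
        [RowSelector = Row]
    )
in
    Apis
```

The `ARTICLE` prefix is there to keep any other tables on the GitHub page out of the result; check a few rows against the page, since GitHub's markup changes over time. The category name is not part of any table row, so it still has to be attached separately.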

Please let me know if you need any other information to reproduce the steps.

Please help me out.

Regards

Ritabrata Bhattacharya (a.k.a. Riv)

Accepted solution:

Hi Daryl,

I am extremely sorry for my delayed response. My system crashed in the meantime and it took me some time to restore it, so I could not reply to you sooner.

I used your code. You are a lifesaver! Thanks a lot. The only minor change I still need to make is incorporating the category headers into this dataset.

Thank you once again, Daryl.

Regards,
Riv


Daryl-Lynch-Bzy

Hi @rivthebest - is this what you are trying to get?

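let
    Source = Web.Contents("https://github.com/public-apis/public-apis#test-data"),
    // Keep the raw HTML as text so each href can be searched for directly
    #"Text From Binary" = Text.FromBinary( Source ),
    // Web.Page returns every table on the page; stack them into one table
    #"Web Page" = Web.Page(Source),
    Data = #"Web Page"[Data],
    #"Table Combine" = Table.Combine( Data ),
    // Each API cell looks like <td><a href="URL" rel="nofollow">API name</a></td>,
    // so build the text that comes just before and just after the URL
    #"Add Start String" = Table.AddColumn(#"Table Combine", "Start String", each "<td><a href=""", type text),
    #"Add End String" = Table.AddColumn(#"Add Start String", "End String", each """ rel=""nofollow"">" & [API] & "</a></td>", type text),
    // Row N (0-based) uses the N-th occurrence of the start string, which relies on
    // the API cells appearing in the HTML in the same order as the combined rows
    #"Added Index" = Table.AddIndexColumn(#"Add End String", "Index", 0, 1, Int64.Type),
    #"Add URL" = Table.AddColumn(#"Added Index", "href", each Text.BetweenDelimiters( #"Text From Binary" ,  [Start String] , [End String] , [Index])),
    // Safety net: keep only the text before the first "> in case a match ran past the href
    #"Extracted Text Before Delimiter" = Table.TransformColumns(#"Add URL", {{"href", each Text.BeforeDelimiter(_, """>"), type text}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Extracted Text Before Delimiter",{"API", "Description", "Auth", "HTTPS", "CORS", "href"})
in
    #"Removed Other Columns"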
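Riv's reply asks to carry the category header into this dataset. One possible approach, sketched under assumptions: pair each API table with its heading by position. It assumes each category is an `<h3>` heading inside the README's `<article>` element and that headings and API tables appear in the same order; the count check makes a mismatch fail loudly instead of silently mislabelling rows:

```powerquery
let
    Html = Text.FromBinary(Web.Contents("https://github.com/public-apis/public-apis")),
    // One row per category heading (assumption: categories are the <h3> elements inside <article>)
    Headings = Html.Table(Html, {{"Category", "ARTICLE H3"}}, [RowSelector = "ARTICLE H3"]),
    // Every table Web.Page finds, kept only if it has the API table layout
    Tables = List.Select(
        Web.Page(Html)[Data],
        each List.Contains(Table.ColumnNames(_), "API")
    ),
    // Pair headings and tables by position, but stop if the counts differ
    Paired =
        if Table.RowCount(Headings) = List.Count(Tables)
        then Table.FromColumns({Headings[Category], Tables}, {"Category", "Data"})
        else error "Heading and table counts differ; adjust the heading selector",
    // One row per API, with its category in the first column
    Expanded = Table.ExpandTableColumn(
        Paired, "Data", {"API", "Description", "Auth", "HTTPS", "CORS"}
    )
in
    Expanded
```

The expanded rows keep the page order, so the href steps from the query above should still line up if this table is used in place of `#"Table Combine"`.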

