Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
kkinuk
Regular Visitor

Extract HTML table from webpage

Hello, 

 

I need some help to extract a table from our company wiki page please. 

 

We have a page on wiki where we store some actions in a static html table. I want to present that data in PowerBI. Below are the things I explored so far

 

1. I saved the page as a html page locally on my C:/Temp folder. 

2. Using Web connector (to my local file) I was able to extract the table and the data within it. 

 

This made me realise that Power Query can extract the data I need from a web page. Now I wanted to connect PowerBI to the live version of the page on our wiki (built using Modern Template on Sharepoint). But I hit problems and below is what I found so far

1. I am not able to use web connector as PowerBI is saying that I cannot use organisation account and can only use anonymous, basic and windows authentication. Dont know why windows authentication is not working

2. Then i tried to create a blank query and write code manually 

 

Below is the code from the page from temp folder that Power Query extracted the table automatically

let
Source = File.Contents("c:/temp/actions.html"),
#"Extracted Table From Html" = Html.Table(Source, {{"Column1", "TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6)"}, {"Column2", "TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5)"}, {"Column3", "TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4)"}, {"Column4", "TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3)"}, {"Column5", "TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3) + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(2), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3) + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(2), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3) + TD[rowspan=""1""]:not([colspan]):nth-child(5):nth-last-child(2)"}, {"Column6", "TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3) + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(2) + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(1), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3) + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(1), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3) + TD[rowspan=""1""]:not([colspan]):nth-child(5):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(1)"}}, [RowSelector="TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR"]),
#"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", Int64.Type}, {"Date", type date}, {"Agenda Item", type text}, {"Description", type text}, {"Author", type text}, {"Decision", type text}})
in
#"Changed Type"

 

 

and Below is the code from the live version of the page that I tried to write using a blank query

let
Source = Web.Contents("https://oursharepoint site/sites/oursite/SitePages/actions.aspx"),
#"Extracted Table From Html" = Html.Table(Source, {
{"Column1", "TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6)"},
{"Column2", "TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5)"},
{"Column3", "TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4)"},
{"Column4", "TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3)"}, {"Column5", "TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3) + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(2), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3) + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(2), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3) + TD[rowspan=""1""]:not([colspan]):nth-child(5):nth-last-child(2)"}, {"Column6", "TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3) + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(2) + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(1), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3) + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(1), TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(6) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(5) + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(4) + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(3) + TD[rowspan=""1""]:not([colspan]):nth-child(5):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(1)"}}, [RowSelector="TABLE.bandedRowTableStyleNeutral.ck-table-resized > * > TR"]),
#"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true])
in
#"Promoted Headers"

 

Can you tell me where I am going wrong please and if there is a way to fix the issue. If I can get to the page contents then it will be transformational and PowerBI can become our mainstream tool for governance in my team

2 REPLIES 2
kkinuk
Regular Visitor

Thank you. Sharepoints lists is my preferred option too. Unfortunately we are not allowed to use lists as a policy and it will take some time for the policy change to happen. This is my tactical solution until we are allowed to use Sharepoint lists. Is there no other way to solve my current problem please?

christinepayton
Super User
Super User

I would recommend just moving the data in the table to a SharePoint list. There is not a reason that I can think of to have data in an HTML table in SharePoint when you have MS Lists and a web part that will display them on a page. You'd be able to use the SP list as a source easily. 

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.