Starting December 3, join live sessions with database experts and the Microsoft product team to learn just how easy it is to get started
Learn moreGet certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now
I was following along to an exercise to 'Web scrape with Power Query' (Link). It involves using a formula in Power Query to import 'Major League Base Detailed Standings' table data from the 'baseball-reference' website for different years (by changing the YYYY in the URL from 2000 to 2020).
Steps:
1. Create a query with [Years] column with values from 2000 - 2020
2. Create a second query with the m-code below. It references Table ID 7 in the HTML which should correspond to the 'Major League Base Detailed Standings' table. @9:20 mark
Code
= (year as number) =>
let
Source = Web.Page(Web.Contents("https://www.baseball-reference.com/leagues/majors/"& Text.From(year) & "-standings.shtml")),
Data = Source{7}[Data] //Seventh table on the page with ID starting at zero @9.20 min
in
Data
3. Add a "Invoke Custom Function" column in the first query (years column), using the year as the input to the second query.
This step returns Error values against each year:
I changed the table ID in Data = Source{X}[Data] above. X=1-->5 all work fine so I assume it's due to an incorrect Table ID.
What m-code would return the 'Major League Base Detailed Standings' table from the Link?
Also what changed in the HTML that caused the m-code above to stop working?
Thanks in advance!
Solved! Go to Solution.
Hi @RokuCap
Let me check, give me 10 minutes. A quick look at the web page structure shows me that the actual table of data you want is in HTML comments, meaning the web browser wouldn't display it.
So the fact we are seeing it means there's something else going on like maybe JavaScript is reading the table and rendering it to screen. Unusual.
In the meantime I've created this example PBIX file that loads the data you want via a function.
Phil
Proud to be a Super User!
Hi @RokuCap
It doesn't look like you can use the 'old' way of referring to the table because as I said, the table you want isn't visible to the PQ when loaded using Web.Page(Web.Contents()) - what it sees is this
So the 7th table doesn't exist - hence the error you are seeing.
The reason the table you want is 'hidden' is because it's in a section of HTML comments.
However by using the Web.BrowserContents function in the query I created, PQ can parse the the web page code to find the table.
Regards
Phil
Proud to be a Super User!
Hi @RokuCap
Let me check, give me 10 minutes. A quick look at the web page structure shows me that the actual table of data you want is in HTML comments, meaning the web browser wouldn't display it.
So the fact we are seeing it means there's something else going on like maybe JavaScript is reading the table and rendering it to screen. Unusual.
In the meantime I've created this example PBIX file that loads the data you want via a function.
Phil
Proud to be a Super User!
Hi Phil,
Your PBIX file works perfectly in edit mode but I get the following error when I try to load the data. Do you get the same errors? It loads really slowly to the point of timing out. I assumed it had something to do with the source website or even my internet connection.
Thanks again
Hi @RokuCap
I imagine the structure of the webpage has chnaged so the reference to the table you want has changed.
To test this you can create a new blank query and load this URL
https://www.baseball-reference.com/leagues/majors/2020-standings.shtml
PQ presents you with this
Check the table you want and click OK to load it.
Here's the query to get that table
let
Source = Web.BrowserContents("https://www.baseball-reference.com/leagues/majors/2020-standings.shtml"),
#"Extracted Table From Html" = Html.Table(Source, {{"Column1", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(1)"}, {"Column2", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(2)"}, {"Column3", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(3)"}, {"Column4", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(4)"}, {"Column5", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(5)"}, {"Column6", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(6)"}, {"Column7", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(7)"}, {"Column8", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(8)"}, {"Column9", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(9)"}, {"Column10", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(10)"}, {"Column11", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(11)"}, {"Column12", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(12)"}, {"Column13", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(13)"}, {"Column14", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(14)"}, {"Column15", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(15)"}, {"Column16", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(16)"}, {"Column17", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(17)"}, {"Column18", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(18)"}, {"Column19", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(19)"}, {"Column20", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(20)"}, {"Column21", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(21)"}, {"Column22", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(22)"}, {"Column23", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(23)"}, {"Column24", "TABLE[id='expanded_standings_overall'] > * > TR > :nth-child(24)"}}, [RowSelector="TABLE[id='expanded_standings_overall'] > * > TR"]),
#"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Rk", Int64.Type}, {"Tm", type text}, {"W", Int64.Type}, {"L", Int64.Type}, {"W-L%", type number}, {"R", type number}, {"RA", type number}, {"Rdiff", type number}, {"SOS", type number}, {"SRS", type number}, {"pythWL", type text}, {"Luck", Int64.Type}, {"vEast", type text}, {"vCent", type text}, {"vWest", type text}, {"Inter", type text}, {"Home", type text}, {"Road", type text}, {"ExInn", type text}, {"1Run", type text}, {"vRHP", type text}, {"vLHP", type text}, {"≥.500", type text}, {"<.500", type text}})
in
#"Changed Type"
You'll need to take the relevant parts of that query and put them into your function.
Regards
Phil
Proud to be a Super User!
Thanks Phil,
Is the query you posted the only way to reference this table? I.e. there is no longer a table ID associated with that table or something that could be easily referenced within the formula below?
Source{7}[Data]
Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.
User | Count |
---|---|
26 | |
12 | |
11 | |
11 | |
8 |
User | Count |
---|---|
53 | |
28 | |
16 | |
14 | |
13 |