Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply
Alex_Ooi
Helper IV
Helper IV

Web scraping to show all rows in Power Query

Hi all,

 

I have a task of scraping the composite index historical available in Yahoo Finance. Let's take an example of NASDAQ (^IXIC) with the following URL: https://finance.yahoo.com/quote/%5EIXIC/history?period1=1570147200&period2=1601769600&interval=1d&fi...

 

Start date: 4/10/2019

End date: 4/10/2020

 

However, I noticed a problem where by default, the URL only displays the first 100 rows. It will only be expanded when you scroll down in the website. As such, the result of my code only returns the earliest date of 13/5/2020 (while obviously expecting the earliest date in the table to be 4/10/2019).

Alex_Ooi_0-1601897154205.png


I am not familiar with inline CSS and unfortunately I was not able to understand most of the articles that I could find. I suspect that I could use the second optional argument in Web.BrowserContents( ) by using the WaitFor function. I have inspected the CSS element of the website and I noticed the only thing that changes before the table is fully displayed on web is the class element. I don't know how to add that into my code.

<html id="atomic" class="chrome desktop JsEnabled themelight layoutEnhance(TwoColumnLayout) CollapsibleUh onDemandFocusSupport hasScrolled scrollDown HideNavrail" lang="en-US">

 

I hope that someone here could help me. Below is my full M code.

let
    Source = Web.BrowserContents("https://finance.yahoo.com/quote/%5EIXIC/history?period1=1577836800&period2=1609372800&interval=1d&filter=history&frequency=1d"),
    #"Extracted Table From Html" = Html.Table(Source, {
        {"Column1",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column2",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column3",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column4",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column5",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column6",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3)
            + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(2),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3)
            + TD:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(2),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column7",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3)
            + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(2)
            + TH:not([colspan]):not([rowspan]):nth-child(7):nth-last-child(1),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3)
            + TD:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(2)
            + TD:not([colspan]):not([rowspan]):nth-child(7):nth-last-child(1),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"}},
    [RowSelector="TABLE.W\(100\%\).M\(0\) > * > TR"]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true]),
    #"Removed Columns" = Table.RemoveColumns(#"Promoted Headers",{"High", "Low", "Adj Close**"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Close*", "Close"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Date", type date}, {"Open", type number}, {"Close", type number}, {"Volume", type number}}),
    #"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type", {"Date"}),
    #"Sorted Rows" = Table.Sort(#"Removed Errors",{{"Date", Order.Ascending}})
in
    #"Sorted Rows"

 Thank you

 

Regards,

Alex

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi Alex, notice the download link on the right!

 

= Csv.Document(Web.Contents("https://query1.finance.yahoo.com/v7/finance/download/%5EIXIC?period1=1570147200&period2=1601769600&i..."),[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None])

 

Give the above a shot and that should pull everything in!

View solution in original post

2 REPLIES 2
LiorRahav
Regular Visitor

Hi,  what if the page number is hidden in the metadata, for example: https://dailymed.nlm.nih.gov/dailymed/services/v2/ndcs

 , this will only give me the first 100 records... any thoughts?

Anonymous
Not applicable

Hi Alex, notice the download link on the right!

 

= Csv.Document(Web.Contents("https://query1.finance.yahoo.com/v7/finance/download/%5EIXIC?period1=1570147200&period2=1601769600&i..."),[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None])

 

Give the above a shot and that should pull everything in!

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Jan25PBI_Carousel

Power BI Monthly Update - January 2025

Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.

Jan NL Carousel

Fabric Community Update - January 2025

Find out what's new and trending in the Fabric community.