Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Alex_Ooi
Helper IV
Helper IV

Web scraping to show all rows in Power Query

Hi all,

 

I have a task of scraping the composite index historical available in Yahoo Finance. Let's take an example of NASDAQ (^IXIC) with the following URL: https://finance.yahoo.com/quote/%5EIXIC/history?period1=1570147200&period2=1601769600&interval=1d&fi...

 

Start date: 4/10/2019

End date: 4/10/2020

 

However, I noticed a problem where by default, the URL only displays the first 100 rows. It will only be expanded when you scroll down in the website. As such, the result of my code only returns the earliest date of 13/5/2020 (while obviously expecting the earliest date in the table to be 4/10/2019).

Alex_Ooi_0-1601897154205.png


I am not familiar with inline CSS and unfortunately I was not able to understand most of the articles that I could find. I suspect that I could use the second optional argument in Web.BrowserContents( ) by using the WaitFor function. I have inspected the CSS element of the website and I noticed the only thing that changes before the table is fully displayed on web is the class element. I don't know how to add that into my code.

<html id="atomic" class="chrome desktop JsEnabled themelight layoutEnhance(TwoColumnLayout) CollapsibleUh onDemandFocusSupport hasScrolled scrollDown HideNavrail" lang="en-US">

 

I hope that someone here could help me. Below is my full M code.

let
    Source = Web.BrowserContents("https://finance.yahoo.com/quote/%5EIXIC/history?period1=1577836800&period2=1609372800&interval=1d&filter=history&frequency=1d"),
    #"Extracted Table From Html" = Html.Table(Source, {
        {"Column1",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column2",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column3",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column4",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column5",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column6",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3)
            + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(2),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3)
            + TD:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(2),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column7",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3)
            + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(2)
            + TH:not([colspan]):not([rowspan]):nth-child(7):nth-last-child(1),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3)
            + TD:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(2)
            + TD:not([colspan]):not([rowspan]):nth-child(7):nth-last-child(1),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"}},
    [RowSelector="TABLE.W\(100\%\).M\(0\) > * > TR"]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true]),
    #"Removed Columns" = Table.RemoveColumns(#"Promoted Headers",{"High", "Low", "Adj Close**"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Close*", "Close"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Date", type date}, {"Open", type number}, {"Close", type number}, {"Volume", type number}}),
    #"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type", {"Date"}),
    #"Sorted Rows" = Table.Sort(#"Removed Errors",{{"Date", Order.Ascending}})
in
    #"Sorted Rows"

 Thank you

 

Regards,

Alex

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi Alex, notice the download link on the right!

 

= Csv.Document(Web.Contents("https://query1.finance.yahoo.com/v7/finance/download/%5EIXIC?period1=1570147200&period2=1601769600&i..."),[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None])

 

Give the above a shot and that should pull everything in!

View solution in original post

2 REPLIES 2
LiorRahav
Regular Visitor

Hi,  what if the page number is hidden in the metadata, for example: https://dailymed.nlm.nih.gov/dailymed/services/v2/ndcs

 , this will only give me the first 100 records... any thoughts?

Anonymous
Not applicable

Hi Alex, notice the download link on the right!

 

= Csv.Document(Web.Contents("https://query1.finance.yahoo.com/v7/finance/download/%5EIXIC?period1=1570147200&period2=1601769600&i..."),[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None])

 

Give the above a shot and that should pull everything in!

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.