LiorRahav
Regular Visitor

Power Query web scraping

Hi, when connecting to a website, I want to pull more than the first page. What if the page number is hidden in the metadata? For example, https://dailymed.nlm.nih.gov/dailymed/services/v2/ndcs (.xml or .json) only gives me the first 100 records. Any thoughts?

2 REPLIES
LiorRahav
Regular Visitor

Thanks, very good article, but I'm missing something.

I can't pull anything after the first page. Could that be because the page information is in the metadata, or is the [paging][next] part the problem?
 
let
 iterations = 10,          // Number of iterations
 url = 
 
 FnGetOnePage =
  (url) as record =>
   let
    Source = Json.Document(Web.Contents(url)),
    data = try Source[data] otherwise null,
    next = try Source[paging][next] otherwise null,
    res = [Data=data, Next=next]
   in
    res,
 
 GeneratedList =
  List.Generate(
   ()=>[i=0, res = FnGetOnePage(url)],
   each [i]<iterations and [res][Data]<>null,
   each [i=[i]+1, res = FnGetOnePage([res][Next])],
   each [res][Data])
in
    GeneratedList
 
This is what I get:

[screenshot not preserved]

As an FYI, this is the query I use to expand the metadata:
let
    Source = Json.Document(Web.Contents("https://dailymed.nlm.nih.gov/dailymed/services/v2/ndcs")),
    #"Converted to Table" = Table.FromRecords({Source}),
    #"Expanded metadata" = Table.ExpandRecordColumn(#"Converted to Table", "metadata", {"db_published_date", "elements_per_page", "current_url", "next_page_url", "total_elements", "total_pages", "current_page", "previous_page", "previous_page_url", "next_page"}, {"metadata.db_published_date", "metadata.elements_per_page", "metadata.current_url", "metadata.next_page_url", "metadata.total_elements", "metadata.total_pages", "metadata.current_page", "metadata.previous_page", "metadata.previous_page_url", "metadata.next_page"})
in
    #"Expanded metadata"
 
 

 

 
I hope you can help, if you have time 🙂
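
One possible reason the loop above stops after the first page: the expanded metadata lists a next_page_url field under metadata, while the function reads Source[paging][next]. If that path does not exist in the response, Next is always null and the second call returns no data. A minimal sketch of the same List.Generate pattern reading the link from metadata instead, with several assumptions: the start URL uses the .json suffix mentioned in the question, the records sit under a data field as in the query above, and the last page reports next_page_url as null or as the text "null". The names FirstUrl, MaxPages, GetOnePage and Combined are illustrative only.

```powerquery
let
    // Assumption: JSON form of the endpoint from the question
    FirstUrl = "https://dailymed.nlm.nih.gov/dailymed/services/v2/ndcs.json",
    // Safety cap on the number of pages, as in the original query
    MaxPages = 10,

    // Fetch one page and return its records plus the link to the next page
    GetOnePage = (url as text) as record =>
        let
            Source = Json.Document(Web.Contents(url)),
            // Assumption: records are under "data", as in the original query
            Data = try Source[data] otherwise null,
            // The next-page link is read from metadata, not from [paging][next]
            Next = try Source[metadata][next_page_url] otherwise null
        in
            [Data = Data, Next = Next],

    Pages = List.Generate(
        () => [i = 0, res = GetOnePage(FirstUrl)],
        each [i] < MaxPages and [res][Data] <> null,
        each [
            i = [i] + 1,
            // Assumption: the last page has no link (null or the text "null")
            res = if [res][Next] = null or [res][Next] = "null"
                  then [Data = null, Next = null]
                  else GetOnePage([res][Next])
        ],
        each [res][Data]
    ),

    // Flatten the list of pages into a single list of records
    Combined = List.Combine(Pages)
in
    Combined
```

If this returns more than 100 records, Table.FromRecords(Combined) should turn the result into a table. Raising MaxPages pulls more pages; the metadata also exposes total_pages, which could replace the fixed cap.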
lbendlin
Super User