Gringoh
New Member

Query to Scrape URLs with a pause every xx URLs

Hi,

 

I am using a query to scrape a list of data from various URLs of the same website.

 

I am using Web.Contents to get the data, e.g. Json.Document(Web.Contents("https://…")).

 

I think my query is now overloading the website with too many requests too quickly.

 

I tried to use Function.InvokeAfter, but then the query is far too slow to load.
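To illustrate, the per-call delay looked roughly like this (the 5-second duration and the GetWithDelay name are just placeholders); since the delay applies to every single call, the whole query slows down proportionally:

let
    // Rough sketch: wrapping every request in Function.InvokeAfter
    // makes the engine wait the full delay before each call
    GetWithDelay = (url as text) =>
        Function.InvokeAfter(
            () => Json.Document(Web.Contents(url)),
            #duration(0, 0, 0, 5)   // 5-second pause before every request
        )
in
    GetWithDelay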

 

I am wondering if it is possible to code a wait (pause or sleep) function once xx rows have been loaded.

 

Let’s say my query has to scrape 1,000 URLs: would it be possible to have a pause of x minutes every 100 URLs?

 

I have already read a lot of useful tips, but nothing that could solve this issue.

 

Many thanks for any help.

Regards,

G

1 ACCEPTED SOLUTION

Hi,

 

Many thanks for your help Angelia.

 

It took me some time to get it working, but I finally managed it thanks to the link shared by Angelia and a few tips picked up here and there.

 

I am going to summarize the various steps in case they can help some of you 😉

 

01 – First query: getting the list of data to scrape and setting the wait parameters:

===============================================================

I have a first query that uses an API to collect the list of data used to scrape URLs from the same website.

==> So this query collects a list of data

==> then I add an index column to number the rows from 1 to xxx

==> then I use a modulo formula to set a pause in the query every xx rows.

 

Here is a sample of this first query:

 

let
    Source = Json.Document(Web.Contents("https://api.xxxxxxx")),
…….
    // Add an index column so each row is numbered from 1 to n
    #"Index ajouté" = Table.AddIndexColumn(#"Column1 développé", "Index", 1, 1),
    // Pause every 3 rows (adjust this value to pause every xx rows)
    Row_Mod_Multiple = 3,
    // Flag the rows after which the query should wait before the next request
    #"Personnalisée ajoutée" = Table.AddColumn(#"Index ajouté", "Wait", each if Number.Mod([Index], Row_Mod_Multiple) = 0 then "yes" else "no")
in
    #"Personnalisée ajoutée"

Thanks to the modulo formula with a row multiple of 3, every 3 rows the query will wait xxx seconds (or minutes) before proceeding with the next row.
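As a quick check of the flag logic on its own (just an illustration, not part of the actual query), the first six index values come out like this:

let
    Row_Mod_Multiple = 3,
    // Indexes 3 and 6 give Number.Mod = 0, so only those rows get Wait = "yes"
    Sample = List.Transform({1..6},
        each [Index = _, Wait = if Number.Mod(_, Row_Mod_Multiple) = 0 then "yes" else "no"])
in
    Sample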

 

[Screenshot Q1.png: result of the first query, showing the added Index and Wait columns]

 

 

02 – Second query: the wait function:

===============================================================

This second query is the wait function (it is used to add a wait time of xxx seconds); make sure to name it “Wait_Seconds”:

 

let
    // Busy-wait helper: List.Generate keeps producing timestamps until "seconds"
    // have elapsed, then the supplied action() is evaluated and its result returned
    Wait = (seconds as number, action as function) =>
            if (List.Count(List.Generate(
                    () => DateTimeZone.LocalNow() + #duration(0, 0, 0, seconds),  // target end time
                    (x) => DateTimeZone.LocalNow() < x,                           // keep looping until it is reached
                    (x) => x)) = 0)
            then null else action()
in
    Wait

This wait function will be used in the third query.
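You can also test it on its own; this quick check (not part of the final solution) should return "done" after roughly 5 seconds:

let
    // Quick test: waits about 5 seconds, then evaluates the supplied action
    Test = Wait_Seconds(5, () => "done")
in
    Test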

 

 

03 – Third query: scrape the URLs and wait every xxx rows:

===============================================================

 

let
    Source = #"Query1",
    // For flagged rows, wait 5 seconds before calling the scraping function; otherwise call it immediately
    #"Personnalisée ajoutée" = Table.AddColumn(Source, "id", each if [Wait] = "yes" then Wait_Seconds(5, () => fnGetdata([URL values])) else fnGetdata([URL values])),
……

This last query will scrape the list of URLs and will wait 5 seconds every 3 rows.
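Note that fnGetdata itself is not shown above; it is simply the function that performs the actual request for one URL. A minimal sketch, assuming it just takes the URL value from the row and returns the parsed JSON, could be:

let
    // Hypothetical sketch of fnGetdata: requests one URL and parses the JSON response
    fnGetdata = (url as text) =>
        Json.Document(Web.Contents(url))
in
    fnGetdata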

 

As some websites block IP addresses that make too many requests per minute, this approach can help avoid that issue.

 

Hope it helps 😉

Best,

G


2 REPLIES
v-huizhn-msft
Microsoft Employee

Hi @Gringoh,

Please refer to this link: https://blog.crossjoin.co.uk/2014/11/17/timing-power-query-queries/. Also, Power Query questions are better posted to the corresponding forum: Power Query Forum.

Thanks,
Angelia

