Re: PowerQuery Post API call to return complete da...

hanuraolm · ‎02-15-2023

A search of the frontend application returned 21891 records using 219 pages. Using PowerQuery's POST API, we are trying to call this data

PowerQuery -

3 - Quries

Tool Utilization Query - below is the code I'm using to return the results and it's returning only first 100 records. How to loop this code to get complete 21819 records from 219 pages.

let
    apiUrl = "https://abc.net/dfd/v1/unrestricted-context-search?userId=rtr",

    bearerToken = getbearerToken(),
    options = [
        Content = Text.ToBinary(requestBody),
        Headers = [
            #"Content-Type"="application/json"

        ]
    ],
    apiResponse = Web.Contents(apiUrl, options),
#"Imported JSON" = Json.Document(apiResponse,1252),
docs = #"Imported JSON"[docs]

in
#"Expanded Column1

hanuraolm · ‎03-05-2023

Let's assume that in the particular search criteria my API returns 60,000 records and my API call will always allow 1000 records. Due to the sequential pagination logic, it took longer to load the entire data set. In the example attached, I can read the complete data. Let's say we have 60000 records and each API call will support 1000 records, so the total number of pages is 60000/1000=60 pages. If each 1000 records took 4 seconds to load, then it will take 240 seconds to load all the data. How can the dataload be made fast through parallel mechanisms.

ams1 · ‎03-06-2023

Hi @hanuraolm ,

I see the problem.

There is also a similar post regaring this: https://community.powerbi.com/t5/Power-Query/Power-Query-Parallel-or-Concurrent-REST-Invocation/td-p...

Unfortunately there it states that "Parellel operations are not permitted within a single query" - not sure if it's still actual.

I would still try below potential solution (adding a column with starts and a second column to get the data), BUT if that doesn't work, as suggested also by the OP of the above question, I'd create an intermediery (local python, custom software hosted in a service etc.) that would make the requests in paralell and make them accessible to PowerQuery somehow.

Below potential solution 😊

Given that you know the start of all pages immediately after you run the first request, one option could be to FIRST create a column with the start of all pages, like this:

and THEN add a column that invokes getPageData using the above start column instead of List.Generate - maybe this will help:

The code to do this could look like:

    ... your code

    maxPages = Number.RoundUp(totalRecords/1000),
    pagesStartList = List.Generate(
        () => 0,
        (lastPage) => lastPage < maxPages,
        (lastPage) => lastPage + 1,
        (lastPage) => lastPage * 1000
    ),
    pagesStartTable = Table.FromColumns({pagesStartList}, {"start"}),
    // Define a function to retrieve data for a single page
    getPageData = (start as number) =>
        let
            ... your code
        in
            data,

    #"Added Custom" = Table.AddColumn(pagesStartTable, "pageData", each getPageData([start]))
in 
    #"Added Custom"

I haven't tested it, but if it works, please do tell us if it did AND also mark this as ANSWER if it helped.

ams1 · ‎02-16-2023

Hi

Looping is done using List.Generate.

I recently answered some questions related to paging:

https://community.powerbi.com/t5/Power-Query/how-to-iterate-the-data-of-several-sheets-of-a-web-page...
and also you can have a look at https://learn.microsoft.com/en-us/power-query/helper-functions#tablegeneratebypage

If you cannot manage to fix it using the above links, please reply and I'll have a look at your code.

Also it will be important to know where is the next page token/number in the API response and how to request the next page.

Please mark this as answer if it helped.

hanuraolm · ‎02-16-2023

i could see Authorization: Bearer , kvAccessToken is same for all the pages.

when i click on the next page --

{"query":"((end_time ge '2023-01-18T05:28:31Z' and start_time le '2023-02-17T05:28:31Z') and (file_type_name eq 'datalog'))","fields":[],"count":true,"start":100,"limit":100,"sort":{"fieldname":"end_time","order":"desc"}}

Next page ----

{"query":"((end_time ge '2023-01-18T05:30:02Z' and start_time le '2023-02-17T05:30:02Z') and (file_type_name eq 'datalog'))","fields":[],"count":true,"start":200,"limit":100,"sort":{"fieldname":"end_time","order":"desc"}}

seems "start" is the changed every time, then "limit":100 every time.

these days are same for all page, because i choose last 30 days in application...

end_time ge '2023-01-18T05:28:31Z' and start_time le '2023-02-17T05:28:31Z'

hanuraolm · ‎02-16-2023

Your help in sharing the code structure would be greatly appreciated.

ams1 · ‎02-17-2023

Hi,

Should be something like below - it assumes that we don't know the number of available pages (do we know from the response how many pages are left?) and therefore will keep incrementing "start = start + limit" until we get an error.

Hopefully your server will raise an error if we request a start that doesn't exist, otherwise it won't stop requesting.

If below doesn't work, please get back here and we'll finish it.

Please mark this as answer if it helped.

let
    apiUrl = "http://...",
    bearerToken = getbearerToken(),
    getkvAccessToken = getkvAccessToken(),
    headers = [
        #"Content-Type" = "application/json",
        #"Authorization" = "Bearer " & bearerToken,
        #"clientAppName" = "Voyager",
        #"clientFeatureName" = "Search",
        #"kvAccessToken" = getkvAccessToken,
        #"Ocp-Apim-Subscription-Key" = "ec424b08a1a14b7b97e2e9f6138767be"
    ],
    limit = 100,
    runNextRequest = (lastRequest) =>
        let
            getResponse = (startRow) =>
                let
                    requestBodyRecord = [
                        query = "((end_time ge '2022-12-31T12:01:33Z' and start_time le '2023-01-30T12:01:33Z') and search.ismatch('/.*datalog.*/', 'file_type_name', 'full', 'any'))",
                        fields = {},
                        count = true,
                        start = startRow,
                        limit = limit,
                        sort = [
                            fieldname = "end_time",
                            order = "desc"
                        ]
                    ],
                    options = [
                        Content = Json.FromValue(requestBodyRecord),
                        Headers = headers
                    ],
                    apiResponse = Json.Document(Web.Contents(apiUrl, options))
                in
                    apiResponse,
            start = if lastRequest = null then 0 else lastRequest[start] + limit,
            response = getResponse(start),
            hasError = (try response)[HasError] = true,
            currentRequest =
            // in the absence of anything in the response to tell us when to stop,
            // we will continue requesting as long as we don't get an error
            if hasError = false then [
                response = response,
                start = start
            ] else null
        in
            currentRequest,
    allPages = List.Generate(
        // initial
        () => runNextRequest(null),
        // condition
        (requestResponse) => requestResponse <> null,
        //next
        (requestResponse) => runNextRequest(requestResponse),
        //selector
        // TODO: add [docs] at the end of the line below to extract just docs
        (requestResponse) => requestResponse[response]
    )
in
    allPages

hanuraolm · ‎02-21-2023

Thanks for your Inputs, I tried the above code given, but it is loading very slowly record by record.

Tried with the below chat gpt code and getting repeated data(1st 100 records) until the completion of the loop, can somebody please help me to capture all data.

let
apiUrl = "https://sample url/username",
//startDateTime = DateTime.ToText(DateTime.LocalNow(), "yyyy-MM-ddThh:mm:ssZ"),
//endDateTime = DateTime.ToText( DateTime.LocalNow() - #duration(60, 0, 0, 0), "yyyy-MM-ddThh:mm:ssZ" ),
startDateTime = "2023-02-21T07:19:22Z",
endDateTime ="2023-01-22T07:19:22Z",
requestBody = "{""query"":""((((end_time ge '"
& endDateTime
& "' and start_time le '"
& startDateTime
& "')
options = [
Headers = [
#"Content-Type" = "application/json",
#"Authorization" = "Bearer " & bearerToken,
##"kvAccessToken" = getkvAccessToken(),
]
],
apiResponse = Web.Contents(apiUrl, options & [Content = Text.ToBinary(requestBody)]),
#"Imported JSON" = Json.Document(apiResponse, 1252),
docs = #"Imported JSON"[docs],
totalDocs = #"Imported JSON"[numFound],
allDocs = List.Generate(
() => [start = 0, accumulatedDocs = {}],
each [start] < totalDocs,
each [start = [start] + 100, accumulatedDocs = List.Combine({[accumulatedDocs], [docs]})],
each
let
optionsWithContent = Record.AddField(
options,
"Content",
Text.ToBinary(Text.Replace(requestBody, "#start#", Text.From([start])))
),
response = Web.Contents(apiUrl, optionsWithContent),
json = Json.Document(response, 1252)

in
[docs = json[docs]]
),
combinedDocs = List.Combine(List.Transform(allDocs, each _[docs])),
#"Converted to Table" = Table.FromList(combinedDocs, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"id", "start_time"}, {"id", "start_time"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded Column1", each ([id] = "0055c4e0-027d-4883-b5cc-6ec9d4731fcf"))

in
#"Filtered Rows"

ams1 · ‎02-21-2023

Hi,

When you're trying to build on a not-so-solid foundation, if you try to build and your building collapses, IMO you need to get back to making the foundation solid.

In our case, there were some missing pieces of the puzzle (ex. how to get the total number of pages), we gave it a shot and it didn't work -> so in order for me to continue helping, I'll need you to please provide some more clarity so that we have a solid foundation.

More than this, you also tried and did not succeed with the Chat GPT solution -> 2 x collapses.

One of the most solid foundations for connecting to an API is Postman/curl -> using a browser is NOT necessarily a solid foundation (seeing you write "when i click on the next page" leads me to believe you're using a browser to test, right?).

It should take you like ~30 mins (or maybe less) to get Postman setup and connect to the API.

After you have Postman setup, my advice is to make a smaller query (ex. 3 "pages") and try to get all of them.

An important missing piece of information for me is how will PowerQuery know when there are no more pages (?).

Ex in the case you have 3 pages (aka: start=0, start=100 and start=200), if you request the 4th page (aka start=400), will the API raise an error, OR there is something in the response json that tells us the number of pages?

Also I'd like to see how the response json from Postman looks like - if you post it here, DON'T forget to remove confidential information.

So if you're willing to invest in providing a solid foundation, I'm willing to invest in replying.

P.S.: looking at the ChatGPT solution I see you have "numFound" in the response -> which you didn't specify in the initial question. 😉

PowerQuery Post API call to return complete data from all pages.

Helpful resources

Power BI Monthly Update - July 2025

Join our Fabric User Panel

Fabric Community Update - June 2025

Party with Power BI’s own Guy in a Cube

PowerQuery Post API call to return complete data from all pages.

Helpful resources

Power BI Monthly Update - July 2025

Join our Fabric User Panel

Fabric Community Update - June 2025