hanuraolm
Helper I

Power Query POST API call to return complete data from all pages.

A search in the frontend application returned 21,891 records across 219 pages. We are trying to retrieve this data using Power Query's POST API.


Power Query - 3 queries:

[screenshot: the three queries]

Tool Utilization query - below is the code I'm using to return the results, and it returns only the first 100 records. How can I loop this code to get the complete 21,891 records from all 219 pages?

let
    apiUrl = "https://abc.net/dfd/v1/unrestricted-context-search?userId=rtr",
    bearerToken = getbearerToken(),
    options = [
        // requestBody is built elsewhere in the query (not shown in the post)
        Content = Text.ToBinary(requestBody),
        Headers = [
            #"Content-Type" = "application/json",
            #"Authorization" = "Bearer " & bearerToken
        ]
    ],
    apiResponse = Web.Contents(apiUrl, options),
    #"Imported JSON" = Json.Document(apiResponse, 1252),
    docs = #"Imported JSON"[docs]
in
    docs

 

8 REPLIES
hanuraolm
Helper I

[attached screenshot: sample test]

 

Let's assume that for a particular search my API returns 60,000 records and each API call returns at most 1,000 records. With the sequential pagination logic it takes a long time to load the entire data set; in the example attached, I can read the complete data. With 60,000 records and 1,000 records per call, the total number of pages is 60000/1000 = 60. If each 1,000 records takes 4 seconds to load, it will take 240 seconds to load all the data. How can the data load be made faster through a parallel mechanism?

ams1
Responsive Resident

Hi @hanuraolm ,

 

I see the problem.

 

There is also a similar post regarding this: https://community.powerbi.com/t5/Power-Query/Power-Query-Parallel-or-Concurrent-REST-Invocation/td-p...

 

Unfortunately, it states there that "Parallel operations are not permitted within a single query" - not sure if that's still the case.

 

I would still try the potential solution below (adding a column with starts and a second column to get the data), BUT if that doesn't work, as also suggested by the OP of the above question, I'd create an intermediary (local Python, custom software hosted in a service, etc.) that would make the requests in parallel and make them accessible to Power Query somehow.

 

Below potential solution 😊

Given that you know the start of all pages immediately after you run the first request, one option could be to FIRST create a column with the start of all pages, like this:

[screenshot: column with the start of each page]

and THEN add a column that invokes getPageData using the above start column instead of List.Generate - maybe this will help:

[screenshot: added column invoking getPageData per start value]

 

The code to do this could look like:

    ... your code

    maxPages = Number.RoundUp(totalRecords/1000),
    pagesStartList = List.Generate(
        () => 0,
        (lastPage) => lastPage < maxPages,
        (lastPage) => lastPage + 1,
        (lastPage) => lastPage * 1000
    ),
    pagesStartTable = Table.FromColumns({pagesStartList}, {"start"}),
    // Define a function to retrieve data for a single page
    getPageData = (start as number) =>
        let
            ... your code
        in
            data,

    #"Added Custom" = Table.AddColumn(pagesStartTable, "pageData", each getPageData([start]))
in 
    #"Added Custom"

 

I haven't tested it, but if it works, please tell us AND also mark this as ANSWER if it helped.
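
For concreteness, here's also a self-contained sketch of the same pattern. The apiUrl, the body fields and the internals of getPageData are placeholder assumptions you'd have to adapt to your API:

let
    apiUrl = "https://example.net/search",  // placeholder URL
    limit = 1000,
    totalRecords = 60000,  // in practice, read this from the first response (e.g. numFound)
    maxPages = Number.RoundUp(totalRecords / limit),
    // one start value per page: 0, 1000, 2000, ...
    pagesStartList = List.Transform({0 .. maxPages - 1}, each _ * limit),
    pagesStartTable = Table.FromColumns({pagesStartList}, {"start"}),
    // hypothetical single-page fetch: POST the start/limit and return that page's docs list
    getPageData = (start as number) =>
        let
            body = [start = start, limit = limit],  // plus your query/sort fields
            response = Json.Document(
                Web.Contents(apiUrl, [
                    Content = Json.FromValue(body),
                    Headers = [#"Content-Type" = "application/json"]
                ])
            )
        in
            response[docs],
    #"Added Custom" = Table.AddColumn(pagesStartTable, "pageData", each getPageData([start]))
in
    #"Added Custom"

Whether the mashup engine actually issues these requests in parallel depends on the host, so treat this as an experiment rather than a guaranteed speed-up.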

ams1
Responsive Resident

Hi


Looping is done using List.Generate.
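
A minimal, generic paging sketch with List.Generate - fetchPage is a hypothetical function you'd implement with your Web.Contents call, returning one page's records, or null when the page is empty:

let
    pageSize = 100,
    // hypothetical single-page fetch; replace the body with your Web.Contents call
    fetchPage = (start as number) as nullable list =>
        let
            page = {}  // placeholder: fetch the rows starting at `start` here
        in
            if List.IsEmpty(page) then null else page,
    allPages = List.Generate(
        () => [start = 0, page = fetchPage(0)],                                   // initial state
        each [page] <> null,                                                      // continue while a page came back
        each [start = [start] + pageSize, page = fetchPage([start] + pageSize)],  // next state
        each [page]                                                               // selector: the page itself
    ),
    allRows = List.Combine(allPages)
in
    allRows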

 

I recently answered some questions related to paging:

 

If you cannot manage to fix it using the above links, please reply and I'll have a look at your code.

 

Also, it will be important to know where the next page token/number is in the API response and how to request the next page.

 

Please mark this as answer if it helped.

I can see that the Authorization: Bearer token and kvAccessToken are the same for all the pages.

When I click on the next page, the request body is:

{"query":"((end_time ge '2023-01-18T05:28:31Z' and start_time le '2023-02-17T05:28:31Z') and (file_type_name eq 'datalog'))","fields":[],"count":true,"start":100,"limit":100,"sort":{"fieldname":"end_time","order":"desc"}}

Next page:

{"query":"((end_time ge '2023-01-18T05:30:02Z' and start_time le '2023-02-17T05:30:02Z') and (file_type_name eq 'datalog'))","fields":[],"count":true,"start":200,"limit":100,"sort":{"fieldname":"end_time","order":"desc"}}

 

seems "start" is the changed every time, then "limit":100 every time.

 

These dates are the same for all pages, because I chose the last 30 days in the application:

end_time ge '2023-01-18T05:28:31Z' and start_time le '2023-02-17T05:28:31Z'
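
In other words, the body for page n differs only in start = n * limit; a quick sketch of the pattern (startForPage is just an illustrative helper):

    limit = 100,
    startForPage = (n as number) => n * limit,       // page 0 -> 0, page 1 -> 100, page 2 -> 200
    starts = List.Transform({0 .. 2}, startForPage)  // {0, 100, 200}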

 

Your help in sharing the code structure would be greatly appreciated.

ams1
Responsive Resident

Hi,

 

It should be something like below - it assumes we don't know the number of available pages (do we know from the response how many pages are left?) and therefore keeps incrementing "start = start + limit" until we get an error.

 

Hopefully your server will raise an error if we request a start that doesn't exist, otherwise it won't stop requesting.

 

If below doesn't work, please get back here and we'll finish it.

 

Please mark this as answer if it helped.

 

 

let
    apiUrl = "http://...",
    bearerToken = getbearerToken(),
    kvAccessToken = getkvAccessToken(),  // renamed: a local named getkvAccessToken would shadow the function and create a cyclic reference
    headers = [
        #"Content-Type" = "application/json",
        #"Authorization" = "Bearer " & bearerToken,
        #"clientAppName" = "Voyager",
        #"clientFeatureName" = "Search",
        #"kvAccessToken" = getkvAccessToken,
        #"Ocp-Apim-Subscription-Key" = "ec424b08a1a14b7b97e2e9f6138767be"
    ],
    limit = 100,
    runNextRequest = (lastRequest) =>
        let
            getResponse = (startRow) =>
                let
                    requestBodyRecord = [
                        query = "((end_time ge '2022-12-31T12:01:33Z' and start_time le '2023-01-30T12:01:33Z') and search.ismatch('/.*datalog.*/', 'file_type_name', 'full', 'any'))",
                        fields = {},
                        count = true,
                        start = startRow,
                        limit = limit,
                        sort = [
                            fieldname = "end_time",
                            order = "desc"
                        ]
                    ],
                    options = [
                        Content = Json.FromValue(requestBodyRecord),
                        Headers = headers
                    ],
                    apiResponse = Json.Document(Web.Contents(apiUrl, options))
                in
                    apiResponse,
            start = if lastRequest = null then 0 else lastRequest[start] + limit,
            response = getResponse(start),
            hasError = (try response)[HasError] = true,
            currentRequest =
            // in the absence of anything in the response to tell us when to stop,
            // we will continue requesting as long as we don't get an error
            if hasError = false then [
                response = response,
                start = start
            ] else null
        in
            currentRequest,
    allPages = List.Generate(
        // initial
        () => runNextRequest(null),
        // condition
        (requestResponse) => requestResponse <> null,
        //next
        (requestResponse) => runNextRequest(requestResponse),
        //selector
        // TODO: add [docs] at the end of the line below to extract just docs
        (requestResponse) => requestResponse[response]
    )
in
    allPages
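
If that runs, you can flatten the pages into a single table: add a comma after List.Generate's closing parenthesis and replace the final "in allPages" with steps like the below (this assumes each response record contains a docs list, as in your first query):

    combinedDocs = List.Combine(List.Transform(allPages, each _[docs])),
    #"Converted to Table" = Table.FromList(combinedDocs, Splitter.SplitByNothing(), null, null, ExtraValues.Error)
in
    #"Converted to Table"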

 

 

Thanks for your inputs. I tried the code given above, but it loads very slowly, record by record.

I tried the ChatGPT code below and I'm getting repeated data (the first 100 records) until the loop completes. Can somebody please help me capture all the data?

 

let
    apiUrl = "https://sample url/username",
    //startDateTime = DateTime.ToText(DateTime.LocalNow(), "yyyy-MM-ddThh:mm:ssZ"),
    //endDateTime = DateTime.ToText(DateTime.LocalNow() - #duration(60, 0, 0, 0), "yyyy-MM-ddThh:mm:ssZ"),
    startDateTime = "2023-02-21T07:19:22Z",
    endDateTime = "2023-01-22T07:19:22Z",
    bearerToken = getbearerToken(),  // same helper query as in the earlier code
    requestBody = "{""query"":""((((end_time ge '"
        & endDateTime
        & "' and start_time le '"
        & startDateTime
        & "') ...",  // the rest of the request body was truncated in the post
    options = [
        Headers = [
            #"Content-Type" = "application/json",
            #"Authorization" = "Bearer " & bearerToken,
            #"kvAccessToken" = getkvAccessToken()
        ]
    ],
    apiResponse = Web.Contents(apiUrl, options & [Content = Text.ToBinary(requestBody)]),
    #"Imported JSON" = Json.Document(apiResponse, 1252),
    docs = #"Imported JSON"[docs],
    totalDocs = #"Imported JSON"[numFound],
    allDocs = List.Generate(
        () => [start = 0, accumulatedDocs = {}],
        each [start] < totalDocs,
        each [start = [start] + 100, accumulatedDocs = List.Combine({[accumulatedDocs], [docs]})],
        each
            let
                optionsWithContent = Record.AddField(
                    options,
                    "Content",
                    Text.ToBinary(Text.Replace(requestBody, "#start#", Text.From([start])))
                ),
                response = Web.Contents(apiUrl, optionsWithContent),
                json = Json.Document(response, 1252)
            in
                [docs = json[docs]]
    ),
    combinedDocs = List.Combine(List.Transform(allDocs, each _[docs])),
    #"Converted to Table" = Table.FromList(combinedDocs, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"id", "start_time"}, {"id", "start_time"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Column1", each ([id] = "0055c4e0-027d-4883-b5cc-6ec9d4731fcf"))
in
    #"Filtered Rows"

ams1
Responsive Resident

Hi,

 

When you're building on a not-so-solid foundation and the building collapses, IMO you need to go back and make the foundation solid.

 

In our case, there were some missing pieces of the puzzle (e.g. how to get the total number of pages); we gave it a shot and it didn't work -> so for me to continue helping, I'll need you to provide some more clarity so that we have a solid foundation.

On top of that, you also tried the ChatGPT solution and did not succeed -> 2 x collapses.

 

One of the most solid foundations for connecting to an API is Postman/curl -> using a browser is NOT necessarily a solid foundation (seeing you write "when I click on the next page" leads me to believe you're using a browser to test, right?)

 

It should take you ~30 mins (or maybe less) to get Postman set up and connected to the API.

After you have Postman set up, my advice is to make a smaller query (e.g. 3 "pages") and try to get all of them.

An important missing piece of information for me is how Power Query will know when there are no more pages.

E.g. if you have 3 pages (i.e. start=0, start=100 and start=200) and you request the 4th page (i.e. start=300), will the API raise an error, OR is there something in the response JSON that tells us the number of pages?

Also, I'd like to see what the response JSON from Postman looks like - if you post it here, DON'T forget to remove confidential information.

 

So if you're willing to invest in providing a solid foundation, I'm willing to invest in replying.

 

P.S.: looking at the ChatGPT solution I see you have "numFound" in the response -> which you didn't specify in the initial question. 😉
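
P.S. 2: one hypothesis worth checking in that ChatGPT code - Text.Replace looks for the literal token "#start#" in requestBody, but the visible part of requestBody never contains it, so every request would send the same body (which would explain the repeated first 100 records). If the truncated part of the body follows the JSON you posted earlier, it would need a placeholder, something like this hypothetical sketch:

    // requestBody must contain the literal #start# token for Text.Replace to substitute
    requestBody = "{""query"":""..."",""fields"":[],""count"":true,""start"":#start#,""limit"":100,""sort"":{""fieldname"":""end_time"",""order"":""desc""}}",
    firstPageBody = Text.Replace(requestBody, "#start#", "0")  // produces ""start"":0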

 

 

 
