
Justin326117
Frequent Visitor

How can I use an API to pull in more than 1,000 rows of data in Power Query?

Hi,

 

I am trying to use an API to pull data into Power Query; I won't go into details on other aspects. The issue is that I have to pull in more than 1,000 rows, and Power Query gives me error messages when I go past a limit of 1,000.

I have tried pagination techniques, but oddly I get an error message at row 26,000.

 

Does anyone have any advice on what I should do here? If I import the data, will more than 1,000 rows come through?

Or would I need multiple data pulls, where each one skips 1,000, then 2,000, and so on?

 

For transparency, I am looking to have this process automated, so I would not want to manage it manually. I do not think my API calls return a URL that can be used for pagination, so I have to use the skip-and-limit technique.

9 REPLIES
v-dineshya
Community Support

Hi @Justin326117 ,

Thank you for reaching out to the Microsoft Community Forum.

 

Please follow below steps:

1. Use a paginated loop in Power Query (M): You can automate the entire data retrieval process in Power Query using a custom function that pages through the API using skip and limit, then combines the results into a single table.

Paginated API call using skip and limit in Power Query

M code:

let
    PageSize = 1000,

    // Fetch one page, given a zero-based page number
    GetPage = (PageNumber as number) =>
        let
            Skip = PageNumber * PageSize,
            Source = Json.Document(Web.Contents("https://api.example.com/data",
                [
                    Query = [
                        limit = Text.From(PageSize),
                        skip = Text.From(Skip)
                    ]
                ])),
            Data = Source[data] // Adjust if your API structure differs
        in
            Data,

    // Keep requesting pages until one comes back empty
    GetAllPages = List.Generate(
        () => [Page = 0, Data = GetPage(0)],
        each List.Count([Data]) > 0,
        each [Page = [Page] + 1, Data = GetPage([Page] + 1)],
        each [Data]
    ),

    Flattened = List.Combine(GetAllPages),
    TableOut = Table.FromList(Flattened, Record.FieldValues, {"ColumnName1", "ColumnName2"}) // Adjust column names as needed
in
    TableOut

Please check the following:

1. API limits: Check whether the API has a hard limit on how far skip can go; some APIs don't handle high offsets well (like 26,000+) and prefer cursor-based pagination instead.

2. Error handling: Add try...otherwise inside GetPage to catch and skip over any problematic page loads (see the sketch after this list).

3. Performance: Power Query can struggle with very large datasets; consider reducing data early with filters, or moving to a tool like Python or Azure Data Factory if scalability becomes a concern.
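
For point 2, a minimal sketch of that wrapper (reusing GetPage from the code above):

    // Wrap the page fetch in try ... otherwise so a failed call yields an
    // empty list instead of an error. Note that with the List.Generate
    // condition above (List.Count([Data]) > 0), an empty result ends the
    // paging loop at the failed page; truly skipping a bad page would need
    // extra state to advance past it.
    GetPageSafe = (PageNumber as number) => try GetPage(PageNumber) otherwise {}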

 

Please refer to the official documentation and community threads.

Handling paging for Power Query connectors - Power Query | Microsoft Learn

Solved: PowerQuery Rest API Pagination - Microsoft Fabric Community

Solved: Pagination of a REST API in Power Query using M - Microsoft Fabric Community

 

If my response has resolved your query, please mark it as the Accepted Solution to assist others. Additionally, a 'Kudos' would be appreciated if you found my response helpful.

Thank you

 

Justin326117
Frequent Visitor

Thanks,

 

So you have actually helped me, as I have found that openFDA has a limit of 26,000 hits.

 

The documentation says to use the search_after parameter instead in order to paginate beyond that limit.

 

However, I am unable to find the HTTP URL I am meant to be extracting, so I am not sure how to do this.

 

Regarding what I am currently working with: I have no idea how to use M code, so I've just had to rely on Copilot.

 

let
    // Your API key
    apiKey = "API KEY GOES HERE",
    DefaultRequestHeaders = [
        #"Authorization" = "Bearer " & apiKey
    ],

    // Function to get each page with skip logic
    GetPage = (skip as number) =>
        let
            response = Web.Contents(
                "https://api.fda.gov/device/event.json?search=date_received:[20250101+TO+20250201]&skip=" & Text.From(skip) & "&limit=1000",
                [
                    Headers = DefaultRequestHeaders
                ]
            ),
            body = Json.Document(response),
            data = Table.FromRecords(body[results])
        in
            data,

    // Generate all pages, stop if rows are fewer than 1000
    AllPages = List.Generate(
        () => [page = GetPage(0), skip = 0],
        each Table.RowCount([page]) > 0,
        each [page = GetPage([skip] + 1000), skip = [skip] + 1000],
        each [page]
    ),
    // Combine all pages into a single table
    #"Converted to Table" = Table.FromList(AllPages, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandTableColumn(#"Converted to Table", "Column1", {"manufacturer_contact_zip_ext", "manufacturer_g1_address_2", "event_location", "report_to_fda", "manufacturer_contact_t_name", "manufacturer_contact_state", "manufacturer_link_flag", "manufacturer_contact_address_2", "manufacturer_g1_city", "manufacturer_contact_address_1", "manufacturer_contact_pcity", "event_type", "report_number", "type_of_report", "product_problem_flag", "date_received", "manufacturer_address_2", "pma_pmn_number", "date_of_event", "reprocessed_and_reused_flag", "manufacturer_address_1", "exemption_number", "manufacturer_contact_zip_code", "reporter_occupation_code", "manufacturer_contact_plocal", "noe_summarized", "manufacturer_contact_l_name", "source_type", "distributor_zip_code_ext", "manufacturer_g1_postal_code", "manufacturer_g1_state", "reporter_country_code", "manufacturer_contact_area_code", "date_added", "manufacturer_contact_f_name", "device_date_of_manufacturer", "previous_use_code", "device", "product_problems", "manufacturer_zip_code", "suppl_dates_mfr_received", "manufacturer_contact_country", "date_changed", "health_professional", "summary_report_flag", "manufacturer_g1_zip_code_ext", "manufacturer_contact_extension", "manufacturer_city", "manufacturer_contact_phone_number", "patient"}, {"manufacturer_contact_zip_ext", "manufacturer_g1_address_2", "event_location", "report_to_fda", "manufacturer_contact_t_name", "manufacturer_contact_state", "manufacturer_link_flag", "manufacturer_contact_address_2", "manufacturer_g1_city", "manufacturer_contact_address_1", "manufacturer_contact_pcity", "event_type", "report_number", "type_of_report", "product_problem_flag", "date_received", "manufacturer_address_2", "pma_pmn_number", "date_of_event", "reprocessed_and_reused_flag", "manufacturer_address_1", "exemption_number", "manufacturer_contact_zip_code", "reporter_occupation_code", "manufacturer_contact_plocal", "noe_summarized", "manufacturer_contact_l_name", "source_type", "distributor_zip_code_ext", "manufacturer_g1_postal_code", "manufacturer_g1_state", "reporter_country_code", "manufacturer_contact_area_code", "date_added", "manufacturer_contact_f_name", "device_date_of_manufacturer", "previous_use_code", "device", "product_problems", "manufacturer_zip_code", "suppl_dates_mfr_received", "manufacturer_contact_country", "date_changed", "health_professional", "summary_report_flag", "manufacturer_g1_zip_code_ext", "manufacturer_contact_extension", "manufacturer_city", "manufacturer_contact_phone_number", "patient"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Column1", each ([device_date_of_manufacturer] = "20171127"))
in
    #"Filtered Rows"

 

However, this throws errors from Expression.Error saying certain records were not found.

I am not sure how this occurs, as when calling the same API on the web those rows do have data in them and are not all blank. So I am not too sure whether a specific column is critical and whether leaving it null affects everything else.

 

If this could be fixed, it would be of great help.


 

MarkLaf
Accepted Solution

As you note, https://open.fda.gov/apis/paging/ documents that you can't skip more than 25,000 records, and instead have to use the alternate search_after parameter to page deeper. The doc notes that, to get the correct search_after value for the next page of data, you have to use the "Link HTTP header contained in the response."
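
For reference, the Link header value looks something like this (illustrative; actual values elided):

    <https://api.fda.gov/device/510k.json?search=...&limit=100&search_after=...>; rel="next"

The M code further down extracts the URL between "<" and ">; rel" to make the next call.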

 

As far as I can tell, there is no way to expose the response Link header without creating a custom connector, as noted here: https://community.fabric.microsoft.com/t5/Desktop/Retrieve-API-response-headers/m-p/641107/highlight...

 

Fortunately, it's not too difficult to put together a custom connector (especially when you don't need to worry about authentication). I was able to create one with Visual Studio Code following the instructions at: https://learn.microsoft.com/en-us/power-query/install-sdk and https://learn.microsoft.com/en-us/power-query/creating-first-connector 

 

There are a bunch of ways you could implement it, but I used the following for the main Contents function of the connector's .pq definition file (I went with the custom connector name OpenFDA_DeepPaging):

 

 

shared OpenFDA_DeepPaging.Contents =
    let
        func = (baseURL as text, optional relativePath as text) as record =>
            let
                Source = Web.Contents(baseURL, [RelativePath = relativePath]),
                // In a custom connector, Value.Metadata exposes all response headers
                Headers = Value.Metadata(Source)[Headers],
                // MissingField.UseNull tolerates result records that lack some fields
                Results = Table.FromRecords(Json.Document(Source)[results], null, MissingField.UseNull),
                Return = [Headers = Headers, Results = Results]
            in
                Return
    in
        Value.ReplaceType(
            func, type function (baseURL as text, optional relativePath as text) as [
                Headers = record, Results = table
            ]
        );

 

 

This parses the main data returned (in Results step), exposes all response headers via Value.Metadata (Headers step), and then outputs both in a record (Return step).
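
Note that the Results step passes MissingField.UseNull to Table.FromRecords, which also avoids the "field wasn't found" errors you hit when some results lack fields. As a quick smoke test of the output record (a sketch, assuming the connector above is installed):

let
    Response = OpenFDA_DeepPaging.Contents(
        "https://api.fda.gov",
        "device/510k.json?search=openfda.regulation_number:868.5895&limit=100"
    ),
    // Null on the last page; otherwise the Link header for the next page
    NextLink = Response[Headers][Link]?
in
    NextLink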

 

Again, basically the only reason we need to do this in a custom connector rather than in PBI Desktop is that the connector can access all the response headers with Value.Metadata; in PBI Desktop, the exact same code (as well as Web.Headers) only returns a subset, which does not include Link.
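
You can check the Desktop behavior for yourself with a sketch like this, which lists whichever response header names the engine exposes (Link won't be among them):

let
    Response = Web.Contents("https://api.fda.gov", [RelativePath = "device/510k.json?limit=1"]),
    HeaderNames = Record.FieldNames(Value.Metadata(Response)[Headers])
in
    HeaderNames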

 

I didn't really worry about API key handling as, per my understanding of the authentication doc, you can just include it as your first query parameter; no need to pass it as a request header.

 


 

Using the above custom connector, the following M successfully pages through all applicable records given the limit provided.

 

 

// Test with: https://api.fda.gov/device/510k.json?search=openfda.regulation_number:868.5895&limit=100
// As of 2025-04-29, has 590 total records. With limit=100, we should page through 6 web calls
let
    SourceUrl = "https://api.fda.gov",
    GetAll = List.Generate(
        () =>
            OpenFDA_DeepPaging.Contents(
                SourceUrl, 
                "device/510k.json?search=openfda.regulation_number:868.5895&limit=100"
            ),
        each _ <> null,
        each
            if [Headers][Link]? = null then
                null
            else
                let
                    nextFullCall = Text.BetweenDelimiters([Headers][Link], "<", ">; rel"),
                    endpointAndQuery = Text.AfterDelimiter(nextFullCall, SourceUrl & "/")
                in
                    OpenFDA_DeepPaging.Contents(SourceUrl, endpointAndQuery),
        each [Results]
    ),
    CombineAll = Table.Combine(GetAll)
in
    CombineAll

 

 

SDK extension project is here if interested: https://github.com/MarkRasp/OpenFDA_DeepPaging

Hi @Justin326117 ,

If you find @MarkLaf's post helpful, please mark it as an "Accept as Solution" and consider giving a KUDOS. Feel free to reach out if you need further assistance.
Thanks and Regards


SacheeTh
Resolver II

Hi @Justin326117 ,

I had a similar issue; here's what I did to get past the row limit using the skip-and-limit technique.

My case was a SharePoint list (5,000-row limit).

  1. I started with the first and last ID in my dataset (this is just to skip over deleted items).
  2. I created a list where I incrementally skipped rows in batches of 5,000 (you can adjust this number based on your limit). I called this the skip list: one entry per skip.


3. I then created a custom column that takes each value from the [Skip] column and passes it as the skip value in the API call.

 

Here's the M code I used:

 

Custom1 = Table.AddColumn(#"PreviousStep", "Items", each Json.Document(Web.Contents(baseurl & "/items?$skiptoken=Paged=TRUE%26p_ID=" & [Skip] & fieldselect, [Headers=[Accept="application/json"]]))),


This allowed me to automatically pull data in multiple batches, so you don’t have to worry about the 1000-row limit. You just need to adjust the batch size and pagination logic accordingly.

 

Each of these records holds 5,000 rows of data; the next step is to expand them to values.



Please check the full M code below.

let
    sitename ="bal/Group_Operations/Industrial_Eng/Skill/App", 
    listname = "Skill DB",
    baseurl = "https://brandix.sharepoint.com/sites/" & sitename & "/_api/web/lists/GetByTitle('" & listname & "')/",
    //itemcount = Json.Document(Web.Contents(baseurl&"ItemCount", [Headers=[Accept="application/json"]]))[value],
    lastId = Json.Document(Web.Contents(baseurl & "items?$top=1&$orderby=ID desc", [Headers=[Accept="application/json"]]))[value]{0}[ID],
    FirstId = Json.Document(Web.Contents(baseurl & "items?$top=1&$orderby=ID asc", [Headers=[Accept="application/json"]]))[value]{0}[ID],
    skiplist = List.Numbers(FirstId - 1, Number.RoundUp((lastId - FirstId) / 5000), 5000),
    #"Converted to Table" = Table.FromList(skiplist, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "Skip"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Skip", type text}}),
    fieldselect = "&$top=5000", // all fields with no expansion
    //fieldselect = "&$top=5000&$select = Id,Title,Person,Date", // list desired fields (no expansion)
    //fieldselect = "&$top=5000&$select=Id,Title,Choice,LookupColumn/Title,LookupColumn/Project,LookupColumn/ProjectStatus,Date,Person/LastName,Person/FirstName,Person/EMail&$expand=LookupColumn,Person",
    Custom1 = Table.AddColumn(#"Changed Type", "Items", each Json.Document(Web.Contents(baseurl& "/items?$skiptoken=Paged=TRUE%26p_ID=" &[Skip] & fieldselect, [Headers=[Accept="application/json"]]))),
    #"Expanded Items" = Table.ExpandRecordColumn(Custom1, "Items", {"value"}, {"value"}),
    #"Expanded value" = Table.ExpandListColumn(#"Expanded Items", "value"),
    #"Expanded value1" = Table.ExpandRecordColumn(#"Expanded value", "value", {"Average_x0020_CT", "Cluster", "Created", "EPF", "ID", "Machine", "Operation", "Plant", "Skill_x0020_factor", "SMV", "Style", "Team", "Title"}, {"Average_x0020_CT", "Cluster", "Created", "EPF", "ID", "Machine", "Operation", "Plant", "Skill_x0020_factor", "SMV", "Style", "Team", "Title"}),
    #"Removed Duplicates" = Table.Distinct(#"Expanded value1", {"ID"}),
    #"Inserted Merged Column" = Table.AddColumn(#"Removed Duplicates", "Plant+EPF", each Text.Combine({[Plant], [EPF]}, ":"), type text)
in
    #"Inserted Merged Column"

This post describes what may be the same method:
https://cmdrkeene.com/a-faster-way-to-get-data-from-sharepoint-lists-in-power-query/

 

lbendlin
Super User

You are not providing enough information for us to try and help. Does this API have documentation?

Yes, it does:

openFDA

The endpoint would be https://api.fda.gov/device/event.json
