Hi,
I am trying to use an API to pull data into Power Query (I won't go into other details). The issue is that I need to pull more than 1,000 rows, and Power Query seems to give me error messages when I go past a limit of 1,000.
I have tried using pagination techniques, and it oddly sends an error message at row 26,000.
Does anyone have any advice on what I should do here? If I import the data, will more than 1,000 rows come through?
Or would I need multiple data pulls, where for each one I would skip 1,000, then 2,000, and so on?
For transparency, I am looking to have this process automated, so I would not want to manage it manually. I do not think my API calls return a URL that can be used to help with pagination, so I have to use the skip-and-limit technique.
Solved! Go to Solution.
As you note, https://open.fda.gov/apis/paging/ documents that you can't skip more than 25k records, and instead have to use the alternate search_after parameter to page deeper. The doc notes that, to get the correct search_after value for the next page of data, you have to use the "Link HTTP header contained in the response."
As far as I can tell, there is no way to expose the response Link header without creating a custom connector, as noted here: https://community.fabric.microsoft.com/t5/Desktop/Retrieve-API-response-headers/m-p/641107/highlight...
Fortunately, it's not too difficult to put together a custom connector (especially when you don't need to worry about authentication). I was able to create one with Visual Studio Code following the instructions at: https://learn.microsoft.com/en-us/power-query/install-sdk and https://learn.microsoft.com/en-us/power-query/creating-first-connector
There are a bunch of ways you could implement it, but I used the following for the main Contents function of the main .pq definition file (I named the custom connector OpenFDA_DeepPaging):
shared OpenFDA_DeepPaging.Contents =
    let
        func = (baseURL as text, optional relativePath as text) as record =>
            let
                Source = Web.Contents(baseURL, [RelativePath = relativePath]),
                Headers = Value.Metadata(Source)[Headers],
                Results = Table.FromRecords(Json.Document(Source)[results], null, MissingField.UseNull),
                Return = [Headers = Headers, Results = Results]
            in
                Return
    in
        Value.ReplaceType(
            func,
            type function (baseURL as text, optional relativePath as text) as [
                Headers = record, Results = table
            ]
        );
This parses the main data returned (the Results step), exposes all response headers via Value.Metadata (the Headers step), and then outputs both in a record (the Return step).
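One detail worth calling out is the MissingField.UseNull argument to Table.FromRecords: without it, a record that omits a field present in the column list raises an Expression.Error. A minimal illustration, with hypothetical records:

```
let
    // The second record has no [b] field; MissingField.UseNull fills it
    // with null instead of raising an Expression.Error.
    Mixed = {[a = 1, b = 2], [a = 3]},
    AsTable = Table.FromRecords(Mixed, {"a", "b"}, MissingField.UseNull)
in
    AsTable  // 2 rows; [b] is null in the second row
```

This matters for openFDA in particular, since its result records don't all carry the same fields.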
Again, basically the only reason we need a custom connector rather than doing this in Power BI Desktop is that, in a connector, Value.Metadata can access all the response headers; in Power BI Desktop, the exact same code (as well as Web.Headers) only returns a subset, which does not include Link.
I didn't really worry about API key handling because, per my understanding of the authentication doc, you can just include it as your first query parameter; there is no need to pass it as a request header.
Using the above custom connector, the following M successfully pages through all applicable records given the limit provided.
// Test with: https://api.fda.gov/device/510k.json?search=openfda.regulation_number:868.5895&limit=100
// As of 2025-04-29, has 590 total records. With limit=100, we should page through 6 web calls
let
    SourceUrl = "https://api.fda.gov",
    GetAll = List.Generate(
        () =>
            OpenFDA_DeepPaging.Contents(
                SourceUrl,
                "device/510k.json?search=openfda.regulation_number:868.5895&limit=100"
            ),
        each _ <> null,
        each
            if [Headers][Link]? = null then
                null
            else
                let
                    nextFullCall = Text.BetweenDelimiters([Headers][Link], "<", ">; rel"),
                    endpointAndQuery = Text.AfterDelimiter(nextFullCall, SourceUrl & "/")
                in
                    OpenFDA_DeepPaging.Contents(SourceUrl, endpointAndQuery),
        each [Results]
    ),
    CombineAll = Table.Combine(GetAll)
in
    CombineAll
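To show how the next-page URL is recovered, here is the Link-header parsing in isolation; the sample header value is hypothetical, modeled on the format the openFDA paging doc describes:

```
let
    // A hypothetical Link response header pointing at the next page
    SampleLink = "<https://api.fda.gov/device/510k.json?limit=100&skip=100>; rel=""next""",
    // Everything between "<" and ">; rel" is the full next-page URL
    NextFullCall = Text.BetweenDelimiters(SampleLink, "<", ">; rel"),
    // Strip the base URL so the remainder can be passed as RelativePath
    EndpointAndQuery = Text.AfterDelimiter(NextFullCall, "https://api.fda.gov/")
in
    EndpointAndQuery  // "device/510k.json?limit=100&skip=100"
```

Passing only the endpoint and query as RelativePath keeps the base URL stable, which is what lets the connector keep working across refreshes.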
SDK extension project is here if interested: https://github.com/MarkRasp/OpenFDA_DeepPaging
Hi @Justin326117 ,
Thank you for reaching out to the Microsoft Community Forum.
Please follow the steps below:
1. Use a paginated loop in Power Query (M): You can automate the entire data retrieval process in Power Query using a custom function that pages through the API using skip and limit, then combines the results into a single table.
Paginated API call using skip and limit in Power Query
M code:
let
    PageSize = 1000,
    GetPage = (PageNumber as number) =>
        let
            Skip = PageNumber * PageSize,
            Source = Json.Document(Web.Contents("https://api.example.com/data",
                [
                    Query = [
                        limit = Text.From(PageSize),
                        skip = Text.From(Skip)
                    ]
                ])),
            Data = Source[data] // Adjust if your API structure differs
        in
            Data,
    GetAllPages = List.Generate(
        () => [Page = 0, Data = GetPage(0)],
        each List.Count([Data]) > 0,
        each [Page = [Page] + 1, Data = GetPage([Page] + 1)],
        each [Data]
    ),
    Flattened = List.Combine(GetAllPages),
    TableOut = Table.FromList(Flattened, Record.FieldValues, {"ColumnName1", "ColumnName2"}) // Adjust as needed
in
    TableOut
Please check the following:
1. API limits: check whether the API has a hard limit on how far skip can go; some APIs don't handle high offsets well (like 26,000+) and prefer cursor-based pagination instead.
2. Error handling: add try ... otherwise inside GetPage to catch and skip over any problematic page loads.
3. Performance: Power Query can struggle with very large datasets; consider reducing data early with filters, or move to a tool like Python or Azure Data Factory if scalability becomes a concern.
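Point 2 could be sketched like this, wrapping the page fetch in try ... otherwise (the URL and response shape are placeholders, not a real endpoint):

```
let
    PageSize = 1000,
    // Returns the page's records, or an empty list if the call fails,
    // so a failed page doesn't abort the whole refresh.
    GetPageSafe = (PageNumber as number) =>
        try
            Json.Document(
                Web.Contents("https://api.example.com/data",
                    [Query = [
                        limit = Text.From(PageSize),
                        skip = Text.From(PageNumber * PageSize)
                    ]]
                )
            )[data]
        otherwise
            {}
in
    GetPageSafe
```

Note that returning an empty list will also end a List.Generate loop whose condition is List.Count([Data]) > 0, so a mid-stream failure stops paging rather than skipping ahead; adjust the fallback if you need different behavior.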
Please refer to the official documentation and the community threads below.
Handling paging for Power Query connectors - Power Query | Microsoft Learn
Solved: PowerQuery Rest API Pagination - Microsoft Fabric Community
Solved: Pagination of a REST API in Power Query using M - Microsoft Fabric Community
If my response has resolved your query, please mark it as the Accepted Solution to assist others. Additionally, a 'Kudos' would be appreciated if you found my response helpful.
Thank you
Thanks,
So, you have actually helped me, as I have found that openFDA has a limit of 26,000 hits.
The documentation says to use the search_after parameter instead in order to paginate.
However, I am unable to actually find the HTTP URL I am meant to be extracting, so I am not sure how I am meant to do this.
As for what I am currently working with: since I have no idea how to write M code, I've had to rely on Copilot.
let
    // Your API key
    apiKey = "API KEY GOES HERE",
    DefaultRequestHeaders = [
        #"Authorization" = "Bearer " & apiKey
    ],
    // Function to get each page with skip logic
    GetPage = (skip as number) =>
        let
            response = Web.Contents(
                "https://api.fda.gov/device/event.json?search=date_received:[20250101+TO+20250201]&skip=" & Text.From(skip) & "&limit=1000",
                [
                    Headers = DefaultRequestHeaders
                ]
            ),
            body = Json.Document(response),
            data = Table.FromRecords(body[results])
        in
            data,
    // Generate all pages, stop if rows are fewer than 1000
    AllPages = List.Generate(
        () => [page = GetPage(0), skip = 0],
        each Table.RowCount([page]) > 0,
        each [page = GetPage([skip] + 1000), skip = [skip] + 1000],
        each [page]
    ),
    // Combine all pages into a single table
    #"Converted to Table" = Table.FromList(AllPages, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandTableColumn(#"Converted to Table", "Column1",
        {"manufacturer_contact_zip_ext", "manufacturer_g1_address_2", "event_location", "report_to_fda", "manufacturer_contact_t_name", "manufacturer_contact_state", "manufacturer_link_flag", "manufacturer_contact_address_2", "manufacturer_g1_city", "manufacturer_contact_address_1", "manufacturer_contact_pcity", "event_type", "report_number", "type_of_report", "product_problem_flag", "date_received", "manufacturer_address_2", "pma_pmn_number", "date_of_event", "reprocessed_and_reused_flag", "manufacturer_address_1", "exemption_number", "manufacturer_contact_zip_code", "reporter_occupation_code", "manufacturer_contact_plocal", "noe_summarized", "manufacturer_contact_l_name", "source_type", "distributor_zip_code_ext", "manufacturer_g1_postal_code", "manufacturer_g1_state", "reporter_country_code", "manufacturer_contact_area_code", "date_added", "manufacturer_contact_f_name", "device_date_of_manufacturer", "previous_use_code", "device", "product_problems", "manufacturer_zip_code", "suppl_dates_mfr_received", "manufacturer_contact_country", "date_changed", "health_professional", "summary_report_flag", "manufacturer_g1_zip_code_ext", "manufacturer_contact_extension", "manufacturer_city", "manufacturer_contact_phone_number", "patient"},
        {"manufacturer_contact_zip_ext", "manufacturer_g1_address_2", "event_location", "report_to_fda", "manufacturer_contact_t_name", "manufacturer_contact_state", "manufacturer_link_flag", "manufacturer_contact_address_2", "manufacturer_g1_city", "manufacturer_contact_address_1", "manufacturer_contact_pcity", "event_type", "report_number", "type_of_report", "product_problem_flag", "date_received", "manufacturer_address_2", "pma_pmn_number", "date_of_event", "reprocessed_and_reused_flag", "manufacturer_address_1", "exemption_number", "manufacturer_contact_zip_code", "reporter_occupation_code", "manufacturer_contact_plocal", "noe_summarized", "manufacturer_contact_l_name", "source_type", "distributor_zip_code_ext", "manufacturer_g1_postal_code", "manufacturer_g1_state", "reporter_country_code", "manufacturer_contact_area_code", "date_added", "manufacturer_contact_f_name", "device_date_of_manufacturer", "previous_use_code", "device", "product_problems", "manufacturer_zip_code", "suppl_dates_mfr_received", "manufacturer_contact_country", "date_changed", "health_professional", "summary_report_flag", "manufacturer_g1_zip_code_ext", "manufacturer_contact_extension", "manufacturer_city", "manufacturer_contact_phone_number", "patient"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Column1", each ([device_date_of_manufacturer] = "20171127"))
in
    #"Filtered Rows"
However, this throws an Expression.Error saying certain records were not found.
I'm not sure how this occurs: when I call the same API on the web, those rows do have data in them and are not all blank. So I am not sure whether a specific column is critical and, when it's left null, affects everything else.
If this could be fixed, it would be a great help.
Hi @Justin326117 ,
If you find @MarkLaf post helpful, please mark it as an "Accept as Solution" and consider giving a KUDOS. Feel free to reach out if you need further assistance.
Thanks and Regards
Hi @Justin326117 ,
I had a similar issue, and here's what I did to get past the row limit using the skip-and-limit technique.
In my case it was a SharePoint list (5,000-row limit).
I created a list of skip values, then a custom column that takes each value from the [Skip] column and passes it as the skip value to the API call.
Here's the M code I used:
Custom1 = Table.AddColumn(#"PreviousStep", "Items", each Json.Document(Web.Contents(baseurl& "/items?$skiptoken=Paged=TRUE%26p_ID=" & [Skip] & fieldselect, [Headers=[Accept="application/json"]]))),
This allowed me to pull the data automatically in multiple batches, so you don't have to worry about the row limit; you just need to adjust the batch size and pagination logic accordingly.
Each of these records holds 5,000 rows of data; the next step is to expand them to values.
Please check the full M code below.
let
    sitename = "bal/Group_Operations/Industrial_Eng/Skill/App",
    listname = "Skill DB",
    baseurl = "https://brandix.sharepoint.com/sites/" & sitename & "/_api/web/lists/GetByTitle('" & listname & "')/",
    //itemcount = Json.Document(Web.Contents(baseurl & "ItemCount", [Headers=[Accept="application/json"]]))[value],
    lastId = Json.Document(Web.Contents(baseurl & "items?$top=1&$orderby=ID desc", [Headers=[Accept="application/json"]]))[value]{0}[ID],
    FirstId = Json.Document(Web.Contents(baseurl & "items?$top=1&$orderby=ID asc", [Headers=[Accept="application/json"]]))[value]{0}[ID],
    skiplist = List.Numbers(FirstId - 1, Number.RoundUp((lastId - FirstId) / 5000), 5000),
    #"Converted to Table" = Table.FromList(skiplist, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table", {{"Column1", "Skip"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"Skip", type text}}),
    fieldselect = "&$top=5000", // all fields with no expansion
    //fieldselect = "&$top=5000&$select = Id,Title,Person,Date", // list desired fields (no expansion)
    //fieldselect = "&$top=5000&$select=Id,Title,Choice,LookupColumn/Title,LookupColumn/Project,LookupColumn/ProjectStatus,Date,Person/LastName,Person/FirstName,Person/EMail&$expand=LookupColumn,Person",
    Custom1 = Table.AddColumn(#"Changed Type", "Items", each Json.Document(Web.Contents(baseurl & "/items?$skiptoken=Paged=TRUE%26p_ID=" & [Skip] & fieldselect, [Headers=[Accept="application/json"]]))),
    #"Expanded Items" = Table.ExpandRecordColumn(Custom1, "Items", {"value"}, {"value"}),
    #"Expanded value" = Table.ExpandListColumn(#"Expanded Items", "value"),
    #"Expanded value1" = Table.ExpandRecordColumn(#"Expanded value", "value", {"Average_x0020_CT", "Cluster", "Created", "EPF", "ID", "Machine", "Operation", "Plant", "Skill_x0020_factor", "SMV", "Style", "Team", "Title"}, {"Average_x0020_CT", "Cluster", "Created", "EPF", "ID", "Machine", "Operation", "Plant", "Skill_x0020_factor", "SMV", "Style", "Team", "Title"}),
    #"Removed Duplicates" = Table.Distinct(#"Expanded value1", {"ID"}),
    #"Inserted Merged Column" = Table.AddColumn(#"Removed Duplicates", "Plant+EPF", each Text.Combine({[Plant], [EPF]}, ":"), type text)
in
    #"Inserted Merged Column"
This article describes a similar method:
https://cmdrkeene.com/a-faster-way-to-get-data-from-sharepoint-lists-in-power-query/
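The skip-value generation in the code above relies on List.Numbers(start, count, increment); a small illustration with numbers chosen for brevity:

```
let
    // 3 skip values starting at 0, stepping by 5000
    SkipValues = List.Numbers(0, 3, 5000)
in
    SkipValues  // {0, 5000, 10000}
```

Each value in the list becomes one batched API call, so the count argument determines how many requests are made.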
You are not providing enough information for us to try to help. Does this API have documentation?