Hi all,
I have a big table (1.5M rows) that I want to query through a dataflow, and I only want to refresh new rows. The table has a column called "end" that holds the row number of the last row pulled from the source database, and I want to modify the dataflow's web API query URL by adding skip=(max value of the "end" column) so that each refresh only queries new rows.
The reason I have to do it this way is that Power Query times out after 10 minutes, and I can only pull circa 100k rows before that happens, so I need to be able to change the API call to pull the correct 100k rows each time.
Any ideas on how best to do this?
Hi @Devrij ,
Thanks for @lbendlin 's reply.
Based on your description, you need to determine the maximum value of the "end" column from the already-loaded data before building the API call, which Power Query's List.Max function handles, as you have correctly identified. Make sure this step runs before every API call so you pick up the latest "end" value, and then use that maximum to construct the API URL dynamically. Your Power Query expression appears to be correct; just make sure that List.Max references the "end" column of the loaded data.
The end result can look like this:
let
    // Highest row number pulled so far; YourPreviousQueryName is the
    // query/entity holding the rows already loaded.
    MaxEndValue = List.Max(YourPreviousQueryName[end]),
    // Append that maximum as the skip parameter so only new rows are requested.
    APIUrl = "https://xxxxxxx?api_key=xxxxxxx&dbs=xxxxx&collection=xxxxxxx&limit=100000&skip=" & Text.From(MaxEndValue),
    Source = Json.Document(Web.Contents(APIUrl))
in
    Source
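One caveat worth noting: when the URL passed to Web.Contents is built entirely from a dynamic string, the service often cannot validate the data source, and scheduled refresh of the dataflow may fail. A common workaround is to keep the base URL static and move the changing parts into the Query option of Web.Contents. A sketch of that variant, reusing the placeholder host and parameter names from above:
let
    // Same call as above, but the base URL stays static so the service
    // can still validate the connection for scheduled refresh.
    MaxEndValue = List.Max(YourPreviousQueryName[end]),
    Source = Json.Document(
        Web.Contents(
            "https://xxxxxxx",
            [Query = [
                api_key = "xxxxxxx",
                dbs = "xxxxx",
                collection = "xxxxxxx",
                limit = "100000",
                skip = Text.From(MaxEndValue)
            ]]
        )
    )
in
    Source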
Best regards,
Albert He
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Not directly possible. Power BI Incremental Refresh is exclusively based on datetime partition boundaries. You can create your own partitions with different boundary rules but that would require you to re-invent all the partition management processes.
Does your source table have a suitable date or datetime column that you can use instead?
If your data is immutable, you can also consider running your own extract process based on the logic you described, and placing the results in a SharePoint folder so Power BI can pick the extracts up from there.
1.5M rows is not big, and does not normally warrant incremental refresh. Your mentioned timeout is a valid reason, however.
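If you do go the extract-to-SharePoint route, recombining the extracts in Power Query is straightforward. A minimal sketch, assuming CSV extracts and placeholder site and folder names:
let
    // List every file on the site (placeholder URL), then keep only the
    // CSV extracts sitting in the target folder.
    AllFiles = SharePoint.Files("https://yourtenant.sharepoint.com/sites/YourSite", [ApiVersion = 15]),
    Extracts = Table.SelectRows(
        AllFiles,
        each Text.Contains([Folder Path], "/extracts/") and [Extension] = ".csv"
    ),
    // Parse each extract, then append them all into one table.
    Parsed = Table.AddColumn(
        Extracts,
        "Data",
        each Table.PromoteHeaders(Csv.Document([Content], [Delimiter = ",", Encoding = 65001]))
    ),
    Combined = Table.Combine(Parsed[Data])
in
    Combined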
Thanks for the reply. I do have a datetime column, but what I'm struggling with is inserting that into the web API URL to modify the actual query sent to the source. All the incremental query seems to do is filter the results of the source query, and since I can't query all of the rows in one go without it timing out, I'm not sure that will work for us. Whereas it's a lot easier to just say "skip this many rows and pull the next 100k". As you say, the timeout is the only reason I need it to be incremental, although we are also trying to be frugal with compute usage.
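For what it's worth, if the API can filter by that datetime column, the incremental refresh parameters can be pushed into the request itself rather than filtering after download, so each partition only pulls its own slice. A sketch, assuming RangeStart/RangeEnd are configured as the usual incremental refresh parameters and that the API takes hypothetical from/to query parameters in this format:
let
    // RangeStart/RangeEnd are the datetime parameters Power BI supplies
    // per partition when incremental refresh is configured. The from/to
    // parameter names and the date format are assumptions about the API.
    FromText = DateTime.ToText(RangeStart, "yyyy-MM-ddTHH:mm:ss"),
    ToText = DateTime.ToText(RangeEnd, "yyyy-MM-ddTHH:mm:ss"),
    Source = Json.Document(
        Web.Contents(
            "https://xxxxxxx",
            [Query = [
                api_key = "xxxxxxx",
                dbs = "xxxxx",
                collection = "xxxxxxx",
                from = FromText,
                to = ToText
            ]]
        )
    )
in
    Source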