Rafaela07
Helper I

API Data ingestion Orchestration

Hi everyone,

I'm working on a new project in Fabric where we have to use APIs for data ingestion.
I'm new to APIs, so I would really appreciate your suggestions and feedback on the topics that trouble me.

 

1st issue: I have two types of APIs. The first is a straightforward call, e.g. /api/emails, and the second is a nested one, e.g. /api/contacts/{contact_id}/activity. I have trouble creating a dynamic way to reach "activity" for each contact. My guess is that I'll have to gather all the contact_ids first and then target the activity endpoint for each one in a ForEach loop, but I would appreciate more input on that.
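To make the question concrete, here is a minimal Python sketch of that two-step pattern. The endpoint paths are the ones from my post; the "items" envelope key, the fetch_json helper and base_url are illustrative assumptions, not anything from the real API:

```python
from typing import Callable, Dict

def fetch_all_activities(fetch_json: Callable[[str], dict], base_url: str) -> Dict[str, dict]:
    """Step 1: call the flat endpoint to gather every contact_id.
    Step 2: loop over the ids and call the nested endpoint per contact."""
    contacts = fetch_json(f"{base_url}/api/contacts")
    activities: Dict[str, dict] = {}
    for contact in contacts["items"]:  # assumed envelope key; check the real response shape
        cid = contact["contact_id"]
        activities[cid] = fetch_json(f"{base_url}/api/contacts/{cid}/activity")
    return activities
```

In a pipeline the same shape would presumably become: a Copy/Lookup activity on /api/contacts, then a ForEach over its output that parameterises the nested URL.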

 

2nd issue: I'm trying to create a medallion architecture, so the goal is to first bring the data into a landing area and then move it to bronze, silver, gold, etc. So far I've been working with data in tabular format, so creating a table in the landing area with a specific schema (even enabling schema evolution) is clear to me. However, these API responses are in JSON format, and in most cases the JSON schema is not even the same.
For instance in one case I might have

{
  "activity_id": "X",
  "filters": { "region": [ "Sydney" ], "activityType": [ "TeamBonding" ] }
}

 And in another one

{
  "activity_id": "Z",
  "filters": { "activityDuration": [ "1day" ] }
}

If I want to end up with a report that shows "Which are the most used filters when selecting an activity", I'll need all the "filters" keys and their values, but how will I flatten the JSON and put it into a structured tabular schema directly via a Data Factory copy activity?
And this raises a new question. Maybe I should not try to put the data into a table directly yet, but keep it as JSON files in OneLake and flatten it in a notebook; however, the differing structure still troubles me regarding the end result (tabular form). Which is the best practice to use here, both for the ingestion and the flattening?
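One hedged way to handle the drifting schema in a notebook: land the raw JSON files in OneLake, then explode the variable "filters" object into a fixed long (key/value) schema, so a new filter key becomes new rows instead of a new column. A plain-Python sketch of the idea (flatten_filters is my name, not a library API; PySpark's explode would do the same at scale):

```python
def flatten_filters(records: list) -> list:
    """Turn each record's variable 'filters' object into fixed-schema rows:
    (activity_id, filter_name, filter_value)."""
    rows = []
    for rec in records:
        for name, values in rec.get("filters", {}).items():
            for value in values:  # each filter key maps to a list of values
                rows.append({"activity_id": rec["activity_id"],
                             "filter_name": name,
                             "filter_value": value})
    return rows
```

With that shape, a "most used filters" report is just a count grouped by filter_name (and filter_value).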

 

3rd issue: I want to proceed with incremental loads, so I plan on creating a "control table" with all the API calls I want to make and a LastIngestion watermark column (datetime). This way, on every new run I can ingest data where DateAdded (API column) > LastIngestion and <= PipelineTriggerDateTime, and then update the LastIngestion column of my control table with the PipelineTriggerDateTime value. I think that part is clear to me. However, I also have to take pagination into account. The APIs use parameters like start and limit, but I'm not sure how to add these to the equation. Are they necessary even though I'm going to use the date filtering? And if yes, how am I going to incorporate them dynamically so that all the pages in the interval I've requested are read?
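On the "are start/limit still necessary?" point: the date filter bounds which records qualify, but the API will still return only one page of them per call, so the pages have to be walked until the interval is exhausted. A sketch, assuming (as many APIs do, but verify for yours) that a page shorter than limit signals the last page; fetch_page stands in for the real HTTP call:

```python
def ingest_interval(fetch_page, last_ingestion, trigger_time, limit=100):
    """Collect every record with last_ingestion < DateAdded <= trigger_time,
    paging with start/limit until a short page signals the last one."""
    start, records = 0, []
    while True:
        page = fetch_page(start=start, limit=limit,
                          since=last_ingestion, until=trigger_time)
        records.extend(page)
        if len(page) < limit:  # short or empty page: no more pages in this interval
            break
        start += limit
    return records
```

In a pipeline this maps to an Until loop (or the Copy activity's built-in pagination rules) with start as a variable incremented by limit each iteration; only after the last page succeeds would you update LastIngestion.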

 

Looking forward to your feedback!

Thank you in advance,
Rafaela

1 ACCEPTED SOLUTION
smeetsh
Responsive Resident

We have something similar for several of our pipelines; an API call can simply be done using a Copy activity and a Web connector.

 

Step one is to get the contacts into a table so you can query them; that will be your first API call.

 

AKA: read the contacts from the API and ingest them, in our case into a lakehouse table. If the contacts are reasonably static you can choose to run that step separately, or keep it as part of your pipeline.

 

The next step is to use a Lookup activity, which you can feed into a loop. In our case we have two separate IDs we need to complete the request; both are in the same table.


The Lookup is followed by a ForEach, in which we put the data from our lookup into variables.

smeetsh_0-1770242145155.png

 

Next we use the two variables (in our case) to create a third variable that concatenates the base URL and the variables, building the complete URL for the API request; that URL is then used to access the API in the Copy data activity.
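For anyone reproducing this, the concatenation is typically done in a Set variable activity with the pipeline expression language. Something roughly like the following, where BaseUrl and contact_id are illustrative names from this thread's scenario, not fixed Fabric names:

```
@concat(pipeline().parameters.BaseUrl, '/api/contacts/', item().contact_id, '/activity')
```

item() here refers to the current ForEach element coming from the Lookup output.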

smeetsh_1-1770242261167.png

 

In the lakehouse you have two options. The first is to write the JSON response to a file and use a copy activity outside of your loop to ingest the files into your bronze table. Fabric will handle this itself; all you have to do is create a mapping in the Mapping tab of that copy activity. I would prefer that, so I have a visual of the JSON response, which can be handy for troubleshooting.

 

The second option is to ingest directly into a lakehouse table.

NOTE: the fact that the JSON response can vary per call is a real pain. You will need to find out what the filters object can have as "columns"; in the example I see it can have region, activityType and activityDuration.

I would contact the supplier of the API for detailed information regarding the structure of that JSON response. You may end up having to manually create a table, using Spark SQL, and a mapping to get the data.
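As a sketch of what such a manually created table could look like, assuming the long key/value shape for the filters (table and column names here are made up for illustration, not from any API):

```sql
-- One row per (activity, filter name, filter value); a new filter key in a
-- future response becomes new rows, not a schema change.
CREATE TABLE IF NOT EXISTS bronze.activity_filters (
    activity_id  STRING,
    filter_name  STRING,
    filter_value STRING,
    ingested_at  TIMESTAMP
) USING DELTA;
```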

 

Once you have it in the table, you have full control, as it is just another table.

 

Last but not least, pagination: not my friend either, lol. It depends on the API, but roughly you will have to add a pagination rule to your Copy activity. The value of the Body/URL depends on the actual API response; in our case the next link for pagination is odata.nextlink, but that can vary per API. Use a tool like Postman to get a response and you will see what your link will be, or look at the API documentation. Some APIs are well documented; sadly, a lot of suppliers only document minimally.
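For reference, the REST connector's pagination rules take key/value form; for a next-link style API the source side looks roughly like this (the JSONPath, and whether your API exposes a next link at all, are things to confirm in its docs or Postman):

```json
{
  "paginationRules": {
    "AbsoluteUrl": "$['odata.nextlink']"
  }
}
```

For start/limit-style APIs there is also a range-style rule along the lines of "QueryParameters.start": "RANGE:0::100", which advances start by 100 until the responses run out; check the Fabric/ADF REST connector documentation for the exact syntax your connector supports.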

 

smeetsh_2-1770242626507.png

 

Cheers

Hans

(If my solution was useful, please give it a kudo and mark it as a solution)


2 REPLIES
KevinChant
Super User

For the 1st issue you might be better off using a combination of PowerShell and the Fabric CLI: use PowerShell for your login and then run the fab api commands.
