<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: API Data ingestion Orchestration in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/API-Data-ingestion-Orchestration/m-p/4995737#M14960</link>
    <description>&lt;P&gt;For the 1st issue you might be better off using a combination of PowerShell and the Fabric CLI: use PowerShell for your login, and then run the fab api commands.&lt;/P&gt;</description>
    <pubDate>Wed, 04 Feb 2026 21:47:09 GMT</pubDate>
    <dc:creator>KevinChant</dc:creator>
    <dc:date>2026-02-04T21:47:09Z</dc:date>
    <item>
      <title>API Data ingestion Orchestration</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/API-Data-ingestion-Orchestration/m-p/4995127#M14954</link>
      <description>&lt;P&gt;Hi everyone,&lt;BR /&gt;&lt;BR /&gt;I'm working on a new project in Fabric where we have to use APIs for the data ingestion.&lt;BR /&gt;I'm new to APIs, so I would really appreciate your suggestions and feedback on the topics that trouble me.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1st issue: I have two types of APIs. The first one is a straightforward call, e.g. /api/emails, and the second one is nested, e.g. /api/contacts/{contact_id}/activity. I have trouble creating a dynamic way to reach "activity" for each contact. My guess is that somehow I'll have to gather all the contact_ids and then target the activity in a ForEach loop, but I would appreciate more input on that.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2nd issue: I'm trying to create a medallion architecture, so the goal is to first bring the data into a landing area and then move it through bronze, silver, gold, etc. So far I've been working with data in tabular format, so creating a table in the landing area with a specific schema (even enabling schema evolution) is clear to me. However, these API responses are in JSON format, and in most cases the JSON schema is not even the same.&lt;BR /&gt;For instance, in one case I might have:&lt;/P&gt;&lt;PRE&gt;{&lt;BR /&gt;"activity_id": "X",&lt;BR /&gt;"filters": {
          "region": [
            "Sydney"
          ],
          "activityType": [
            "TeamBonding"
          ]
        }
}&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;And in another one:&lt;/P&gt;&lt;PRE&gt;{&lt;BR /&gt;"activity_id": "Z",&lt;BR /&gt;"filters": {
          "activityDuration": [
            "1day"
          ]
        }
}&lt;/PRE&gt;&lt;P&gt;If I want to end up with a report that shows "Which are the most used filters when selecting an activity?", I'll need all the "filters" keys and their values, but how do I flatten the JSON and put it into a structured tabular schema directly via a Data Factory copy activity?&lt;BR /&gt;This also raises a new question: maybe I should not try to put the data into a table directly, but rather keep it in JSON file format in OneLake and flatten it in a notebook. However, the differing structure still troubles me regarding the end result (tabular form). What is the best practice to use here, both for the ingestion and the flattening?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;3rd issue: I want to proceed with incremental loads, so I plan on creating a "control table" with all the API calls I want to make and a LastIngestion watermark column (datetime). This way, on every new run I can ingest data where DateAdded (API column) &amp;gt; LastIngestion and &amp;lt;= PipelineTriggerDateTime, and then update the LastIngestion column of my control table with the PipelineTriggerDateTime value. I think this part is clear to me. However, I also have to take pagination into consideration. The APIs use parameters like start and limit, but I'm not sure how to add those to the equation. Are they necessary even though I'm going to use the date filtering? And if so, how do I incorporate them dynamically so that all the pages in the interval I've requested are read?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Looking forward to your feedback!&lt;BR /&gt;&lt;BR /&gt;Thank you in advance,&lt;BR /&gt;Rafaela&lt;/P&gt;
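&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PS: For the 2nd issue, this is the kind of notebook flattening I was imagining. A rough sketch only; the landing path and table name are placeholders:&lt;/P&gt;&lt;PRE&gt;from pyspark.sql import functions as F

# Read the raw JSON files from the landing area (path is a placeholder).
# Spark merges the differing schemas; missing filter keys become null.
raw = spark.read.option("multiLine", "true").json("Files/landing/activities/")

# Melt the struct of filter arrays into (filter_name, filter_value) rows,
# so a new filter type needs no schema change.
filter_cols = raw.select("filters.*").columns
flat = (
    raw.select(
        "activity_id",
        F.explode(
            F.map_from_arrays(
                F.array(*[F.lit(c) for c in filter_cols]),
                F.array(*[F.col(f"filters.{c}") for c in filter_cols]),
            )
        ).alias("filter_name", "filter_values"),
    )
    .withColumn("filter_value", F.explode_outer("filter_values"))
    .drop("filter_values")
)

flat.write.mode("append").saveAsTable("landing_activity_filters")&lt;/PRE&gt;</description>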
      <pubDate>Wed, 04 Feb 2026 12:10:43 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/API-Data-ingestion-Orchestration/m-p/4995127#M14954</guid>
      <dc:creator>Rafaela07</dc:creator>
      <dc:date>2026-02-04T12:10:43Z</dc:date>
    </item>
    <item>
      <title>Re: API Data ingestion Orchestration</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/API-Data-ingestion-Orchestration/m-p/4995737#M14960</link>
      <description>&lt;P&gt;For the 1st issue you might be better off using a combination of PowerShell and the Fabric CLI: use PowerShell for your login, and then run the fab api commands.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Feb 2026 21:47:09 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/API-Data-ingestion-Orchestration/m-p/4995737#M14960</guid>
      <dc:creator>KevinChant</dc:creator>
      <dc:date>2026-02-04T21:47:09Z</dc:date>
    </item>
    <item>
      <title>Re: API Data ingestion Orchestration</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/API-Data-ingestion-Orchestration/m-p/4995766#M14961</link>
      <description>&lt;P&gt;We have something similar for several of our pipelines; an API call can simply be done using a copy activity and a web connector.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Step one is to get the contacts into a table so you can query them; that will be your 1st API call.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;AKA: read the contacts from the API and ingest them into, in our case, a lakehouse table. If the contacts are reasonably static you can choose to run that step separately, or keep it as part of your pipeline.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The next step is to use a lookup activity which you can feed into a loop. In our case we have 2 separate IDs we need to complete the request URL; both are in the same table.&lt;/P&gt;&lt;P&gt;The lookup is followed by a ForEach, in which we feed the data from our lookup in as a variable.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="smeetsh_0-1770242145155.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1324682i0BFCED061B121BF9/image-size/medium?v=v2&amp;amp;px=400" role="button" title="smeetsh_0-1770242145155.png" alt="smeetsh_0-1770242145155.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Next we use the (in our case two) variables to create a third variable that concatenates the base URL and the variables, building the complete URL for the API request; that URL is then used to access the API in the copy data activity.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="smeetsh_1-1770242261167.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1324683i490016F6A87A5787/image-size/medium?v=v2&amp;amp;px=400" role="button" title="smeetsh_1-1770242261167.png" alt="smeetsh_1-1770242261167.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;
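&lt;P&gt;If you want to prototype the calls outside the pipeline first, the same lookup + ForEach pattern in a notebook looks roughly like this. Just a sketch: the base URL and field name are placeholders, and real calls will need your API's authentication:&lt;/P&gt;&lt;PRE&gt;import requests

BASE_URL = "https://api.example.com"  # placeholder, not a real endpoint

# Step 1: the straightforward call - the list you would land in a table.
contacts = requests.get(f"{BASE_URL}/api/contacts", timeout=30).json()

# Step 2: the "ForEach" - build the nested URL per contact and call it.
activities = []
for contact in contacts:
    url = f"{BASE_URL}/api/contacts/{contact['contact_id']}/activity"
    activities.append(requests.get(url, timeout=30).json())&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;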
&lt;P&gt;In the lakehouse you have two options. One is to write the JSON response to a file and use a copy activity outside of your loop to ingest the files into your bronze table. Fabric will handle this itself; all you have to do is create a mapping in the mapping tab of that copy activity. I would prefer that, so I have a visual of the JSON response, which can be handy for troubleshooting.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The second option is to ingest directly into a lakehouse table.&lt;/P&gt;&lt;P&gt;NOTE: the fact that the JSON response can vary per call is a real pain. You will need to find out which "columns" the filters object can contain; in the example I see it can have region, activityType and activityDuration.&lt;/P&gt;&lt;P&gt;I would contact the supplier of the API for detailed information about the structure of that JSON response. You may end up having to manually create a table, using Spark SQL, and a mapping to get the data in.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Once you have it in a table, you have full control, as it is just another table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Last but not least, pagination: not my friend either, lol. It depends on the API, but roughly you will have to add a pagination rule to your copy activity. The value of the BodyURL depends on the actual API response; in our case the next link for pagination is odata.nextlink, but that can vary per API. Use a tool like Postman to get a response and you will see what your link will be, or look at the API documentation. Some APIs are well documented; sadly, a lot of suppliers only document minimally.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="smeetsh_2-1770242626507.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1324684iBEABE8B53B12B312/image-size/medium?v=v2&amp;amp;px=400" role="button" title="smeetsh_2-1770242626507.png" alt="smeetsh_2-1770242626507.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;
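&lt;P&gt;For the start/limit style your API uses, the loop in a notebook would look roughly like this. A sketch only, assuming the API returns a plain JSON array and that a short page signals the end (yours may signal it differently, so check the docs):&lt;/P&gt;&lt;PRE&gt;import requests

BASE_URL = "https://api.example.com"  # placeholder

def fetch_all_pages(path, page_size=100, **params):
    """Follow start/limit paging until a short page signals the end."""
    start, rows = 0, []
    while True:
        page = requests.get(
            f"{BASE_URL}{path}",
            params={**params, "start": start, "limit": page_size},
            timeout=30,
        ).json()
        rows.extend(page)
        if len(page) &lt; page_size:  # last page reached
            return rows
        start += page_size

# Combined with the date filter from the control table (param name made up):
# rows = fetch_all_pages("/api/emails", dateAddedAfter="2026-02-04T00:00:00Z")&lt;/PRE&gt;&lt;P&gt;So yes, you still need pagination even with the date filtering: the filter bounds which rows come back, and the start/limit loop makes sure you read every page of that bounded set.&lt;/P&gt;</description>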
      <pubDate>Wed, 04 Feb 2026 22:12:57 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/API-Data-ingestion-Orchestration/m-p/4995766#M14961</guid>
      <dc:creator>smeetsh</dc:creator>
      <dc:date>2026-02-04T22:12:57Z</dc:date>
    </item>
  </channel>
</rss>

