<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Full Load from API Skipping some files in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Full-Load-from-API-Skipping-some-files/m-p/4701551#M9467</link>
    <description>&lt;P&gt;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/916685"&gt;@gslick&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;Thanks for the follow-up question.&lt;BR /&gt;Yes, if your API integration involves complex logic, or you are hitting throttling or capacity limits, moving to a Notebook (Python) is a recommended approach. It gives you more control over pagination, retries, throttling, and error handling, and makes custom logging and batching logic easier to implement. A Notebook can either complement or replace pipelines, depending on your use case.&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/rest/api/fabric/articles/pagination" target="_blank" rel="noopener"&gt;Use Pagination with Fabric REST APIs - Microsoft Fabric REST APIs | Microsoft Learn&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;If this post helped resolve your issue, please consider giving it &lt;STRONG&gt;Kudos&lt;/STRONG&gt; and marking it as the &lt;STRONG&gt;Accepted Solution&lt;/STRONG&gt;. This not only acknowledges the support provided but also helps other community members find relevant solutions more easily.&lt;/P&gt;
&lt;P&gt;We appreciate your engagement and thank you for being an active part of the community.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Best regards,&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;LakshmiNarayana.&lt;/STRONG&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 21 May 2025 11:40:23 GMT</pubDate>
    <dc:creator>v-lgarikapat</dc:creator>
    <dc:date>2025-05-21T11:40:23Z</dc:date>
    <item>
      <title>Full Load from API Skipping some files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Full-Load-from-API-Skipping-some-files/m-p/4699154#M9404</link>
      <description>&lt;P&gt;I have created a pipeline in Fabric that does a full load from an API endpoint. The pipeline gets the data from the API endpoint and loads the files into a Lakehouse. I tried it in the sandbox environment and it works perfectly. However, when I ran the pipeline in production, it exceeded the capacity. I then upgraded from F32 to F64, but the capacity was still exceeded. I let it run and the pipeline succeeded, but I got only 123000 of 210000 records. There is no pattern either; it just skipped a few files in between. Is this caused by the bursting and smoothing that Fabric uses? What is the best way to tackle this issue? Can I make my pipeline use only 80% of the capacity? Taking longer is no problem, but I need the full load to complete.&lt;/P&gt;</description>
      <pubDate>Tue, 20 May 2025 02:51:36 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Full-Load-from-API-Skipping-some-files/m-p/4699154#M9404</guid>
      <dc:creator>max_mrc</dc:creator>
      <dc:date>2025-05-20T02:51:36Z</dc:date>
    </item>
    <item>
      <title>Re: Full Load from API Skipping some files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Full-Load-from-API-Skipping-some-files/m-p/4699971#M9426</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1237933"&gt;@max_mrc&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Thanks for reaching out to the Microsoft Fabric community forum.&lt;BR /&gt;&lt;STRONG&gt;Causes&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;1. API Pagination/Throttling Behaviour&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Some API calls may fail silently&lt;/STRONG&gt; or be throttled, returning partial data.&lt;/LI&gt;
&lt;LI&gt;If you're calling the API in parallel or in large batches, &lt;STRONG&gt;errors may not be retried&lt;/STRONG&gt; correctly, and Fabric might &lt;STRONG&gt;skip the failed chunks&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Check if your pipeline has &lt;STRONG&gt;error-handling or retries&lt;/STRONG&gt; on failed API calls.&lt;/LI&gt;
&lt;/UL&gt;
&lt;OL start="2"&gt;
&lt;LI&gt;&lt;STRONG&gt; Fabric Capacity Bursting/Smoothing&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI&gt;Fabric lets a workload &lt;STRONG&gt;burst&lt;/STRONG&gt; above the purchased capacity for a short time, and &lt;STRONG&gt;smoothing&lt;/STRONG&gt; then spreads that consumption over a longer window.&lt;/LI&gt;
&lt;LI&gt;If your job runs over capacity &lt;STRONG&gt;too quickly or for too long&lt;/STRONG&gt;, operations may get throttled or rejected, including &lt;STRONG&gt;background operations&lt;/STRONG&gt; such as data loads.&lt;/LI&gt;
&lt;LI&gt;This could explain why &lt;STRONG&gt;some files/data are skipped without error&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;OL start="3"&gt;
&lt;LI&gt;&lt;STRONG&gt; Concurrency / Parallelism&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI&gt;If your pipeline processes API calls &lt;STRONG&gt;in parallel&lt;/STRONG&gt;, it can spike your capacity usage.&lt;/LI&gt;
&lt;LI&gt;That spike may lead to &lt;STRONG&gt;task drops or silent failure&lt;/STRONG&gt; of lower-priority tasks.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Best Practices &amp;amp; Fixes&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt; Limit Concurrency&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI&gt;In your copy activity or loop, &lt;STRONG&gt;set degree of parallelism to a lower number&lt;/STRONG&gt; (e.g., 2–4).&lt;/LI&gt;
&lt;LI&gt;Fabric defaults to high parallelism in some cases, which might spike capacity use.&lt;/LI&gt;
&lt;/UL&gt;
&lt;OL start="2"&gt;
&lt;LI&gt;&lt;STRONG&gt; Throttle API Calls&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI&gt;Introduce a &lt;STRONG&gt;wait/delay&lt;/STRONG&gt; (e.g., sleep 1s) between each API call or page fetch.&lt;/LI&gt;
&lt;LI&gt;Use a custom &lt;STRONG&gt;Until loop&lt;/STRONG&gt; or &lt;STRONG&gt;ForEach with delay&lt;/STRONG&gt; to slow down execution and reduce load.&lt;/LI&gt;
&lt;/UL&gt;
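&lt;P&gt;A minimal Python sketch of the delay idea above, for anyone making the calls from a Notebook rather than a pipeline loop. The helper name paced_calls and its parameters are made up for illustration and are not part of any Fabric API; it only keeps a minimum gap between the start of consecutive calls.&lt;/P&gt;

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def paced_calls(
    items: Iterable[T],
    call: Callable[[T], R],
    min_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[R]:
    """Call `call(item)` for each item, keeping at least `min_interval_s`
    seconds between the start of consecutive calls."""
    last_start: Optional[float] = None
    for item in items:
        if last_start is not None:
            # Wait only for the part of the interval that has not yet elapsed.
            remaining = min_interval_s - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        yield call(item)
```

&lt;P&gt;For example, list(paced_calls(page_urls, fetch_page, min_interval_s=1.0)) would fetch one page per second at most, where page_urls and fetch_page stand in for your own list of requests and your own request function.&lt;/P&gt;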
&lt;OL start="3"&gt;
&lt;LI&gt;&lt;STRONG&gt; Implement Retries and Logging&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI&gt;Use &lt;STRONG&gt;robust retry logic&lt;/STRONG&gt; in each API call (3+ retries with exponential backoff).&lt;/LI&gt;
&lt;LI&gt;Log &lt;STRONG&gt;each API call result&lt;/STRONG&gt; (success/failure), even in a separate table if needed, to detect any silent skips.&lt;/LI&gt;
&lt;/UL&gt;
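&lt;P&gt;The retry and logging advice could look roughly like this in Python. This is a sketch under assumptions: call_with_retry, the logger name api_load, and the default of 4 attempts are illustrative choices, and the broad except should be narrowed to whatever errors your HTTP client actually raises.&lt;/P&gt;

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("api_load")  # hypothetical logger name


def call_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `fetch`, retrying failures with delays of 1s, 2s, 4s, ...
    The final failure is re-raised so a chunk is never skipped silently."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = fetch()
            log.info("attempt %d succeeded", attempt)
            return result
        except Exception as exc:  # assumption: narrow this in real code
            log.warning("attempt %d failed: %s", attempt, exc)
            if attempt == max_attempts:
                raise
            # Exponential backoff: the delay doubles after every failure.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

&lt;P&gt;Because the last error is raised instead of swallowed, a page that never succeeds stops the run (or marks that chunk as failed) rather than leaving a gap in the load.&lt;/P&gt;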
&lt;OL start="4"&gt;
&lt;LI&gt;&lt;STRONG&gt; Partition Your Load&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI&gt;Break the full load into &lt;STRONG&gt;smaller, deterministic partitions&lt;/STRONG&gt; (e.g., by date, region, or ID range).&lt;/LI&gt;
&lt;LI&gt;This helps in &lt;STRONG&gt;tracking what's been loaded&lt;/STRONG&gt; and allows for &lt;STRONG&gt;easy retry&lt;/STRONG&gt; of failed partitions.&lt;/LI&gt;
&lt;/UL&gt;
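&lt;P&gt;One way to picture deterministic partitions is a date-window split with a status recorded per window. The sketch below assumes the API can be filtered by a date range, which may not hold for your endpoint; date_partitions, load_partitions, and load_one are hypothetical names.&lt;/P&gt;

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Tuple

Partition = Tuple[date, date]  # inclusive start, exclusive end


def date_partitions(start: date, end: date, days: int) -> List[Partition]:
    """Split [start, end) into consecutive windows of at most `days` days."""
    if 1 > days:
        raise ValueError("days must be at least 1")
    parts: List[Partition] = []
    cursor = start
    while end > cursor:
        nxt = min(cursor + timedelta(days=days), end)
        parts.append((cursor, nxt))
        cursor = nxt
    return parts


def load_partitions(
    parts: List[Partition], load_one: Callable[[Partition], None]
) -> Dict[Partition, str]:
    """Load each partition and record 'ok' or the error text for it."""
    status: Dict[Partition, str] = {}
    for part in parts:
        try:
            load_one(part)
            status[part] = "ok"
        except Exception as exc:  # assumption: narrow this in real code
            status[part] = "failed: " + str(exc)
    return status
```

&lt;P&gt;After a run, the windows whose status is not "ok" can be passed to load_partitions again, so only the failed slices are reloaded instead of the whole dataset.&lt;/P&gt;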
&lt;OL start="5"&gt;
&lt;LI&gt;&lt;STRONG&gt; Monitor Capacity Consumption&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI&gt;Use &lt;STRONG&gt;Fabric Monitoring tools&lt;/STRONG&gt; (for example the Microsoft Fabric Capacity Metrics app) or &lt;STRONG&gt;Azure Monitor Metrics&lt;/STRONG&gt; to watch capacity unit (CU) consumption and throttling.&lt;/LI&gt;
&lt;LI&gt;Set alerts for &lt;STRONG&gt;near-capacity thresholds&lt;/STRONG&gt; to know when you’re close to limits.&lt;/LI&gt;
&lt;/UL&gt;
&lt;OL start="6"&gt;
&lt;LI&gt;&lt;STRONG&gt; Use Dataflows Gen2 or Data Pipeline Alternatives&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;UL&gt;
&lt;LI&gt;For larger full loads, consider:&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Dataflows Gen2&lt;/STRONG&gt; (if available) for chunked API ingestion.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Staging into Blob/ADLS&lt;/STRONG&gt; first, then processing into Lakehouse.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;Optional: "Take Only 80% of Capacity"&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;There’s no direct way to say “only use 80% of capacity,” but you can &lt;STRONG&gt;simulate that behaviour&lt;/STRONG&gt; by:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Reducing parallelism.&lt;/LI&gt;
&lt;LI&gt;Adding delay/sleep in loops.&lt;/LI&gt;
&lt;LI&gt;Spreading load over more pipeline runs (e.g., time-partitioned loads).&lt;/LI&gt;
&lt;LI&gt;Reducing dataset size per copy activity.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/fabric/enterprise/optimize-capacity" target="_blank"&gt;Evaluate and optimize your Microsoft Fabric capacity - Microsoft Fabric | Microsoft Learn&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/fabric/enterprise/throttling" target="_blank"&gt;Understand your Fabric capacity throttling - Microsoft Fabric | Microsoft Learn&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/fabric/enterprise/plan-capacity" target="_blank"&gt;Plan your capacity size - Microsoft Fabric | Microsoft Learn&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/fabric/data-warehouse/compute-capacity-smoothing-throttling" target="_blank"&gt;Smoothing and Throttling - Microsoft Fabric | Microsoft Learn&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/fabric/enterprise/scale-capacity" target="_blank"&gt;Scale your Fabric capacity - Microsoft Fabric | Microsoft Learn&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Best regards,&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;LakshmiNarayana&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 20 May 2025 11:17:32 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Full-Load-from-API-Skipping-some-files/m-p/4699971#M9426</guid>
      <dc:creator>v-lgarikapat</dc:creator>
      <dc:date>2025-05-20T11:17:32Z</dc:date>
    </item>
    <item>
      <title>Re: Full Load from API Skipping some files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Full-Load-from-API-Skipping-some-files/m-p/4700569#M9437</link>
      <description>&lt;P&gt;Instead of using a pipeline, could you use a Python Notebook to connect to the API?&lt;/P&gt;</description>
      <pubDate>Tue, 20 May 2025 18:43:56 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Full-Load-from-API-Skipping-some-files/m-p/4700569#M9437</guid>
      <dc:creator>gslick</dc:creator>
      <dc:date>2025-05-20T18:43:56Z</dc:date>
    </item>
    <item>
      <title>Re: Full Load from API Skipping some files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Full-Load-from-API-Skipping-some-files/m-p/4701551#M9467</link>
      <description>&lt;P&gt;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/916685"&gt;@gslick&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;Thanks for the follow-up question.&lt;BR /&gt;Yes, if your API integration involves complex logic, or you are hitting throttling or capacity limits, moving to a Notebook (Python) is a recommended approach. It gives you more control over pagination, retries, throttling, and error handling, and makes custom logging and batching logic easier to implement. A Notebook can either complement or replace pipelines, depending on your use case.&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/rest/api/fabric/articles/pagination" target="_blank" rel="noopener"&gt;Use Pagination with Fabric REST APIs - Microsoft Fabric REST APIs | Microsoft Learn&lt;/A&gt;&lt;/P&gt;
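&lt;P&gt;As a rough illustration of the extra control a Notebook gives over pagination, the sketch below follows page tokens until none is left and then checks the row count against the total the source reports. The shape of a page (a list of records plus an optional next-page token) is an assumption about the API, and fetch_all, check_complete, and fetch_page are hypothetical names.&lt;/P&gt;

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]  # (records, next page token)


def fetch_all(fetch_page: Callable[[Optional[str]], Page]) -> List[dict]:
    """Follow page tokens until the API reports no further page."""
    records: List[dict] = []
    token: Optional[str] = None
    while True:
        page_records, token = fetch_page(token)
        records.extend(page_records)
        if token is None:
            return records


def check_complete(records: List[dict], expected_total: int) -> None:
    """Fail loudly when the loaded count differs from the expected total."""
    if len(records) != expected_total:
        raise RuntimeError(
            "loaded %d of %d records" % (len(records), expected_total)
        )
```

&lt;P&gt;With a check like this, a run that ends with 123000 of 210000 records fails visibly instead of reporting success.&lt;/P&gt;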
&lt;P&gt;&lt;STRONG&gt;Best regards,&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;LakshmiNarayana.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 21 May 2025 11:40:23 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Full-Load-from-API-Skipping-some-files/m-p/4701551#M9467</guid>
      <dc:creator>v-lgarikapat</dc:creator>
      <dc:date>2025-05-21T11:40:23Z</dc:date>
    </item>
    <item>
      <title>Re: Full Load from API Skipping some files</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Full-Load-from-API-Skipping-some-files/m-p/4706739#M9584</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1237933"&gt;@max_mrc&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;If your issue has been resolved, please consider marking the most helpful reply as the &lt;STRONG&gt;accepted solution&lt;/STRONG&gt;. This helps other community members who may encounter the same issue to find answers more efficiently.&lt;/P&gt;
&lt;P&gt;If you're still facing challenges, feel free to let us know—we’ll be glad to assist you further.&lt;/P&gt;
&lt;P&gt;Looking forward to your response.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Best regards,&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;LakshmiNarayana.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 26 May 2025 06:08:29 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Full-Load-from-API-Skipping-some-files/m-p/4706739#M9584</guid>
      <dc:creator>v-lgarikapat</dc:creator>
      <dc:date>2025-05-26T06:08:29Z</dc:date>
    </item>
  </channel>
</rss>

