max_mrc
Frequent Visitor

Full Load from API Skipping some files

I have created a pipeline in Fabric that does a full load from an API endpoint: the pipeline pulls the data from the API and loads the files into a Lakehouse. In the sandbox environment it works perfectly. However, when I run the pipeline in production, the capacity is exceeded. I upgraded from F32 to F64, but the capacity is still exceeded. I let it run and the pipeline succeeded, yet I only got 123,000 records out of 210,000. There is no pattern either; it just skipped a few files in between. Is this because of the bursting and smoothing that Fabric uses? What is the best way to tackle this issue? Can I make my pipeline use only about 80% of the capacity? It can take longer, that is no problem, but I want the full load to happen.

1 ACCEPTED SOLUTION
v-lgarikapat
Community Support

Hi @max_mrc ,

Thanks for reaching out to the Microsoft Fabric community forum.
Possible causes

  1. API Pagination/Throttling Behaviour
  • Some API calls may fail silently or be throttled, returning partial data.
  • If you're calling the API in parallel or in large batches, errors may not be retried correctly, and Fabric might skip the failed chunks.
  • Check whether your pipeline has error handling or retries on failed API calls.
  2. Fabric Capacity Bursting/Smoothing
  • Fabric tries to burst workloads over the provisioned capacity and then smooths the usage over time.
  • If your job exceeds the burst buffer too quickly or for too long, operations may get dropped or throttled, especially less critical ones such as background loads.
  • This could explain why some files/data are skipped without an error.
  3. Concurrency / Parallelism
  • If your pipeline processes API calls in parallel, it can spike your capacity usage.
  • That spike may lead to task drops or silent failures of lower-priority tasks.

Best Practices & Fixes

  1. Limit Concurrency
  • In your copy activity or loop, set the degree of parallelism to a lower number (e.g., 2–4).
  • Fabric defaults to high parallelism in some cases, which can spike capacity use.
  2. Throttle API Calls
  • Introduce a wait/delay (e.g., sleep 1s) between each API call or page fetch.
  • Use a custom Until loop or ForEach with a delay to slow down execution and reduce load.
  3. Implement Retries and Logging
  • Use robust retry logic for each API call (3+ retries with exponential backoff); a minimal sketch of this retry-and-throttle pattern follows this list.
  • Log each API call result (success/failure), even in a separate table if needed, to detect any silent skips.
  4. Partition Your Load
  • Break the full load into smaller, deterministic partitions (e.g., by date, region, or ID range).
  • This helps in tracking what has been loaded and allows for easy retry of failed partitions.
  5. Monitor Capacity Consumption
  • Use the Fabric Capacity Metrics app or Azure Monitor metrics to watch CU, CPU, and memory usage in near real time.
  • Set alerts for near-capacity thresholds so you know when you're close to the limits.
  6. Use Dataflows Gen2 or Data Pipeline Alternatives
  • For larger full loads, consider:
    • Dataflows Gen2 (if available) for chunked API ingestion.
    • Staging into Blob Storage/ADLS first, then processing into the Lakehouse.
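
To make points 2 and 3 concrete, here is a minimal Python sketch of a throttled, retried page fetch with per-call logging. The endpoint, token, and the "page"/"items" parameters are hypothetical placeholders; adapt them to your API, and treat this as a pattern rather than a drop-in implementation.

```python
import time
import requests

BASE_URL = "https://api.example.com/records"   # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}  # replace with your auth scheme

def fetch_page(page, max_retries=3):
    """Fetch one page with exponential backoff; return parsed JSON or None."""
    for attempt in range(1, max_retries + 1):
        try:
            resp = requests.get(BASE_URL, headers=HEADERS,
                                params={"page": page}, timeout=60)
            if resp.status_code == 429:
                # Throttled: honour Retry-After if the API sends it.
                time.sleep(int(resp.headers.get("Retry-After", 2 ** attempt)))
                continue
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as exc:
            if attempt == max_retries:
                print(f"page {page}: failed after {max_retries} attempts ({exc})")
                return None
            time.sleep(2 ** attempt)          # exponential backoff: 2s, 4s, 8s...
    return None

records, failed_pages = [], []
page, done = 1, False
while not done and len(failed_pages) < 10:     # stop after too many hard failures
    data = fetch_page(page)
    if data is None:
        failed_pages.append(page)              # record the skip instead of losing it silently
    elif not data.get("items"):
        done = True                             # empty page: assume end of data (API-specific)
    else:
        records.extend(data["items"])
        print(f"page {page}: {len(data['items'])} records")
    page += 1
    time.sleep(1)                               # throttle between page fetches

print(f"{len(records)} records fetched, failed pages: {failed_pages}")
```

Logging the failed page numbers (or writing them to a small audit table) is what lets you prove whether the missing ~87,000 records came from skipped calls.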

 Optional: "Take Only 80% of Capacity"

There’s no direct way to say “only use 80% of capacity,” but you can

simulate that behaviour by:

  • Reducing parallelism.
  • Adding delay/sleep in loops.
  • Spreading load over more pipeline runs (e.g., time-partitioned loads).
  • Reducing dataset size per copy activity.
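
As an illustration of spreading the load out, the sketch below splits a full load into weekly partitions and runs them one at a time with a pause in between, recording failed partitions for a later retry. The load_partition helper and the date range are hypothetical; only the shape of the loop is the point.

```python
from datetime import date, timedelta
import time

def load_partition(start, end):
    """Hypothetical helper: fetch and land every record modified in [start, end)."""
    # ... call the API with date filters and write the result files to the Lakehouse ...
    print(f"loading {start} .. {end}")

# Split an illustrative one-year full load into weekly partitions, run sequentially.
cursor, load_end = date(2024, 1, 1), date(2025, 1, 1)
loaded, failed = [], []
while cursor < load_end:
    nxt = min(cursor + timedelta(days=7), load_end)
    try:
        load_partition(cursor, nxt)
        loaded.append((cursor, nxt))
    except Exception as exc:
        failed.append((cursor, nxt))   # keep the failed range so it can be retried on its own
        print(f"partition {cursor}..{nxt} failed: {exc}")
    time.sleep(30)                     # pause between partitions to keep the capacity footprint low
    cursor = nxt

print(f"{len(loaded)} partitions loaded, {len(failed)} to retry: {failed}")
```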

Evaluate and optimize your Microsoft Fabric capacity - Microsoft Fabric | Microsoft Learn

Understand your Fabric capacity throttling - Microsoft Fabric | Microsoft Learn
Plan your capacity size - Microsoft Fabric | Microsoft Learn

Smoothing and Throttling - Microsoft Fabric | Microsoft Learn

Scale your Fabric capacity - Microsoft Fabric | Microsoft Learn

If this post helped resolve your issue, please consider giving it Kudos and marking it as the Accepted Solution. This not only acknowledges the support provided but also helps other community members find relevant solutions more easily.

We appreciate your engagement and thank you for being an active part of the community.

Best regards,
LakshmiNarayana.

4 REPLIES
v-lgarikapat
Community Support

@gslick ,
Thanks for the follow-up question.
Yes. If your API integration involves complex logic, or you are hitting throttling or capacity limits, moving the ingestion to a Notebook (Python) is a recommended approach. It gives you more control over pagination, retries, throttling, and error handling, and custom logging and batching logic are easier to implement. Depending on your use case, this approach can either complement or replace pipelines; a minimal sketch follows the link below.
Use Pagination with Fabric REST APIs - Microsoft Fabric REST APIs | Microsoft Learn
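
For reference, a minimal notebook-style sketch of that approach is shown below. It assumes a default Lakehouse is attached to the notebook (Fabric mounts it at /lakehouse/default) and a hypothetical paginated endpoint; the URL, auth header, and query parameters are placeholders to adapt.

```python
import json
import os
import time
import requests

BASE_URL = "https://api.example.com/records"      # placeholder endpoint
HEADERS = {"Authorization": "Bearer <token>"}     # replace with your auth scheme
OUT_DIR = "/lakehouse/default/Files/raw/records"  # landing folder in Lakehouse Files

os.makedirs(OUT_DIR, exist_ok=True)

page = 1
while True:
    resp = requests.get(BASE_URL, headers=HEADERS,
                        params={"page": page, "pageSize": 1000}, timeout=60)
    resp.raise_for_status()        # wrap with retry/backoff as in the accepted solution
    items = resp.json().get("items", [])
    if not items:
        break                      # no more pages
    # One JSON file per page keeps the load easy to reconcile against the source.
    with open(f"{OUT_DIR}/page_{page:05d}.json", "w") as f:
        json.dump(items, f)
    print(f"page {page}: wrote {len(items)} records")
    page += 1
    time.sleep(1)                  # throttle between pages
```

Landing one file per page makes it straightforward to count files and records against the source system and to re-fetch only the missing pages.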

If this post helped resolve your issue, please consider giving it Kudos and marking it as the Accepted Solution. This not only acknowledges the support provided but also helps other community members find relevant solutions more easily.

We appreciate your engagement and thank you for being an active part of the community.

Best regards,
LakshmiNarayana.



gslick
Frequent Visitor

Instead of using a pipeline, could you use a Notebook using Python instead to connect to the API?

v-lgarikapat
Community Support


Hi @max_mrc ,

 If your issue has been resolved, please consider marking the most helpful reply as the accepted solution. This helps other community members who may encounter the same issue to find answers more efficiently.

If you're still facing challenges, feel free to let us know—we’ll be glad to assist you further.

Looking forward to your response.

Best regards,
LakshmiNarayana.
