
__maca__
Advocate I

Pipeline | REST API | High CU % Usage

Hi,

 

I have set up a simple process to query an external API and retrieve data using Copy Data Activity. The issue I'm facing is that the CU % Usage keeps increasing linearly and eventually reaches 100% (F64), even though the task seems simple and should not require such high usage. Here are the details of my setup:

 

  • I retrieve a list of 1,000 IDs from the REST API using a Copy Data Activity
  • I then use two ForEach blocks in parallel, each containing a Copy Data Activity, to query two separate endpoints with these IDs in batches of 15
  • The Intelligent throughput optimization is set to 4, and the Degree of copy parallelism is set to 1 (i.e., all settings to minimum)
  • Each file, between 100KB and 200KB in size, is saved as a JSON file (raw) in a Lakehouse / Files. Hence, the data is not processed in any way.
  • The whole process is contained within an Until activity and repeats itself until no more IDs are available (in total, probably about 20,000 IDs)

Despite the really simple nature of these operations, the CU % Usage continues to rise. I expected it to remain stable at a low percentage for the entire run. Is there an auto-scaling function that might be causing this? I believe this Data Pipeline should be able to run efficiently on an F64 capacity for days without issues since the same process can easily run on any laptop, or even a Raspberry Pi.
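
For context, the whole job is roughly equivalent to this standalone script (the base URL, endpoint names, and parameter shapes are made up for illustration; the real API obviously differs):

```python
# Rough standalone equivalent of the pipeline, just to show the scale of the
# workload. The API base URL, endpoint names, and parameters are hypothetical.
import pathlib
import requests

BASE = "https://api.example.com"        # hypothetical external API
OUT = pathlib.Path("lakehouse_files")   # stand-in for Lakehouse / Files
OUT.mkdir(exist_ok=True)

ids = requests.get(f"{BASE}/ids").json()  # list of IDs (~20,000 in total)

for i in range(0, len(ids), 15):          # the API accepts at most 15 IDs per call
    batch = ids[i:i + 15]
    for endpoint in ("endpoint_a", "endpoint_b"):  # the two separate endpoints
        resp = requests.get(f"{BASE}/{endpoint}",
                            params={"ids": ",".join(map(str, batch))})
        # each response is a 100-200 KB JSON payload, saved raw (no processing)
        (OUT / f"{endpoint}_{i // 15}.json").write_text(resp.text)
```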

 

Any help would be greatly appreciated! 

Cheers!

6 REPLIES
v-cboorla-msft
Community Support

Hi @__maca__ 

 

Thanks for using Microsoft Fabric Community.

The high and increasing CU % Usage in your Microsoft Fabric Data Pipeline, despite a seemingly simple process, could be due to several factors.

 

Potential Culprits for High CU Usage:

ForEach Loop Execution: Even with a Degree of parallelism set to 1, executing ForEach loops in parallel might trigger resource allocation for both simultaneously. Consider changing them to sequential execution.

Batch Processing Overhead: Retrieving data in batches of 15 (67 iterations for 1,000 IDs) can lead to context-switching overhead. Experiment with larger batch sizes (e.g., 50 or 100) within the ForEach loops.

Accumulating Data Volume: While individual files are small, processing 20,000 IDs might accumulate data volume over time. Monitor memory usage and consider data compression if applicable.

Hidden Processing: Double-check Copy Data Activity settings for unintended transformations or validations that could increase resource consumption.

Metrics Analysis: Use the Microsoft Fabric Capacity Metrics app to track CU consumption per item, alongside API call latency and data transfer times. This can pinpoint the resource bottleneck.

 

Optimizing Your Data Pipeline:

Refine ForEach Loop Execution: Change the ForEach loop execution to sequential to potentially limit the initial resource allocation spike.

Adjust Batch Size: Experiment with larger batch sizes (50 or 100) within the ForEach loop to reduce iterations and context switching (see the sketch after this list).

Explore Bulk Retrieval: If the external API allows, consider retrieving all 1000 IDs in a single call for a simpler logic and potentially less context switching.

Monitor Key Metrics: Use the Microsoft Fabric Capacity Metrics app to identify which items consume the most CUs, and track API call latency and data transfer to find bottlenecks.

Review Activity Configuration: Double-check Copy Data Activity settings for any unnecessary transformations or validations.

Leverage Compression: Depending on the data format, consider compressing the retrieved JSON data before saving it to the lakehouse to potentially reduce storage and CU usage.
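
To make the batch-size point concrete, here is a minimal sketch in plain Python (the chunking helper is illustrative, not a pipeline API) of how larger batches shrink the iteration count:

```python
# Minimal illustration of the batch-size trade-off: fewer, larger batches
# mean fewer ForEach iterations and less per-iteration startup overhead.
def chunks(ids, size):
    """Yield successive batches of the given size (illustrative helper)."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

ids = list(range(1000))

print(sum(1 for _ in chunks(ids, 15)))  # 67 iterations at batch size 15
print(sum(1 for _ in chunks(ids, 50)))  # 20 iterations at batch size 50
```

Each iteration carries a fixed activity start-up cost, so cutting 67 iterations down to 20 reduces that overhead proportionally.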

 

By implementing these suggestions and analyzing the specific metrics, you should be able to pinpoint the root cause of the high CU usage and significantly improve your Data Pipeline's efficiency.

 

I hope this information helps. 

 

Thank you.

We're having a similar issue for a similar cause, and I have to say your response doesn't make much sense to me. Wouldn't increasing the batch size so something runs faster consume MORE compute?

It's not uncommon for APIs to have concurrency limits and record-count limits to ensure their own resources aren't exceeded. In OUR case it's NetSuite. Try as we may, the most we can pull at any given time is 10 concurrent requests of 1,000 records each.

We've implemented a ForEach within Fabric Data Factory that sources the data in 10 batches before iterating to the next 10, so at any given time I would expect to see CU usage for only 10 batches. When the ForEach moves on to the next loop, it's a new set of 10 requests and the previous set is complete. I would expect burndown to close out the last execution, so the CU shouldn't grow out of control.
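
In plain Python terms, the behaviour I expect from that ForEach is roughly this (an analogy with a hypothetical fetch_page(), not the actual pipeline definition):

```python
# Analogy for our ForEach pattern: waves of 10 concurrent requests, where
# each wave completes (and should release its resources) before the next.
# fetch_page() is a hypothetical stand-in for one NetSuite request.
import asyncio

async def fetch_page(page: int) -> None:
    await asyncio.sleep(1)  # stand-in for one 1,000-record request

async def main(total_pages: int) -> None:
    for start in range(0, total_pages, 10):
        wave = [fetch_page(p) for p in range(start, min(start + 10, total_pages))]
        await asyncio.gather(*wave)  # the wave finishes before the next begins
        # steady state: never more than 10 requests in flight at once

asyncio.run(main(100))
```

If the pipeline behaved like this, CU usage would plateau at the cost of 10 concurrent copies.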

That's not what happens.  It starts off low and just keeps climbing and climbing.  It's like no burndown is happening at all and the staging of the copy step is just holding on to resources.

[screenshots attached: DanielSDavis_1-1728133576921.png, DanielSDavis_0-1728133556965.png]

Hi @__maca__ 

 

We haven't heard back from you on the last response and wanted to check whether you have found a resolution yet.
If you have, please share it with the community, as it may help others.
Otherwise, reply with more details and we will try to help.

 

Thank you.

Hi @v-cboorla-msft,

Thanks for the detailed answer and sorry for the late reply. I've been busy trying out all the ideas you suggested. Unfortunately, I haven't seen any improvement yet. The API we're using limits us to 15 IDs at a time, which makes bulk retrieval effectively impossible. I've set up my pipeline so that all the ForEach loops are triggered one after another, and I've tried higher batch sizes (the max Fabric allows is 50, so I couldn't go beyond that). Do you think setting the ForEach activities to sequential within the loops could solve the problem? It might take forever, though...

 

There must be another way to make this work. I don’t see how switching to an even higher capacity makes sense for such a basic operation. I’d really appreciate it if you could provide me with other ideas or solutions.
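
For scale, here's the back-of-the-envelope arithmetic I'm working with (rough numbers):

```python
# Rough call counts for the full run, given the hard 15-ID-per-call limit.
import math

total_ids = 20_000
ids_per_call = 15   # hard API limit
endpoints = 2       # two endpoints queried per batch

calls_per_endpoint = math.ceil(total_ids / ids_per_call)  # 1,334
total_calls = calls_per_endpoint * endpoints              # 2,668
print(calls_per_endpoint, total_calls)
```

A few thousand small HTTP calls, each saving a 100-200 KB file, is trivial work for a single machine, which is why exhausting an F64 is so surprising.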

 

Thanks!

Hi @__maca__ 

 

Apologies for the inconvenience and the delay in response.

Please reach out to our support team to gain deeper insights and explore potential solutions; their expertise will be invaluable in suggesting the most appropriate approach.

Please go ahead and raise a support ticket to reach our support team:

https://support.fabric.microsoft.com/support

After creating a support ticket, please share the ticket number with us so we can track the request.

 

Thank you.

Hi @__maca__ 

 

We haven't heard back from you since the last response and were just checking to see if you've had a chance to submit a support ticket. If you have, a reference to the ticket number would be greatly appreciated, as it will allow us to track the progress of your request and ensure you receive the most efficient support possible.

 

Thank you.
