AnthonySottile
Frequent Visitor

Fabric Custom Prompt AI Function

Hi everyone,

 

I am trying to use the custom prompt AI function with Fabric's default LLM settings. I need to send over 100K records through the LLM, but that volume causes immediate capacity issues. Has anyone else run into this, and if so, how did you get around it? Would using my own key and endpoint set up in Azure AI Foundry work, or would that also create capacity issues? More details below:

 

Environment

  • Workspace SKU: F128 (autoscale for Spark ON)

  • Feature: Custom Prompt AI (Fabric’s default LLM endpoint)

  • Use case: Classifying ~100K customer-address records in one pass

  • Data size: ≈1M tokens per run (~10 tokens per record × 100K records)

Problem
Whenever I call the endpoint with the full batch, capacity utilization spikes > 300 % and Fabric immediately throttles the workspace. Even if I chunk the dataframe into 5–10 calls, the bursts still blow the CU budget and the overage burndown takes hours.

What I’ve tried

  • ai.generate_response() on the full Spark DF → Instant overage / throttling
  • Micro-batching (10K rows) → 8–10 bursts, still >200% CU
  • Switching to a lower-temp model → Token cost drops, CU spike unchanged
  • Using Autoscale for Spark pools → Helps after the fact (burndown), not at call time
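
For reference, the micro-batching attempt looked roughly like this. A minimal sketch: score_batch is a placeholder for the actual Custom Prompt AI call, and I'm operating on a pandas frame for simplicity rather than the Spark DF:

```python
import time

import pandas as pd


def score_batch(chunk: pd.DataFrame) -> pd.DataFrame:
    # Placeholder: swap in the real Custom Prompt AI call here
    # (the ai.generate_response() wrapper, in my case).
    raise NotImplementedError


def classify_in_batches(pdf: pd.DataFrame, batch_size: int = 10_000,
                        pause_s: float = 60.0) -> pd.DataFrame:
    """Feed the LLM in paced micro-batches instead of one burst.

    batch_size and pause_s are the knobs I've been tuning: smaller,
    slower batches flatten the CU profile at the cost of wall-clock time.
    """
    results = []
    for start in range(0, len(pdf), batch_size):
        results.append(score_batch(pdf.iloc[start:start + batch_size]))
        time.sleep(pause_s)  # give the CU smoothing window time to recover
    return pd.concat(results, ignore_index=True)
```

Even with generous pauses, the per-batch spikes are what kill the capacity, which is what the questions below are really about.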
 

Questions for the community

  1. Batch / stream patterns – Has anyone found a sweet-spot batch size or sliding-window pattern that lets you keep CU < 100 % while feeding large dataframes to the LLM?

  2. Async vs. sync calls – Can I fire off smaller async requests and aggregate the responses later without paying the “concurrent CU penalty” in one big spike? (See the sketch after this list.)

  3. Queue or orchestrator – Does Fabric offer a built-in queue for LLM calls (similar to Synapse Spark job queue) or is everyone rolling their own (e.g., Delta table + Logic Apps / Data Factory orchestrator)?

  4. Model / capacity separation – Is it possible to point Custom Prompt AI to a pay-as-you-go Azure OpenAI resource so the heavy LLM work lands outside the Fabric capacity? (Sketch of what I mean at the end of this post.)

  5. Throttle-aware retry logic – Any sample code that backs off when Peak % > X and resumes when burndown < Y?
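
To make questions 2 and 5 concrete, the shape I'm picturing is capped-concurrency async calls with exponential back-off. Everything below is a hypothetical sketch (ThrottledError, call_llm, and the limits are placeholders I made up, not real Fabric APIs):

```python
import asyncio
import random


class ThrottledError(Exception):
    """Stand-in for whatever 429 / throttling signal the endpoint raises."""


async def call_llm(record: str) -> str:
    # Placeholder: swap in the real (async-wrapped) LLM call.
    raise NotImplementedError


async def classify(record: str, sem: asyncio.Semaphore,
                   max_retries: int = 6, base_s: float = 2.0) -> str:
    async with sem:  # cap in-flight requests so there is no single big CU spike
        for attempt in range(max_retries):
            try:
                return await call_llm(record)
            except ThrottledError:
                # Exponential back-off with jitter; a smarter version would
                # gate on capacity Peak % / burndown instead of a blind sleep.
                await asyncio.sleep(base_s * 2 ** attempt + random.random())
        raise RuntimeError("still throttled after retries")


async def classify_all(records: list[str], max_in_flight: int = 4) -> list[str]:
    sem = asyncio.Semaphore(max_in_flight)
    return await asyncio.gather(*(classify(r, sem) for r in records))
```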

I’d love to hear how others are handling high-volume LLM inference in Fabric without upgrading to an even larger SKU or waiting hours for overages to clear.
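
And for question 4 specifically, this is the kind of separation I'm imagining: calling a pay-as-you-go Azure OpenAI deployment directly from the notebook with the standard openai SDK, so the token cost lands on that resource instead of on Fabric CUs. A minimal sketch; the endpoint, key handling, deployment name, and API version are placeholders, and whether Custom Prompt AI itself can be repointed this way is exactly what I'm asking:

```python
from openai import AzureOpenAI  # pip install openai

# Placeholders: point these at your own pay-as-you-go resource.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",      # better: pull this from Key Vault
    api_version="2024-06-01",  # assumption: any current GA version works
)


def classify_address(record: str) -> str:
    resp = client.chat.completions.create(
        model="<your-deployment>",  # deployment name, not the model family
        messages=[
            {"role": "system", "content": "Classify this customer address."},
            {"role": "user", "content": record},
        ],
    )
    return resp.choices[0].message.content
```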

7 REPLIES
v-prasare
Community Support


We are following up once again regarding your query. Could you please confirm if the issue has been resolved through the support ticket with Microsoft?
If the issue has been resolved, we kindly request that you share the resolution or key insights here to help others in the community. If we don’t hear back, we’ll go ahead and close this thread.
Should you need further assistance in the future, we encourage you to reach out via the Microsoft Fabric Community Forum and create a new thread. We’ll be happy to help.

Thank you for your understanding and participation.

v-prasare
Community Support

Hi @AnthonySottile , 

If your issue still persists, please consider raising a support ticket for further assistance.
To raise a support ticket for Fabric and Power BI, kindly follow the steps outlined in the following guide:

How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn


Thanks,

Prashanth Are

MS Fabric community support

v-prasare
Community Support

hi @AnthonySottile,

As we haven’t heard back from you, we wanted to follow up and check whether there has been any progress on the above-mentioned issue. Let me know if you still need any further help here.


Thanks,

Prashanth Are

MS Fabric community support

@lbendlin's response is helpful, but it does not solve my issue.

v-prasare
Community Support

Hi @AnthonySottile,

We would like to follow up to see if the solution provided by the super user resolved your issue. Please let us know if you need any further assistance.

 

@lbendlin, thanks for your prompt response.

Thanks,

Prashanth Are

MS Fabric community support


If our Super User's response resolved your issue, please mark it as "Accept as solution" and click "Yes" if you found it helpful.

lbendlin
Super User

The only thing I can advise on is question 5: look into capacity Surge Protection. Note that it is based on Background Rejection % (the very last tab), so you need to gauge a value that works for you. For example, we use 37% for on and 35% for off, but your values will most likely be different.
