Hi everyone,
I am trying to use the custom prompt AI function with Fabric's default LLM settings. I need to send over 100k records through the LLM, but this results in immediate capacity issues because of the volume of records being sent. Has anyone else run into this, and if so, how did you work around it? Would using my own key and endpoint set up in Azure AI Foundry avoid this, or would that also create capacity issues? More details below:
Environment
Workspace SKU: F128 (autoscale for Spark ON)
Feature: Custom Prompt AI (Fabric’s default LLM endpoint)
Use case: Classifying ~100K customer-address records in one pass
Data size: ≈1M tokens per run (~10 tokens per record × 100K records)
Problem
Whenever I call the endpoint with the full batch, capacity utilization spikes > 300 % and Fabric immediately throttles the workspace. Even if I chunk the dataframe into 5–10 calls, the bursts still blow the CU budget and the overage burndown takes hours.
What I’ve tried
| Approach | Result |
| --- | --- |
| `ai.generate_response()` on the full Spark DF | Instant overage / throttling |
| Micro-batching (10K rows) | 8–10 bursts, still >200% CU |
| Switching to a lower-temperature model | Token cost drops, CU spike unchanged |
| Autoscale for Spark pools | Helps after the fact (burndown), not at call time |
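For concreteness, here is a stripped-down sketch of the micro-batching I'm running today (pandas in a Fabric notebook; the batch size, pause length, and prompt text are placeholders I keep tuning, and the ai.generate_response call is written the way I currently use it):

```python
import time
import pandas as pd

BATCH_SIZE = 10_000   # rows per call (placeholder; I've tried 5K-20K)
PAUSE_SECONDS = 60    # crude gap between bursts; does not stop the CU spikes

def classify_addresses(df: pd.DataFrame) -> pd.DataFrame:
    """Run the custom prompt AI function over the frame in chunks."""
    results = []
    for start in range(0, len(df), BATCH_SIZE):
        chunk = df.iloc[start:start + BATCH_SIZE].copy()
        # Each call below is one burst against the default Fabric LLM endpoint.
        chunk["category"] = chunk.ai.generate_response(
            "Classify this customer address into a region bucket."
        )
        results.append(chunk)
        time.sleep(PAUSE_SECONDS)
    return pd.concat(results, ignore_index=True)
```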
Questions for the community
1. Batch / stream patterns – Has anyone found a sweet-spot batch size or sliding-window pattern that keeps CU under 100% while feeding large dataframes to the LLM?
2. Async vs. sync calls – Can I fire off smaller async requests and aggregate the responses later without paying the “concurrent CU penalty” in one big spike? (Rough sketch of what I mean at the end of this post.)
3. Queue or orchestrator – Does Fabric offer a built-in queue for LLM calls (similar to the Synapse Spark job queue), or is everyone rolling their own (e.g., Delta table + Logic Apps / Data Factory orchestrator)?
4. Model / capacity separation – Is it possible to point Custom Prompt AI at a pay-as-you-go Azure OpenAI resource so the heavy LLM work lands outside the Fabric capacity? (Also sketched at the end of this post.)
5. Throttle-aware retry logic – Any sample code that backs off when Peak % > X and resumes when burndown < Y? (The kind of loop I have in mind is right after this list.)
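For question 5, this is roughly the shape of the backoff loop I have in mind. I'm not aware of a public API that exposes the live Peak %, so this placeholder just backs off on throttling errors (HTTP 429 / capacity rejections) with exponential delay and jitter; submit_batch stands in for whatever actually issues the LLM call:

```python
import random
import time

def call_with_backoff(submit_batch, batch, max_retries=8, base_delay=30):
    """Retry a batch of LLM calls, backing off exponentially when throttled."""
    for attempt in range(max_retries):
        try:
            return submit_batch(batch)   # whatever issues the LLM call for this batch
        except Exception as err:         # ideally narrowed to the throttling error type
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter so retries don't re-spike together.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            print(f"Throttled ({err}); retrying in {delay:.0f}s")
            time.sleep(delay)
```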
I’d love to hear how others are handling high-volume LLM inference in Fabric without upgrading to an even larger SKU or waiting hours for overages to clear.
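And for questions 2 and 4 combined, this is the kind of pattern I'm imagining if the heavy work can be pointed at my own pay-as-you-go Azure OpenAI deployment: small async requests capped by a semaphore so only a few are in flight at once. The endpoint, key, deployment name, and API version below are placeholders, and I haven't confirmed whether this actually keeps the load off the Fabric capacity:

```python
import asyncio
from openai import AsyncAzureOpenAI

# Placeholder credentials -- in practice these would come from Key Vault.
client = AsyncAzureOpenAI(
    azure_endpoint="https://<my-resource>.openai.azure.com",
    api_key="<api-key>",
    api_version="2024-06-01",
)

MAX_IN_FLIGHT = asyncio.Semaphore(5)  # cap concurrent requests to avoid one big spike

async def classify(record: str) -> str:
    """Classify a single address record against my own Azure OpenAI deployment."""
    async with MAX_IN_FLIGHT:
        resp = await client.chat.completions.create(
            model="<deployment-name>",
            messages=[
                {"role": "system", "content": "Classify the customer address into a region bucket."},
                {"role": "user", "content": record},
            ],
        )
        return resp.choices[0].message.content

async def classify_all(records: list[str]) -> list[str]:
    return await asyncio.gather(*(classify(r) for r in records))

# In a notebook (which already has an event loop): results = await classify_all(address_records)
```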
We are following up once again regarding your query. Could you please confirm if the issue has been resolved through the support ticket with Microsoft?
If the issue has been resolved, please share the resolution or key insights here to help others in the community. If we don’t hear back, we’ll go ahead and close this thread.
Should you need further assistance in the future, we encourage you to reach out via the Microsoft Fabric Community Forum and create a new thread. We’ll be happy to help.
Thank you for your understanding and participation.
Hi @AnthonySottile,
If your issue still persists, please consider raising a support ticket for further assistance.
To raise a support ticket for Fabric and Power BI, kindly follow the steps outlined in the following guide:
How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn
Thanks,
Prashanth Are
MS Fabric community support
Hi @AnthonySottile,
As we haven’t heard back from you, we wanted to follow up and check whether there has been any progress on the above-mentioned issue. Let me know if you still need any further help here.
Thanks,
Prashanth Are
MS Fabric community support
Hi @AnthonySottile,
We would like to follow up to see if the solution provided by the super user resolved your issue. Please let us know if you need any further assistance.
@lbendlin, thanks for your prompt response.
If our super user's response resolved your issue, please mark it as "Accept as solution" and click "Yes" if you found it helpful.
Thanks,
Prashanth Are
MS Fabric community support
The only thing I can advise on is number 5: look into capacity Surge Protection. Note that it is based on the Background Rejection % (the very last tab), and you need to gauge a value that works for you. For example, we use 37% for on and 35% for off, but your values will most likely be different.