
hsn367
Helper I

Copilot for Fabric: how to estimate input tokens for a single request

We are considering adopting Copilot for our data science team, but we want to do a cost analysis before we start using it. As per the Fabric documentation, Copilot cost depends on the number of input tokens and the number of output tokens, and the two token types consume CUs at different rates.

 

But the problem is that we cannot estimate the input tokens for a single request, because as per the blog "Fabric Copilot Pricing: An End-to-End Example":

"A single user input prompt can result in multiple requests being sent from Fabric Copilot to Azure OpenAI." In the blog's example, the prompt "Load Customers from my Lakehouse into DataFrames" generated 7 requests to Azure OpenAI, with 4,967 input tokens and 227 output tokens.

 

Now I want to understand how this single seven-word prompt consumed 4,967 tokens. More importantly, how can we estimate how many input tokens will be consumed by a single request we send to Copilot? As the blog's example shows, the count can vary, and it does not depend only on how many tokens are in the user's input.
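For a back-of-the-envelope estimate, one approach is to count the user prompt's own tokens and add a fixed overhead term for everything Copilot attaches behind the scenes (system prompt, lakehouse schema/context, multi-request orchestration). The sketch below is purely illustrative: the ~4-characters-per-token rule is a common rough heuristic for English text, and the `overhead_tokens` constant is a hypothetical value back-calculated from the single data point in the blog, not anything Microsoft publishes.

```python
# Rough, illustrative estimator for Copilot input tokens.
# Assumptions (NOT from Microsoft docs):
#   - ~4 characters per token for English text (common heuristic)
#   - a fixed per-prompt overhead for system prompt, schema/context and
#     orchestration, back-calculated from the blog's one example
#     (7-word prompt -> 4,967 input tokens across 7 requests).

def rough_token_count(text: str) -> int:
    """Very rough token count: ~4 characters per token."""
    return max(1, round(len(text) / 4))

def estimate_copilot_input_tokens(prompt: str, overhead_tokens: int = 4950) -> int:
    """Estimate total input tokens for one Copilot prompt.

    overhead_tokens is a hypothetical constant; real overhead varies with
    the attached lakehouse schema, chat history, and how many backend
    requests Copilot decides to issue.
    """
    return rough_token_count(prompt) + overhead_tokens

print(estimate_copilot_input_tokens("Load Customers from my Lakehouse into DataFrames"))
```

In practice you would calibrate `overhead_tokens` against your own workspace by comparing a few prompts against the CU usage you actually observe, since the overhead depends on the artifacts attached to the session.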

 

5 REPLIES
Anonymous
Not applicable

Hi @hsn367,

I'd like to suggest you take a look at the following link about Copilot consumption in Fabric:

Copilot consumption - Microsoft Fabric | Microsoft Learn

As the document states, tokens serve as the 'operation Unit of Measure' used to calculate the 'Consumption rate', which in turn determines the CU hours spent.

The number of generated requests can depend on many factors:

Complexity of the Question: More complex questions may require multiple requests to gather all necessary information and provide a comprehensive answer.
Data Availability: If the data needed to answer the question is spread across multiple sources, more requests will be needed to collect and integrate this data.
User Interaction: The way users interact with the system can affect the number of requests. For example, follow-up questions or clarifications can lead to additional requests.
System Performance: The efficiency of the system in processing requests and retrieving data can impact the total number of requests. A more optimized system may require fewer requests to deliver the same information.
Query Specificity: Vague or broad questions might generate more requests as the system tries to narrow down the relevant information. Conversely, very specific questions might generate fewer requests.
Real-Time Data Requirements: Questions that require real-time data updates or continuous monitoring can lead to a higher number of requests.

In summary, more requests typically mean more input/output tokens. Each request involves processing input tokens (the text of the question or command) and generating output tokens (the response).
So making each question more precise and less complex can help manage token usage more efficiently.
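To make the billing arithmetic concrete, here is a sketch of the CU calculation using the blog's example numbers. The per-1,000-token rates below are the ones I understand the Copilot consumption doc to publish at the time of writing; please verify them against the doc before doing real cost planning, as they may change.

```python
# Sketch of the Fabric Copilot CU calculation for one interaction.
# The rates below are assumed from the Copilot consumption doc at the
# time of writing -- verify current values before real cost planning.
INPUT_CU_S_PER_1K_TOKENS = 100.0   # CU seconds per 1,000 input tokens
OUTPUT_CU_S_PER_1K_TOKENS = 400.0  # CU seconds per 1,000 output tokens

def copilot_cu_seconds(input_tokens: int, output_tokens: int) -> float:
    """Total CU seconds billed for one Copilot interaction."""
    return (input_tokens / 1000 * INPUT_CU_S_PER_1K_TOKENS
            + output_tokens / 1000 * OUTPUT_CU_S_PER_1K_TOKENS)

# Blog example: 7 requests totalling 4,967 input and 227 output tokens.
cu_s = copilot_cu_seconds(4967, 227)
print(f"{cu_s:.1f} CU seconds = {cu_s / 3600:.4f} CU hours")
```

Note that output tokens are billed at four times the input rate under these assumed values, so even though input tokens dominate the count, the output side is not negligible.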

Regards,

Xiaoxin Sheng

Hi @Anonymous, thank you for the reply. I had read the link you provided before I posted this question, and I understand how consumption is measured. What I am interested in is how to roughly estimate the input tokens for a request. For example, as mentioned in the blog post, "Load Customers from my Lakehouse into DataFrames" generated 7 requests to Azure OpenAI, with 4,967 input tokens and 227 output tokens.

 

Here the data is not spread across multiple sources; we simply have a lakehouse attached to the notebook. So why 7 requests, and how did it consume 4,967 tokens?

 

Also, the blog's author apparently knew exactly how many input tokens were consumed for each request (it is not a rough estimate). How can we find out the number of input tokens for a single request?

 

You mentioned in your answer that "System Performance: The efficiency of the system in processing requests and retrieving data can impact the total number of requests. A more optimized system may require fewer requests to deliver the same information." Which system are you referring to here? How would we know what the system performance of Copilot is behind the scenes?

Anonymous
Not applicable

Hi @hsn367,

For Fabric Copilot usage, perhaps you can use the Fabric capacity usage metrics to trace it, or try to trace it on the Azure side, since Fabric Copilot interacts with Azure OpenAI:

Copilot consumption - Monitor the usage

Using your data with Azure OpenAI Service - Azure OpenAI | Microsoft Learn

>>Which system are you talking about here? How would we know how is the system performance for copilot behind the scenes?

This refers to Azure OpenAI. If you are interested in it, you can take a look at the following link about Azure OpenAI service models and versions:

Azure OpenAI Service model versions - Azure OpenAI | Microsoft Learn

Regards,

Xiaoxin Sheng

Hi @Anonymous, thank you for your support.

Correct me if I am wrong, but to my knowledge the Metrics app does not tell you the number of input tokens consumed per request; it only shows how many CUs are consumed by all Copilot requests for a particular capacity.

 

I am more interested in the input tokens consumed by a single request. Every Copilot response already reports the number of output tokens it generated; along the same lines, I want to know the number of input tokens consumed by my request, in the way the blog I mentioned breaks it down.
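One possible workaround, if the Metrics app shows total CU seconds for a Copilot operation and the response reports its output tokens, is to solve the billing formula backwards for input tokens. This sketch assumes the per-1,000-token rates I understand the Copilot consumption doc to publish (100 CU s per 1,000 input tokens, 400 CU s per 1,000 output tokens); verify those values against the doc, since the result is only as good as the rates.

```python
# Back-calculate input tokens from observed billing, assuming the
# published rates (verify against the Copilot consumption doc):
#   total_cu_s = input_tokens/1000 * 100 + output_tokens/1000 * 400
def estimate_input_tokens(total_cu_seconds: float, output_tokens: int) -> float:
    """Solve the assumed billing formula for input tokens."""
    input_cu_s = total_cu_seconds - output_tokens / 1000 * 400.0
    return input_cu_s / 100.0 * 1000.0

# Blog example: 4,967 input + 227 output tokens bill to 587.5 CU seconds
# under these rates; given the total and the reported output tokens,
# we can recover the input token count.
print(estimate_input_tokens(587.5, 227))
```

This only gives the total input tokens across all backend requests for the operation, not the per-request split, but for cost analysis the total is usually what matters.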

Anonymous
Not applicable

Hi @hsn367,

If you want to trace the detailed operation logs for these requests, I'd suggest going to the Azure side and using Azure Monitor or Log Analytics. (Fabric Copilot also involves Azure OpenAI, so the operations will be recorded in the Azure logs; you may need to find the related operation ID to locate the corresponding request ID.)

Implement logging and monitoring for Azure OpenAI language models - Azure Architecture Center | Micr...

Regards,

Xiaoxin Sheng
