
JoMo
New Member

Dual SKU for splitting workload

Hello everyone

 

I need help to understand how workload between capacities works in order to properly plan our use of Fabric.

 

Context: we are an SME (200 employees) and we are currently migrating our on-premises data warehouse to Fabric, in a medallion architecture. We have a small team of analysts. For the moment, all the analysts run their queries against our on-premises data warehouse. We do, however, have a data scientist who does advanced analytics / modeling / ML on Fabric.

 

In terms of capacity, we have an F4 SKU to start with. We have already experienced capacity overload, despite smoothing, with analytics work. What's more, not every ETL in our warehouse runs at the desired interval, and not every analyst has been onboarded onto Fabric.

 

So I want to properly assess and plan the organization's capacity requirements before increasing the SKU. To make sure that ETLs are never compromised by analytics work, I thought I'd use two SKUs: one for ETLs and one for analytics work.

 

So I want to evaluate the advantages and disadvantages of such an architecture. Do you have any thoughts on this?

 

Example: If I have my data warehouse on SKU#1, and an analyst runs a notebook in a workspace on SKU#2 that queries a lakehouse connected to the data warehouse on SKU#1, how will the workload be distributed between SKU#1 and SKU#2?

 

Thank you in advance

 

Joe


5 REPLIES
v-shex-msft
Community Support

Hi @JoMo ,

Did the above suggestions help with your scenario? If so, please consider giving a Kudo or accepting the helpful suggestions to help others who face similar requirements.

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accepting it as a solution to help other members find it more quickly.
frithjof_v
Community Champion

"We have already experienced capacity overload, despite smoothing, with analytics work"

 

Did you experience interactive delays, interactive rejection or background rejection?

 

https://learn.microsoft.com/en-us/fabric/enterprise/throttling#future-smoothed-consumption

 

If you only experience interactive delays or interactive rejection, that doesn't hinder background operations from running. So in that case, they could stay on the same capacity.

 

In general, a single, bigger capacity (F8) lets you do more and utilize the total compute resources better than two smaller capacities (2 x F4).

 

However, afaik there is no way to prioritize certain workspaces within a single capacity. Unfortunately, in order to achieve resource isolation (preventing one uncontrolled workload from blocking all other workloads), you would need to separate the workspaces onto two capacities (e.g. 2 x F4, F2 + F4, F4 + F8, etc.).
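
To make that concrete, here is a rough sketch (my own illustration; the thresholds are the ones I read in the throttling doc linked above, and the burst figure is made up) of how the same smoothed carryover maps to different throttling stages on an F4 versus an F8:

# Rough illustration of the throttling stages, assuming an F SKU number equals its
# CU count and using the thresholds as I read them in the throttling documentation
# linked above (treat them as assumptions and verify against the doc).

def throttling_stage(carryover_cu_seconds: float, sku_cu: int) -> str:
    """Classify throttling by how many minutes of future capacity the smoothed carryover represents."""
    minutes_of_capacity = carryover_cu_seconds / (sku_cu * 60)
    if minutes_of_capacity <= 10:
        return "no throttling (overage is smoothed)"
    if minutes_of_capacity <= 60:
        return "interactive delay"
    if minutes_of_capacity <= 24 * 60:
        return "interactive rejection"
    return "background rejection"

burst = 20_000  # hypothetical CU-seconds of carryover left by a heavy notebook run
print("F4:", throttling_stage(burst, sku_cu=4))  # ~83 minutes of F4 -> interactive rejection
print("F8:", throttling_stage(burst, sku_cu=8))  # ~42 minutes of F8 -> interactive delay only

The same burst that pushes a standalone F4 into rejection territory only causes delays on an F8, which is the utilization argument for one bigger capacity.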

datacoffee
Most Valuable Professional

Hello

 

SKU usage is attributed based on which capacity is using which service.

 

If you query a Lakehouse in SKU1 from a different capacity, SKU2, then the read from the OneLake tables will be paid by the Lakehouse capacity, SKU1, and the write will be paid by SKU2.

 

The same goes for any other workload: reads and query data processing are paid for by the "giving" SKU, while live wrangling and storage are paid for by the "asking" SKU.
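
As a concrete picture of that split, here is a minimal notebook sketch (my own illustration; the workspace, lakehouse and table names are hypothetical, and spark is the session a Fabric notebook provides). The notebook runs in a workspace on SKU2 with a default lakehouse attached there, and reads a table owned by a lakehouse in a workspace on SKU1:

# Read a Delta table from the SKU1 lakehouse through its OneLake path
# (all names below are made up for the example).
src_path = (
    "abfss://Sku1Workspace@onelake.dfs.fabric.microsoft.com/"
    "WarehouseLakehouse.Lakehouse/Tables/orders"
)
df = spark.read.format("delta").load(src_path)

# The Spark session is started by the SKU2 workspace, so the notebook compute is
# billed there, and the write below lands in the SKU2 lakehouse. Per the explanation
# above, the read against the SKU1 lakehouse is charged to SKU1.
df.filter("order_date >= '2024-01-01'") \
  .write.format("delta").mode("overwrite") \
  .saveAsTable("orders_recent")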

 

I recently saw a post about Microsoft working on something around shared Copilot capacity, to use across SKUs.

 

You can also define a minimum available capacity unit amount for each workspace, to guarantee the correct compute power when needed.

 

cheers

 

frithjof_v
Community Champion

I'm interested in this part:

 

"if you query a Lakehouse in SKU1 from a different capacity, SKU2, then the read from the OneLake tables will be paid by the Lakehouse capacity, SKU1, and the write will be paid by SKU2."

 

Is it documented anywhere?

 

Afaik, the OneLake read/write transactions are billed to the owning capacity, unless the read/writes happen through a shortcut. The storage is always billed to the owning capacity.

 

The notebook compute (and similarly Dataflow Gen2, Pipeline, etc.) will be billed to the working capacity.
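
Just to make my reading easy to eyeball, here is a tiny sketch that encodes it as a lookup (this mirrors the prose above, it is not an API; the shortcut case is my own interpretation, so please check the OneLake consumption docs):

def billed_capacity(cost_type: str, via_shortcut: bool = False) -> str:
    """Which capacity pays, per the attribution rules as I understand them."""
    if cost_type == "onelake_storage":
        return "owning capacity"  # storage always follows the data
    if cost_type == "onelake_transactions":
        # My reading: via a shortcut the transactions follow the consuming side instead.
        return "consuming capacity" if via_shortcut else "owning capacity"
    if cost_type in ("notebook_compute", "dataflow_gen2", "pipeline"):
        return "working capacity"  # compute follows the workspace doing the work
    raise ValueError(f"unknown cost type: {cost_type}")

print(billed_capacity("onelake_transactions"))                      # owning capacity
print(billed_capacity("onelake_transactions", via_shortcut=True))   # consuming capacity
print(billed_capacity("notebook_compute"))                          # working capacity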

 

 

"you can also define a minimum available capacity unit amount for each workspace, to guarantee the correct compute power when needed"

 

Is this documented anywhere / where can we find this setting?

 

Thanks!

datacoffee
Most Valuable Professional

I can see from the workloads in my demo environments that when I read from SKU1 with a notebook on SKU2, SKU1 pays for the read and SKU2 pays for the Spark compute and the write.

 

It also makes sense, as the Spark compute is configured on the local SKU (in this case SKU2) and not on SKU1. My compute is writing to SKU2, which is why I mentioned that cost in the scenario above 🙂

 

I can't find the documentation for the reserved max compute, but here is a screenshot of how the setting looks in the portal:

 

(screenshot: datacoffee_0-1732609003928.png)

 
