I’m reaching out for quick assistance in validating the appropriate Microsoft Fabric SKU for a customer currently in the proposal stage. The customer is looking for ROI insights compared to their existing Databricks platform.
Below are the inventory details of the current Databricks system.
Historical Data Volume: 33 TB
Daily Data Ingestion: 100 GB
Spark Notebooks: 320
Pipelines: 60
Power BI Users: 1000+ users; Assumed ~100 daily viewers
Tables: 500
Daily Batch Cycles: 30
The average monthly cost of the current system is approximately $80,000.
Based on the cost and ROI derived from the SKU estimator, the ROI appears significantly higher with Fabric.
| Platform | Compute Cost (Monthly USD) | Storage Cost (Monthly USD) | Total Monthly Cost (USD) | Total Annual Cost (USD) | ROI (%) (Fabric Prod + Dev F32 SKU) |
| --- | --- | --- | --- | --- | --- |
| Databricks (Actual) | $79,673.00 | $6,052.00 | $85,725.00 | $1,028,700.00 | - |
| Fabric F128 (Reserved) – Prod | $10,005.00 | $4,415.08 | $14,420.08 | $173,040.96 | 79% |
| Fabric F256 (Reserved) – Prod | $20,010.00 | $4,415.08 | $24,425.08 | $293,100.96 | 67% |
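For reference, a minimal sketch of how the savings percentage behind the figures above can be reproduced, assuming the ROI column is read as annual cost savings relative to the current Databricks spend. The `dev_f32_annual` value is a placeholder, since the Dev F32 capacity cost referenced in the header is not broken out in the table.

```python
# Minimal sketch: read "ROI" as annual cost savings relative to Databricks.
# The table's percentages also fold in a Dev F32 capacity whose cost is not
# shown, so the dev figure below is a placeholder, not a real quote.

def annual_savings_pct(current_annual: float, proposed_annual: float) -> float:
    """Savings as a percentage of the current annual cost."""
    return (current_annual - proposed_annual) / current_annual * 100

databricks_annual = 85_725.00 * 12        # ~$1,028,700 from the table
fabric_f128_prod_annual = 14_420.08 * 12  # ~$173,041 from the table
dev_f32_annual = 0.0                      # placeholder: add the Dev F32 cost here

savings = annual_savings_pct(databricks_annual,
                             fabric_f128_prod_annual + dev_f32_annual)
print(f"Estimated annual savings with F128 Prod: {savings:.0f}%")
```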
Could you please help validate the Fabric SKU?
Please find below a screenshot of the Fabric SKU calculator. Assuming a compressed data size of 17,000 GB, 550 tables, and 4 cycles, the recommended SKU is F128. However, in the current system the job runs 30 times a day (30 batches); if I change the number of cycles to 30, the recommended SKU jumps to F2048. Please advise how I should arrive at the right SKU.
Hi @BalajiL
We have not received a response from you regarding the query and are following up to check whether you have had the opportunity to review the information provided. Please feel free to contact us if you have any further questions.
Thank You.
It looks like the SKU calculator requires a lot of improvement to derive the capacity for this kind of customer ask. Many projects run on Databricks, Synapse, etc., and there should be a way to map the existing load parameters onto a recommended capacity. The SKU calculator also returns varying SKUs for the same parameter inputs across multiple iterations; sometimes it shows F256 or F512 without any modification to the parameters. Could you please explain how to use the batch cycle input in the SKU calculator with various tables and batches? It is not a straightforward approach, and I would request that this be considered and enhanced further.
Hi @BalajiL
Thank you for providing such comprehensive feedback.
You have highlighted an important consideration regarding the SKU calculator's interpretation of workload parameters like batch cycles, table counts, and data volume. Currently, the calculator views the “number of batch cycles” as a measure of concurrent workload intensity, rather than the total number of runs over a day. This approach may lead to overestimating capacity needs, especially when multiple smaller or regional batches are run sequentially instead of concurrently.
The fluctuations in estimated SKUs (such as switching between F256 and F512 without clear parameter changes) may be attributed to the tool’s internal rounding or recalculation logic. For more precise results, it is recommended to enter peak concurrency values (the highest number of jobs or pipelines running in parallel, and the data volume processed concurrently) rather than total daily or regional batch runs.
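For illustration, a minimal sketch of how a peak-concurrency figure could be derived from the batch schedule before entering it into the calculator. The schedule below is hypothetical and should be replaced with the actual regional pipeline windows.

```python
# Hypothetical batch schedule: replace with the real regional pipeline windows.
from datetime import datetime, timedelta

runs = [
    (datetime(2025, 1, 1, 0, 0), timedelta(hours=2)),   # e.g. APAC 2-hour window
    (datetime(2025, 1, 1, 1, 0), timedelta(hours=2)),   # e.g. UK 2-hour window
    (datetime(2025, 1, 1, 1, 30), timedelta(hours=4)),  # e.g. US 4-hour window
]

# Sweep over start/end events to find the largest number of overlapping runs.
events = []
for start, duration in runs:
    events.append((start, 1))              # a run starts: concurrency +1
    events.append((start + duration, -1))  # a run ends: concurrency -1

concurrent = peak = 0
for _, delta in sorted(events):
    concurrent += delta
    peak = max(peak, concurrent)

print(f"Peak concurrent batch runs: {peak}")  # enter this, not the 30 daily runs
```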
For further improvements and enhancements, it is recommended to share this feedback through the Community Feedback - Microsoft Fabric Community forum. Submitting detailed input there will help the product team better understand real-world use cases and prioritize enhancements such as clearer parameter definitions, improved handling of sequential versus concurrent workloads, and stronger alignment with Databricks or Synapse metrics for more accurate capacity sizing.
Regards,
Microsoft Fabric Community Support Team.
Thanks @v-karpurapud, I will share the details in the feedback forum so that this information reaches the product team and they can work on enhancing the calculator.
Hi,
Thank you for the update. Hope your issue resolves soon.
Hi @BalajiL
Yes, you have to evaluate capacities when you are dealing with big data volumes. But as I can see, you only have 550 tables, so the question is how many of them are fact tables vs. dimension tables. If around 50 of them are big fact tables (what is their size and row count?), I think an F128 capacity should be fine, depending on the approach you are going to take. Also, you only have 320 Spark notebooks, and most of the tables are duplicates, so I would say your data is medium-sized.
We have done a similar project with around 250 Spark notebooks, and I was the only developer handling the end-to-end approach, including CI/CD.
For the best advice, you can contact me at my e-mail address ( bpatel@alphaanalyticsaus.onmicrosoft.com )
For best practices:
1. Use Dataflow Gen2 to land data in a Fabric SQL database.
2. Use the Kimball methodology approach (bus matrix, surrogate keys, facts, and dimensions) - star schema.
3. Use Microsoft Purview to get audit logs, etc.
Hi @BalajiL
Thank you for submitting your question to the Microsoft Fabric Community Forum, and thanks to @tayloramy for offering helpful suggestions.
Could you let us know if the suggested solution resolved your issue? If you still need help, please share more details so we can assist you further.
Thank you.
Hi @BalajiL,
This is a pretty massive load. I think you'd need multiple capacities to handle this, as a single F2048 might not be enough for that much data volume happening that frequently.
Remember that the way capacity throttling and smoothing works for background operations (like pipeline runs) is that usage is smoothed over a 24-hour period, so running something 30 times spread across the day and running it 30 times all at once has the same impact on the capacity.
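To make the smoothing point concrete, here is a minimal sketch. The CU-seconds figure per batch run is hypothetical; real numbers would come from the Fabric Capacity Metrics app.

```python
# Background usage is smoothed over 24 hours, so the average CU demand depends
# only on the total daily CU-seconds, not on when the runs happen.
SECONDS_PER_DAY = 24 * 60 * 60

def smoothed_cu(cu_seconds_per_run: float, runs_per_day: int) -> float:
    """Average CU demand after 24-hour smoothing of background operations."""
    return cu_seconds_per_run * runs_per_day / SECONDS_PER_DAY

# Hypothetical figure: 200,000 CU-seconds per batch, 30 batches per day.
print(f"Smoothed demand: {smoothed_cu(200_000, 30):.0f} CU")  # ~69 CU on average
```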
Are you able to provide more details? You mention that 33 TB is the total data size, but do you know how much data is generally loaded in each batch?
How many tables are the batches loading?
Thanks @tayloramy for your prompt response. I am yet to find out how much data is loaded in each batch (I am working on it), but the batches run throughout the day for six regions (US, UK, APAC, etc.) on multiple frequencies, such as a 24-hour pipeline window, a 2-hour pipeline window, and a 4-hour pipeline window.
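As a rough illustration of how those windows translate into daily run counts, here is a minimal sketch. The per-region mix of windows below is assumed, not taken from the actual schedule, so it will not exactly reproduce the 30-batch figure.

```python
# Hypothetical mix of pipeline windows per region; replace with the real schedule.
windows_hours_per_region = {
    "US": [24, 4],       # one daily pipeline plus a 4-hour pipeline
    "UK": [24, 2],       # one daily pipeline plus a 2-hour pipeline
    "APAC": [24, 4, 2],  # one daily plus a 4-hour and a 2-hour pipeline
}

# A pipeline on an N-hour window runs 24/N times per day.
daily_runs = {
    region: sum(24 // w for w in windows)
    for region, windows in windows_hours_per_region.items()
}

print(daily_runs)                            # per-region run counts
print(sum(daily_runs.values()), "runs/day")  # total daily batch runs across regions
```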
Hi @BalajiL,
It sounds like you will need a lot of capacities.
I'd recommend starting with an F512 or F1024 and see how efficient you can build the processes, but be prepared to upgrade to F2048 and to purchase an additional capacity if required.
If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.