Hello,
I'm trying to assess the possible cost savings of switching from PySpark to regular Python notebooks for small to medium data.
The Microsoft documentation states that "Two Spark VCores equals one capacity unit (CU)."
I know that notebooks in the Python experience (without Spark) have 2 vCores by default. Does the 2 vCores = 1 CU conversion also hold for Python experience notebooks? I haven't been able to find this anywhere in the documentation.
Also, how long must 2 vCores be running for 1 CU to be consumed? Seconds? Minutes? Hours?
Kind regards,
Fabric operations - Microsoft Fabric | Microsoft Learn
Use Python experience on Notebook - Microsoft Fabric | Microsoft Learn
Hello @JeroenVDM
For both types of notebooks on Fabric, CU consumption is driven by the compute resources allocated during the notebook session.
For Spark notebooks, billing is active as long as the nodes remain allocated. The node sizes are determined by the Spark pool size (Small, Medium, Large, etc.).
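As a rough illustration only (a back-of-envelope sketch, assuming the documented 2 Spark vCores = 1 CU rate also applies here and that consumption scales linearly with how long the compute stays allocated; actual metering may differ):

```python
# Back-of-envelope CU estimate: (vCores / 2) gives the CU rate, multiplied
# by how long the compute stays allocated. This assumes the documented
# "2 Spark vCores = 1 CU" conversion and linear scaling with duration.
def estimated_cu_hours(vcores: int, duration_hours: float) -> float:
    return (vcores / 2) * duration_hours

# A default 2-vCore Python notebook session kept alive for 30 minutes:
print(estimated_cu_hours(2, 0.5))   # 0.5 CU-hours

# An 8-vCore Spark node (e.g. a Medium node) allocated for 30 minutes:
print(estimated_cu_hours(8, 0.5))   # 2.0 CU-hours
```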
Hello @deborshi_nag,
Thank you for your reply.
If I am understanding you correctly, CU consumption works essentially the same for both Spark and Python notebooks. In both cases, it is determined by the amount of resources allocated multiplied by the duration for which those resources were used. In a hypothetical scenario where a Spark notebook and a Python notebook consumed the same amount of vCores/RAM for the same amount of time, the CU consumption would be identical.
The difference, of course, is that Python notebooks run on a single machine with statically allocated resources, while Spark runs on a cluster where the amount of allocated resources can change dynamically during runtime. But in both cases, the goal should be to maximize resource utilization.
Regarding the second part:
Your point is that running 3 separate transformations on a single, appropriately sized Spark cluster will always be cheaper than running the same transformations using 3 separate Python notebooks.
Does this approach necessitate the use of multithreading alongside Spark?
I previously came across the following article that discusses this topic:
https://www.guptaakashdeep.com/enhancing-spark-job-performance-multithreading/
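As I understand it, the pattern looks roughly like this (a minimal sketch only, with hypothetical table and column names, relying on the spark session object that Fabric Spark notebooks provide):

```python
# Sketch: run several small, independent transformations concurrently inside
# one Spark session, so the cluster isn't idle while each job waits on I/O.
from concurrent.futures import ThreadPoolExecutor

tables = ["sales_small", "customers_small", "inventory_small"]

def transform(table_name: str) -> None:
    # Each thread submits its own Spark job; Spark's scheduler runs them
    # in parallel on the same cluster resources.
    df = spark.read.table(table_name)
    result = df.dropDuplicates().filter("amount IS NOT NULL")
    result.write.mode("overwrite").saveAsTable(f"{table_name}_clean")

with ThreadPoolExecutor(max_workers=len(tables)) as pool:
    list(pool.map(transform, tables))
```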
At the moment, my company does not use multithreading with Spark, which is leading to poor resource utilization for ETL workloads on small datasets (less than 1 GB). Because of this, we are considering moving to Python notebooks.
In your opinion, is multithreading Spark actually the right approach for handling these small data tasks?
Kind regards,
Jeroen
Hi @JeroenVDM what I meant is that for large datasets involving complex joins/windowing or high parallelism, a single Spark session (sized appropriately) will finish faster and often cheaper in aggregate because it can distribute the work, especially with Fabric Runtime 1.3's Native Execution Engine. Bear in mind you also have the option to create custom pools with the right node size and scalability.
However, for smaller datasets, if you're using DuckDB or Polars (which are high-performance in-process analytical engines), you may well get them to run cheaply on Python notebooks.
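For instance, something along these lines (a minimal sketch only; the Lakehouse paths and columns are hypothetical, and it assumes the duckdb and polars packages are available in the notebook environment):

```python
# Sketch of a small aggregation with DuckDB inside a Python notebook.
# The Lakehouse file paths and column names are hypothetical.
import duckdb

result = duckdb.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM read_parquet('/lakehouse/default/Files/sales_small/*.parquet')
    GROUP BY customer_id
""").pl()  # materialize the result as a Polars DataFrame

result.write_parquet("/lakehouse/default/Files/sales_small_agg.parquet")
```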
I didn't refer to the multithreading aspect in my first comment.
It is best to do some experimentation using your own small-, medium-, and large-sized datasets.