Hello,
I'm trying to assess the possible cost savings of switching from PySpark to regular Python notebooks for small to medium data.
The Microsoft documentation states that "Two Spark VCores equals one capacity unit (CU)."
I know that notebooks in the Python experience (without Spark) have 2 vCores by default. Does the 2 vCores = 1 CU conversion also hold for Python experience notebooks? I haven't been able to find this anywhere in the documentation.
Also, how long must 2 vCores be running for 1 CU to be consumed? Seconds? Minutes? Hours?
Kind regards,
Fabric operations - Microsoft Fabric | Microsoft Learn
Use Python experience on Notebook - Microsoft Fabric | Microsoft Learn
Hello @JeroenVDM
For both types of notebooks on Fabric, CU consumption is driven by the compute resources allocated during the notebook session.
For Spark notebooks, billing is active as long as the nodes remain allocated. The node sizes are determined by the Spark pool size (Small, Medium, Large, etc.).
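As a rough illustration only (a back-of-envelope sketch, assuming the documented 2 Spark vCores = 1 CU rate also applies here and that consumption scales linearly with how long the compute stays allocated; actual metering may differ):

```python
# Back-of-envelope CU estimate: (vCores / 2) gives the CU rate, multiplied
# by how long the compute stays allocated. This assumes the documented
# "2 Spark vCores = 1 CU" conversion and linear scaling with duration.
def estimated_cu_hours(vcores: int, duration_hours: float) -> float:
    return (vcores / 2) * duration_hours

# A default 2-vCore Python notebook session kept alive for 30 minutes:
print(estimated_cu_hours(2, 0.5))   # 0.5 CU-hours

# An 8-vCore Spark node (e.g. a Medium node) allocated for 30 minutes:
print(estimated_cu_hours(8, 0.5))   # 2.0 CU-hours
```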
Hello @deborshi_nag,
Thank you for your reply.
If I am understanding you correctly, CU consumption works essentially the same for both Spark and Python notebooks. In both cases, it is determined by the amount of resources allocated multiplied by the duration for which those resources were used. In a hypothetical scenario where a Spark notebook and a Python notebook consumed the same amount of vCores/RAM for the same amount of time, the CU consumption would be identical.
The difference, of course, is that Python notebooks run on a single machine with statically allocated resources, while Spark runs on a cluster where the amount of allocated resources can change dynamically during runtime. But in both cases, the goal should be to maximize resource utilization.
Regarding the second part:
Your point is that running 3 separate transformations on a single, appropriately sized Spark cluster will always be cheaper than running the same transformations using 3 separate Python notebooks.
Does this approach necessitate the use of multithreading alongside Spark?
I previously came across the following article that discusses this topic:
https://www.guptaakashdeep.com/enhancing-spark-job-performance-multithreading/
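As I understand it, the pattern looks roughly like this (a minimal sketch only, with hypothetical table and column names, relying on the spark session object that Fabric Spark notebooks provide):

```python
# Sketch: run several small, independent transformations concurrently inside
# one Spark session, so the cluster isn't idle while each job waits on I/O.
from concurrent.futures import ThreadPoolExecutor

tables = ["sales_small", "customers_small", "inventory_small"]

def transform(table_name: str) -> None:
    # Each thread submits its own Spark job; Spark's scheduler runs them
    # in parallel on the same cluster resources.
    df = spark.read.table(table_name)
    result = df.dropDuplicates().filter("amount IS NOT NULL")
    result.write.mode("overwrite").saveAsTable(f"{table_name}_clean")

with ThreadPoolExecutor(max_workers=len(tables)) as pool:
    list(pool.map(transform, tables))
```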
At the moment, my company does not use multithreading with Spark, which is leading to poor resource utilization for ETL workloads on small datasets (less than 1 GB). Because of this, we are considering moving to Python notebooks.
In your opinion, is multithreading Spark actually the right approach for handling these small data tasks?
Kind regards,
Jeroen
Hi @JeroenVDM what I meant is that for large datasets involving complex joins/windowing or high parallelism, a single Spark session (sized appropriately) will finish faster and often cheaper in aggregate because it can distribute the work, especially with Fabric Runtime 1.3's Native Execution Engine. Bear in mind you also have the option to create custom pools with the right node size and scalability.
However, for smaller datasets, if you're using DuckDB or Polars (which are high-performance in-process analytical engines), you may well get them to run cheaply on Python notebooks.
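For instance, something along these lines (a minimal sketch only; the Lakehouse paths and columns are hypothetical, and it assumes the duckdb and polars packages are available in the notebook environment):

```python
# Sketch of a small aggregation with DuckDB inside a Python notebook.
# The Lakehouse file paths and column names are hypothetical.
import duckdb

result = duckdb.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM read_parquet('/lakehouse/default/Files/sales_small/*.parquet')
    GROUP BY customer_id
""").pl()  # materialize the result as a Polars DataFrame

result.write_parquet("/lakehouse/default/Files/sales_small_agg.parquet")
```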
I didn't refer to the multithreading aspect in my first comment.
It is best to do some experimentation using your own small-, medium-, and large-sized datasets.