Hello,
I have (had) a pipeline running hourly that triggers a notebook run. Out of the blue a few days ago, the notebook started behaving erratically. The displayed information is rather inconsistent, but I tend to believe the pipeline run view, which says the notebook ran for nine hours and was then cancelled. Above you can see that the notebook snapshot is quite broken: the notebook hasn't really run (no cell output whatsoever) but is somehow still shown as running.
Fine, so a notebook run failed, no biggie, right? Alas, this non-running notebook absolutely devoured our capacity, amounting to about 20% of the load on our F4 capacity. Also, and this is maybe the most amazing part, somewhere in the middle of the nine-hour run we paused and resumed the capacity. I would have thought that this kills all running processes; however, the notebook run still appears in the Capacity Metrics app AFTER the restart.
What is happening? Do I have any chance of preventing something like this? How would I catch it as early as possible, before bursting and smoothing incur a massive capacity debt? Has something like this happened to anyone else?
Thanks for any help!
Hi @NotebookEnjoyer ,
what you can do to limit the runtime of the notebook is to tweak the Timeout field of the notebook activity in your data pipeline. The default is 12 hours (0.12:00:00, in d.hh:mm:ss format). If you know the notebook usually does not exceed 1 hour of runtime, you could set that as the new limit to prevent it from running out of bounds and shredding your capacity.
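For illustration, a sketch of where that Timeout lives in the pipeline's notebook activity definition. The activity name is made up, and the exact JSON shape may differ between Fabric versions, so treat this as an orientation aid rather than a copy-paste template:

```json
{
  "name": "Run hourly notebook",
  "type": "TridentNotebook",
  "policy": {
    "timeout": "0.01:00:00",
    "retry": 0
  }
}
```

Here `"0.01:00:00"` means one hour in the d.hh:mm:ss format, replacing the 12-hour default.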
If you want more granular control, for instance limiting the runtime per cell or per child run, consider using a notebook to run your notebook rather than a pipeline (using the notebookutils run/runMultiple commands).
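Since notebookutils only exists inside a Fabric session, here is a minimal, locally runnable sketch of the driver-notebook pattern: run the child with a hard deadline and fail fast instead of letting it burn capacity for hours. The `run_notebook` function is a hypothetical stand-in; in Fabric you would call `notebookutils.notebook.run(path, timeout_seconds)` instead.

```python
import concurrent.futures
import time

def run_notebook(path: str, timeout_seconds: int) -> str:
    """Hypothetical stand-in for notebookutils.notebook.run(path, timeout_seconds).

    In a real Fabric driver notebook you would call the notebookutils API;
    this placeholder just simulates a child notebook doing some work.
    """
    time.sleep(0.1)  # pretend the child notebook runs briefly
    return f"{path}: done"

def run_with_guard(path: str, timeout_seconds: int) -> str:
    # Run the child in a worker thread and enforce a hard deadline, so a
    # hung run surfaces as an error instead of idling for hours.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(run_notebook, path, timeout_seconds)
        try:
            return future.result(timeout=timeout_seconds)
        except concurrent.futures.TimeoutError:
            raise RuntimeError(
                f"{path} exceeded {timeout_seconds}s and was abandoned"
            )

print(run_with_guard("Child Notebook", timeout_seconds=5))
```

The same idea applies with `runMultiple` for fan-out: each child run gets its own timeout, so one stuck notebook cannot hold the whole schedule hostage.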
Aside from that, it would be interesting to see which error is raised and how your notebook looks in the monitoring screenshot after it failed. I agree with you that it being shown as running even after restarting the capacity is rather odd.
Hope this helps! 🙂
The notebook run snapshot shows absolutely nothing. No cell has any output. Which, I mean, is kind of consistent with the <1ms runtime.
Hi @NotebookEnjoyer ,
Thank you for reaching out to us on the Microsoft Fabric Community Forum.
The behavior you're seeing is due to capacity smoothing in Fabric. When a workload exceeds available capacity, the system allows short-term overages and then "smooths" the excess usage over the next 24 hours. That’s why the notebook still appears in the Capacity Metrics App even after restarting the capacity.
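To make the smoothing effect concrete, here is a hedged back-of-envelope calculation. It assumes an F4 SKU provides 4 capacity units (i.e. 4 CU-seconds of compute per second) and uses a made-up consumption figure chosen to match the ~20% load mentioned above; actual billing arithmetic may differ in detail:

```python
# Back-of-envelope: how one stuck run can dominate an F4 capacity's
# 24-hour smoothing window. Numbers are illustrative, not billing-exact.
CU = 4                        # assumed: an F4 SKU provides 4 capacity units
WINDOW_S = 24 * 60 * 60       # background usage is smoothed over 24 hours
window_budget = CU * WINDOW_S  # total CU-seconds available in the window

run_usage = 69_120                     # hypothetical CU-seconds burned by the run
share = run_usage / window_budget      # fraction of the 24h budget consumed
print(f"{share:.0%} of the 24h budget")  # → 20% of the 24h budget
```

Because the overage is spread across the window rather than charged at the moment of execution, the run keeps showing up in the Metrics App long after it ended, until the debt is either smoothed away or settled by a pause.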
If this post was helpful, please give us Kudos and consider marking Accept as solution to assist other members in finding it more easily.
But isn't the capacity debt paid and billed explicitly when restarting the capacity? That was my assumption, and it also aligns with the Metrics app behavior, which shows an enormous uptick in load right at the restart point, after which the load starts way lower than before.
See here: Pause and resume your capacity - Microsoft Fabric | Microsoft Learn
"When you pause your capacity, the remaining cumulative overages and smoothed operations on your capacity are summed, and added to your Azure bill. You can monitor a paused capacity using the Microsoft Fabric Capacity Metrics app.
If your capacity is being throttled, pausing it stops the throttling and returns your capacity to a healthy state immediately. This behavior enables you to pause your capacity as a self-service mechanism that ends throttling."
Hi @NotebookEnjoyer ,
when the capacity is paused and resumed, any outstanding overages are billed right away. That explains the spike in the metrics at the restart, and it matches what the docs say.
The tricky part here is that the notebook still shows as "running," which can definitely be confusing. From what I understand, this is more about how the Jupyter runtime handles sessions: if the kernel isn't manually stopped, it can keep showing as active even though it's no longer using capacity after the restart.
So yes, the billing gets taken care of correctly, but the notebook status might not always reflect that in a straightforward way.
Thank you and Regards,
Menaka Kota.
Hi @NotebookEnjoyer ,
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.
however, the notebook run still appears in the capacity metrics app AFTER the restart.
That is "normal" and is called smoothing. Smoothing remains active for up to 24 hours after the activity concludes. It will show up like this if pausing the capacity was unable to pay back all the overages straight away.
I replied to a similar answer above.