Hi, we are using an F8 capacity and have a requirement to run 3 jobs once every 5 minutes. The issue we are facing is the capacity limit: once the limit is reached we are unable to do anything. The data volume is very low for all three jobs, so we are using Python notebooks instead of PySpark. Is increasing the capacity the only option we have?
We need to pull data from 3 different APIs every 5 minutes. Currently we use a Python script to pull the data and load it into a lakehouse table, and each job takes around a minute to complete. How do we make sure we don't reach the limits? Is there a more cost-effective alternative way to achieve this?
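For context, each job is roughly the shape of the sketch below; the endpoint URL, table name, and columns are placeholders, and it assumes the default lakehouse is attached to the notebook and the deltalake package is used for the write.

```python
import requests
import pandas as pd
from deltalake import write_deltalake

# Placeholder endpoint; each of the three jobs calls a different API.
API_URL = "https://example.com/api/readings"

# Pull the latest records (small payload, finishes in well under a minute).
response = requests.get(API_URL, timeout=30)
response.raise_for_status()
df = pd.DataFrame(response.json())

# Append to a Delta table in the default lakehouse mounted at /lakehouse/default.
write_deltalake("/lakehouse/default/Tables/api_readings", df, mode="append")
```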
Solved! Go to Solution.
Hi @fabric_1,
As we haven't heard back from you, we wanted to kindly follow up and check whether the solution provided resolved your issue. Please let us know if you need any further assistance.
Thanks,
Prashanth Are
MS Fabric community support
If this post helps, please consider accepting it as the solution to help other members find it more quickly, and give Kudos if it helped you resolve your query.
Hi @fabric_1,
I understand you have 3 separate jobs running every 5 minutes; however, I am wondering how these jobs are orchestrated. Do you run them from a pipeline, or do you have 3 separate schedules on the notebooks themselves? Can you shed some light on that? 🙂
Kind regards,
Niels
In this scenario, each notebook requires its own session to run. A Python notebook runs on a 2-vCore, 16 GB RAM single-node cluster, so having three of these running around the clock can consume a good chunk of your capacity.
I think it's worth reconsidering the orchestration with the following in mind:
- Running all three notebooks from a single pipeline with High Concurrency for pipelines enabled in the Spark settings causes all three notebooks to run in the same session, limiting the number of blocked vCores to what a single session consumes.
- Alternatively, you can create a fourth ("control") notebook that uses the runMultiple() function from the notebookutils library to run all three notebooks within the same session, orchestrated from a notebook; the control notebook is then triggered from a data pipeline (see the sketch below).
Both of these options result in the session being handed from one notebook to the next rather than each of the three notebooks using its own. Implementing either option should help reduce the resources bound on your capacity.
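A minimal sketch of what that control notebook could look like, assuming the DAG form of notebookutils.notebook.runMultiple(); the notebook names are placeholders for your three API jobs:

```python
# Control notebook: run the three ingestion notebooks inside this notebook's
# session instead of starting three separate sessions.
# notebookutils is available by default in Fabric notebooks.
dag = {
    "activities": [
        {"name": "Ingest_API_1", "path": "Ingest_API_1"},
        {"name": "Ingest_API_2", "path": "Ingest_API_2"},
        {"name": "Ingest_API_3", "path": "Ingest_API_3"},
    ],
    "concurrency": 3,         # run all three in parallel within this session
    "timeoutInSeconds": 300,  # keep a run from spilling into the next 5-minute window
}

notebookutils.notebook.runMultiple(dag)
```

The control notebook is then the only item the pipeline triggers every 5 minutes, so only one session is started per run.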
Hope this helps! 🙂
5 minutes is the sweet/sour spot: you keep your pools warm at all times and never give your capacity a chance to cool down. Either stretch your interval (and accept startup delays) or move up to a larger capacity SKU.