There appears to be a Fabric bug in notebooks: a notebook seems to fail frequently in a "custom" Spark pool but not in the "starter" Spark pool. That bug is not the topic of this post, but it is the reason why it might be important to dynamically select a notebook environment.
I wanted to do more research without creating duplicate pipelines and notebooks, since duplicating them would introduce even more unnecessary variables than the one I'm trying to investigate.
Is there any way to parameterize the selection of a notebook "environment"? Ideally this would be managed in the outer pipeline. I would create multiple "environments", each with a different pool, and then run them at different times of the day from the same pipeline. If a given "environment" is highly correlated with a failure, that will help me discover the source of our problems.
Below is the place where an environment selection is made in a notebook.
Any tips would be appreciated.
Solved! Go to Solution.
I think I finally found an API that would work. It could be used from an ADF pipeline, and it appears to allow us to specify which environment to use.
https://learn.microsoft.com/en-us/fabric/data-engineering/notebook-public-api#run-a-notebook-on-dema...
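For anyone who finds this later: based on that doc, here is a rough sketch of the on-demand REST call. All IDs and the token are placeholders, and the payload shape (especially the environment and useStarterPool fields under executionData.configuration) should be double-checked against the current documentation.

import requests

# All IDs and the token below are placeholders -- substitute your own values.
workspace_id = "<workspace-id>"
notebook_id = "<notebook-id>"
token = "<aad-bearer-token>"  # e.g. acquired via MSAL for the Fabric API scope

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{notebook_id}/jobs/instances?jobType=RunNotebook"
)

# executionData.configuration is where the docs show the per-run environment
# override; verify the exact field names against the current docs.
payload = {
    "executionData": {
        "configuration": {
            # Pin a specific environment for this run...
            "environment": {
                "id": "<environment-id>",
                "name": "<environment-name>",
            },
            # ...or instead force the starter pool with:
            # "useStarterPool": True,
        }
    }
}

response = requests.post(
    url, json=payload, headers={"Authorization": f"Bearer {token}"}
)
response.raise_for_status()
# The call returns 202 Accepted; the job instance URL is in the Location header.
print(response.status_code, response.headers.get("Location"))

Calling this on a schedule, alternating between a run pinned to the custom environment and a run on the starter pool, would let the same notebook be compared across pools, which is exactly what I was after.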
Hi @dbeavon3
Using parameters is indeed a good idea. Here are some suggestions for implementing it:
Create parameters in your pipeline to specify the environment. For example, you could have a parameter called environment whose value could be "environment 1" or "environment 2".
When you call a notebook from your pipeline, pass the environment parameter to the notebook.
Use the passed parameter in the notebook to select the appropriate environment. You can use conditionals to switch between environments based on the parameter value.
Pipeline Configuration
{
  "parameters": {
    "environment": {
      "type": "string",
      "default": "environment 1",
      "allowedValues": ["environment 1", "environment 2"]
    }
  },
  "activities": [
    {
      "name": "RunNotebook",
      "type": "DatabricksNotebook",
      "dependsOn": [],
      "userProperties": [],
      "linkedServiceName": {
        "referenceName": "DatabricksLinkedService",
        "type": "LinkedServiceReference"
      },
      "typeProperties": {
        "notebookPath": "/path/to/your/notebook",
        "baseParameters": {
          "environment": {
            "value": "@pipeline().parameters.environment"
          }
        }
      }
    }
  ]
}
In your notebook, you can use parameters to select the environment:
dbutils.widgets.text("environment", "environment 1")
environment = dbutils.widgets.get("environment")

if environment == "environment 1":
    # Set up starter environment
    spark.conf.set("spark.sql.shuffle.partitions", "200")
elif environment == "environment 2":
    # Set up custom environment
    spark.conf.set("spark.sql.shuffle.partitions", "500")
This way, you can run the same notebook in different environments by simply changing the parameter value in your pipeline.
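One caveat: dbutils.widgets comes from Databricks. If the notebook runs in Fabric rather than Databricks, the rough equivalent (a sketch, worth verifying against the Fabric docs) is a parameters cell, whose defaults are overridden by the base parameters that the pipeline's Notebook activity passes in:

# This cell is toggled as a "parameters" cell in the notebook UI.
# The pipeline's base parameter named "environment" overrides this default.
environment = "environment 1"

# A later cell branches on the injected value.
if environment == "environment 1":
    spark.conf.set("spark.sql.shuffle.partitions", "200")  # starter-style setup
elif environment == "environment 2":
    spark.conf.set("spark.sql.shuffle.partitions", "500")  # custom-style setup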
Hope this idea helps you.
Regards,
Nono Chen
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Hi @Anonymous
In Fabric Spark there is a concept called the "environment" of a notebook. I'm not using the word environment in a generalized sense; I'm referring to the selection of that notebook "environment".
I think I understand your dynamic reconfiguration of the notebook's Spark session, but that isn't what I'm looking for...
The main thing I need to be able to do is swap back and forth between a "custom" Spark pool and the "starter" Spark pool. This would allow me to isolate bugs in the platform which are unusual in that they impact one type of Spark pool and not the other. In a perfect world the "custom" pools wouldn't have additional reliability concerns. But we are facing issues, and Microsoft does not seem eager to engage with their "professional" support cases, so I am doing additional work to troubleshoot error messages for them. My current plan is to pinpoint the bug, leverage it to create a painful week-long outage in production, and then open a case with "unified" support (Microsoft's secondary support organization, which is presumably more willing to work on bugs in Fabric). I've learned that you sometimes have to pay a bit more to get an Azure bug fixed, or even acknowledged.
Hi @dbeavon3
Thank you for your feedback, we understand your frustration about this matter.
Based on the detailed information you provided, I realize that this topic is not the focus of this forum. You can ask at the following link, where specialists will assist you in solving the problem:
Get Help with Data Warehouse - Microsoft Fabric Community
Thank you again for your understanding. If you have more questions about Dataflow or Data Pipeline, you are welcome to continue asking in this forum! We'll do our best to help.
Regards,
Nono Chen
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.