There appears to be a Fabric bug in notebooks: a notebook seems to fail frequently in a "custom" Spark pool but not in the "starter" Spark pool. That bug is not the topic of this post, but it is the reason why it might be important to dynamically select a notebook environment.
I wanted to do more research without creating duplicate pipelines and notebooks, since duplicating them would introduce even more unnecessary variables than the one I'm trying to investigate.
Is there any way to parameterize the selection of a notebook "environment"? Ideally this would be managed in the outer pipeline. I would create multiple "environments", each with a different pool, and then run them at different times of the day from the same pipeline. If a given "environment" is highly correlated with a failure, that will help me discover the source of our problems.
Below is the place where an environment selection is made in a notebook.
Any tips would be appreciated.
Solved! Go to Solution.
I think I finally found an API that would work. It could be used from an ADF pipeline, and it appears to allow us to specify which environment to use.
https://learn.microsoft.com/en-us/fabric/data-engineering/notebook-public-api#run-a-notebook-on-dema...
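For anyone who finds this later: based on that doc, here is a rough sketch of the on-demand REST call. All IDs and the token are placeholders, and the payload shape (especially the environment and useStarterPool fields under executionData.configuration) should be double-checked against the current documentation.

import requests

# All IDs and the token below are placeholders -- substitute your own values.
workspace_id = "<workspace-id>"
notebook_id = "<notebook-id>"
token = "<aad-bearer-token>"  # e.g. acquired via MSAL for the Fabric API scope

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{notebook_id}/jobs/instances?jobType=RunNotebook"
)

# executionData.configuration is where the docs show the per-run environment
# override; verify the exact field names against the current docs.
payload = {
    "executionData": {
        "configuration": {
            # Pin a specific environment for this run...
            "environment": {
                "id": "<environment-id>",
                "name": "<environment-name>",
            },
            # ...or instead force the starter pool with:
            # "useStarterPool": True,
        }
    }
}

response = requests.post(
    url, json=payload, headers={"Authorization": f"Bearer {token}"}
)
response.raise_for_status()
# The call returns 202 Accepted; the job instance URL is in the Location header.
print(response.status_code, response.headers.get("Location"))

Calling this on a schedule, alternating between a run pinned to the custom environment and a run on the starter pool, would let the same notebook be compared across pools, which is exactly what I was after.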
Hi @dbeavon3
Using parameters is indeed a good idea. Here are some suggestions for implementing it:
Create parameters in your pipeline to specify the environment. For example, you could have a parameter called environment whose value could be "environment 1" or "environment 2".
When you call a notebook from your pipeline, pass the environment parameter to the notebook.
Use the passed parameter in the notebook to select the appropriate environment. You can use conditionals to switch between environments based on the parameter value.
Pipeline Configuration
{
  "parameters": {
    "environment": {
      "type": "string",
      "default": "environment 1",
      "allowedValues": ["environment 1", "environment 2"]
    }
  },
  "activities": [
    {
      "name": "RunNotebook",
      "type": "DatabricksNotebook",
      "dependsOn": [],
      "userProperties": [],
      "linkedServiceName": {
        "referenceName": "DatabricksLinkedService",
        "type": "LinkedServiceReference"
      },
      "typeProperties": {
        "notebookPath": "/path/to/your/notebook",
        "baseParameters": {
          "environment": {
            "value": "@pipeline().parameters.environment"
          }
        }
      }
    }
  ]
}
In your notebook, you can use parameters to select the environment:
dbutils.widgets.text("environment", "environment 1")
environment = dbutils.widgets.get("environment")

if environment == "environment 1":
    # Set up starter environment
    spark.conf.set("spark.sql.shuffle.partitions", "200")
elif environment == "environment 2":
    # Set up custom environment
    spark.conf.set("spark.sql.shuffle.partitions", "500")
This way, you can run the same notebook in different environments by simply changing the parameter value in your pipeline.
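One caveat: dbutils.widgets comes from Databricks. If the notebook runs in Fabric rather than Databricks, the rough equivalent (a sketch, worth verifying against the Fabric docs) is a parameters cell, whose defaults are overridden by the base parameters that the pipeline's Notebook activity passes in:

# This cell is toggled as a "parameters" cell in the notebook UI.
# The pipeline's base parameter named "environment" overrides this default.
environment = "environment 1"

# A later cell branches on the injected value.
if environment == "environment 1":
    spark.conf.set("spark.sql.shuffle.partitions", "200")  # starter-style setup
elif environment == "environment 2":
    spark.conf.set("spark.sql.shuffle.partitions", "500")  # custom-style setup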
Hope this idea helps you.
Regards,
Nono Chen
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Hi @Anonymous
In Fabric Spark there is a concept called the "environment" of a notebook. I'm not using the word environment in a generalized sense; I'm referring to the selection of that notebook "environment".
I think I understand your dynamic reconfiguration of the notebook's Spark session, but that isn't what I'm looking for...
The main thing I need to be able to do is swap back and forth between a "custom" Spark pool and the "starter" Spark pool. This would allow me to isolate bugs in the platform which are unusual in that they impact one type of Spark pool and not the other. In a perfect world the "custom" pools wouldn't have additional reliability concerns. But we are facing issues, and Microsoft does not seem eager to engage with their "professional" support cases, so I am doing additional work to troubleshoot error messages for them. My current plan is to pinpoint the bug, leverage it to create a painful week-long outage in production, and then open a case with "unified" support (Microsoft's secondary support organization, which is presumably more willing to work on bugs in Fabric). I've learned that you sometimes have to pay a bit more to get an Azure bug fixed, or even acknowledged.
Hi @dbeavon3
Thank you for your feedback, we understand your frustration about this matter.
Based on the detailed information you provided, I realize that this topic is not the focus of this forum. You can ask at the following link, where specialists will assist you in solving the problem:
Get Help with Data Warehouse - Microsoft Fabric Community
Thank you again for your understanding. If you have more questions about Dataflow or Data Pipeline, you are welcome to continue asking in this forum! We'll do our best to help.
Regards,
Nono Chen
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.