My database has timestamps before 1900, so I have to use a few Spark configuration settings to get around the ancient-datetime errors in notebooks. The problem is, setting those properties in the Spark Environment used by the notebook doesn't seem to apply when I run it from a pipeline Notebook activity. See the Spark settings and the output of the conf settings at runtime below:
Spark Environment properties I'm using (it is also set as the Default Environment):
spark.conf output when I run the notebook manually (correct):
spark.conf output when the notebook is run by a pipeline activity (defaulted, incorrect):
The pipeline activity running this notebook (I am using high concurrency):
I checked the pipeline activity and couldn't find any environment settings, so I assume it is supposed to use the one set on the notebook. Can someone help me out and explain why these settings aren't carrying over when I run the notebook from a Pipeline? How do I ensure these properties are set without explicitly setting them at the top of my notebook every time?
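(For reference, this is roughly what that per-notebook cell looks like; the property names below are illustrative of the standard Spark legacy-datetime rebase settings, not necessarily the exact ones in my screenshots:)

# Illustrative only: the kind of cell I'd rather not repeat in every notebook.
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInWrite", "LEGACY")
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.parquet.int96RebaseModeInWrite", "LEGACY")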
Thanks.
Hi @jisaac
I wanted to check if you had the opportunity to review the information provided. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.
Hi @jisaac
Thanks for the clarification, jisaac.
Your Spark Environment properties are not applied when running the notebook from a pipeline because of High Concurrency mode. In this mode, notebooks often reuse existing Spark sessions, and once a session starts without certain configurations, it continues using those initial settings even for subsequent notebook executions. This prevents the Spark Properties set in the Fabric UI from being applied dynamically during pipeline runs.
To resolve this:
1. Set the required Spark configs explicitly at the top of each notebook so they apply regardless of session reuse.
2. Disable High Concurrency mode for pipeline runs so each notebook starts a fresh Spark session that picks up the environment settings.
3. Use initialization scripts in the Spark Environment to apply the settings automatically at session startup.
By following these steps, your Spark Environment settings will be consistently applied when executing notebooks through pipelines. Please let me know if you need further assistance.
If this solution helps, please consider giving us Kudos and accepting it as the solution so that it may assist other members in the community.
Thank you.
Can you describe what you meant by point 3, initialization scripts in the Spark Environment? Searching the internet I found no mention of this for Fabric. There is a mention of something similar for Databricks, but would that apply to Fabric?
Hi @jisaac
Unlike Databricks, Microsoft Fabric doesn't currently support initialization scripts for Spark environments. In Databricks, these scripts let you apply settings automatically when a cluster starts, but Fabric doesn't offer that feature yet.
You can manually set the Spark configs at the top of each notebook to ensure the necessary settings (like handling pre-1900 timestamps) are always applied. Alternatively, if you disable High Concurrency mode in pipeline runs, each notebook will start a fresh Spark session and correctly pick up the environment settings.
Another option is to create a small helper notebook that contains all your spark.conf.set(...) lines and use %run at the top of your main notebooks. This way, you keep things consistent without repeating code everywhere.
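A minimal sketch of that pattern (the notebook name "SparkDefaults" is hypothetical, and the properties are the legacy rebase settings discussed in this thread):

# Contents of a shared notebook named "SparkDefaults" (hypothetical name):
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.parquet.int96RebaseModeInWrite", "LEGACY")

# First cell of each main notebook:
%run SparkDefaults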
If the issue still persists, we recommend raising a support ticket.
You can submit a ticket through the Microsoft Power BI Support Portal:
How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn
Thank you.
Hi @jisaac
Could you please let us know if your issue has been resolved? Did @nilendraFabric's post answer your query? If you found an answer, please post it in the community so it can help other members with similar issues resolve them faster. If the issue still persists, feel free to reach out to us.
Thank you.
The question in the original post is not solved, no. A workaround is not a solution. The question is: how do I run a notebook from a pipeline and apply a Spark Environment (the "Environment" item in Fabric where you can add libraries and change Spark properties) to it?
Hi @jisaac
I hope this information is helpful. Please let me know if you have any further questions or would like to discuss this further. If @nilendraFabric answered your question, please accept it as a solution and give it a 'Kudos' so others can find it easily.
Thank you.
I'm sorry, what?
Hi @jisaac
I wanted to check if you had the opportunity to review the information provided by @nilendraFabric. Please feel free to reach out to us if you have any further questions. If the response has addressed your query, please accept it as a solution and give it a 'Kudos' so other members can easily find it.
Thank you.
Hi @jisaac
Thank you for reaching out to the Microsoft Fabric community forum.
May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will help other community members with similar problems solve them faster.
Thank you.
Add a `%%configure` magic cell as the very first cell in your notebook. This instructs the system to restart the Spark session with the desired settings, so properties like the ones handling ancient timestamps are properly applied.
%%configure
{
    "conf": {
        "spark.sql.legacy.timeParserPolicy": "LEGACY",
        "spark.sql.parquet.int96RebaseModeInRead": "LEGACY",
        "spark.sql.parquet.int96RebaseModeInWrite": "LEGACY"
    }
}
If this is helpful, please accept the answer.
Thanks for your reply, but this is what I am hoping to avoid.
The last line in my post says
> How do I ensure these properties are set without explicitly setting them at the top of my notebook every time?
I was not aware of the %%configure magic, but I was already setting these configs at the top of my notebook. I want to know why the "Spark Properties" tab exists if it does not apply to sessions run by pipelines.
Could you please check if high concurrency mode for pipelines is enabled in the workspace settings?
If it is, that is why the Spark properties are not applied when running through pipelines:
When you run a notebook interactively, a new Spark session is typically created for that notebook. During this process, the Spark Properties defined in the Spark Properties tab (or Default Spark Environment) are applied to the session at initialization. This ensures that any custom configurations you set are available for your interactive session.
In high concurrency mode, notebooks executed via pipelines often share an existing Spark session rather than creating a new one. If a session is already running, it will use the configurations it was initialized with, and the Spark Properties from the environment will not be reapplied. This is an optimization to avoid the overhead of starting new sessions for every pipeline activity.
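One way to see this is to print the effective values at the top of the notebook when the pipeline runs it (the property names here are assumed from the earlier posts):

# Quick check of what the reused high-concurrency session actually carries;
# "<not set>" is printed when a property was never applied.
for key in [
    "spark.sql.legacy.timeParserPolicy",
    "spark.sql.parquet.int96RebaseModeInRead",
    "spark.sql.parquet.int96RebaseModeInWrite",
]:
    print(key, "=", spark.conf.get(key, "<not set>"))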
Please accept the answer if this is helpful.
Two things I observe are inconsistent with what you've said, and they confuse me:
- Whether the session starts from the first or second notebook run, it should still be using the same default properties, no? Both notebooks are set to run with the same environment. Are you saying that high concurrency sessions always start with no environment or properties?
- The settings are clearly not carrying over from one notebook to another. Observe the screenshots of two notebooks run under one HC session.
First notebook run:
Second notebook run; the settings from the previous run did not carry over:
The settings reverted in the second notebook, so they are clearly being reset between notebooks.
Thanks for coming back.
So you have disabled high concurrency mode, and the properties are still not set in the notebook when it runs through a pipeline.
The point I was trying to make about high concurrency was that if the first session started without those properties set, it stays that way even if you later change the Spark initial properties in the UI.