My database has timestamps before 1900, so I have to use a few Spark configuration settings to get around the ancient-datetime errors in notebooks. The problem is, setting those properties in the Spark Environment used by the notebook doesn't seem to apply when I run it from a pipeline Notebook activity. See the Spark settings and the output of the conf settings at runtime below:
Spark Environment properties I'm using (it is also set as the Default Environment):
spark.conf output when I run the notebook manually (correct):
spark.conf output when the notebook is run by a pipeline activity (defaulted, incorrect):
The pipeline activity running this notebook (I am using high concurrency):
I checked the pipeline activity and couldn't find any environment settings, so I assume it is supposed to use the one set on the notebook. Can someone help me out and explain why these settings aren't carrying over when I run the notebook from a Pipeline? How do I ensure these properties are set without explicitly setting them at the top of my notebook every time?
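(For reference, this is roughly what that per-notebook cell looks like; the property names below are illustrative of the standard Spark legacy-datetime rebase settings, not necessarily the exact ones in my screenshots:)

# Illustrative only: the kind of cell I'd rather not repeat in every notebook.
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInWrite", "LEGACY")
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.parquet.int96RebaseModeInWrite", "LEGACY")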
Thanks.
Hi @jisaac
I wanted to check if you had the opportunity to review the information provided. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.
Hi @jisaac
Thanks for the clarification, jisaac.
Your Spark Environment properties are not applied when running the notebook from a pipeline because of High Concurrency mode. In this mode, notebooks often reuse existing Spark sessions, and once a session starts without certain configurations, it continues using those initial settings even for subsequent notebook executions. This prevents the Spark Properties set in the Fabric UI from being applied dynamically during pipeline runs.
To resolve this:
1. Set the required Spark configs explicitly at the top of each notebook so they apply regardless of session reuse.
2. Disable High Concurrency mode for pipeline runs so each notebook starts a fresh Spark session that picks up the environment settings.
3. Use initialization scripts in the Spark Environment to apply the settings automatically at session startup.
By following these steps, your Spark Environment settings will be consistently applied when executing notebooks through pipelines. Please let me know if you need further assistance.
If this solution helps, please consider giving us Kudos and accepting it as the solution so that it may assist other members in the community.
Thank you.
Can you describe what you meant by point 3, initialization scripts in the Spark Environment? Searching the internet I found no mention of this for Fabric. There is a mention of something similar for Databricks, but would that apply to Fabric?
Hi @jisaac
Unlike Databricks, Microsoft Fabric doesn't currently support initialization scripts for Spark environments. In Databricks, these scripts let you apply settings automatically when a cluster starts, but Fabric doesn't offer that feature yet.
You can manually set the Spark configs at the top of each notebook to ensure the necessary settings (like handling pre-1900 timestamps) are always applied. Alternatively, if you disable High Concurrency mode in pipeline runs, each notebook will start a fresh Spark session and correctly pick up the environment settings.
Another option is to create a small helper notebook that contains all your spark.conf.set(...) lines and use %run at the top of your main notebooks. This way, you keep things consistent without repeating code everywhere.
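A minimal sketch of that pattern (the notebook name "SparkDefaults" is hypothetical, and the properties are the legacy rebase settings discussed in this thread):

# Contents of a shared notebook named "SparkDefaults" (hypothetical name):
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.parquet.int96RebaseModeInWrite", "LEGACY")

# First cell of each main notebook:
%run SparkDefaults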
If the issue still persists, we recommend raising a support ticket.
You can submit a ticket through the Microsoft Power BI Support Portal:
How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn
Thank you.
Hi @jisaac
Could you please let us know if your issue has been resolved? Did @nilendraFabric's post answer your query? If you found an answer, please post it in the community so it can help other members with similar issues resolve them faster. If the issue still persists, feel free to reach out to us.
Thank you.
The question in the original post is not solved, no. A workaround is not a solution. The question is: how do I run a notebook from a pipeline and apply a Spark Environment (the "Environment" item in Fabric where you can add libraries and change Spark properties) to it?
Hi @jisaac
I hope this information is helpful. Please let me know if you have any further questions or would like to discuss this further. If @nilendraFabric answered your question, please accept it as a solution and give it a 'Kudos' so others can find it easily.
Thank you.
I'm sorry, what?
Hi @jisaac
I wanted to check if you had the opportunity to review the information provided by @nilendraFabric. Please feel free to reach out to us if you have any further questions. If the response has addressed your query, please accept it as a solution and give it a 'Kudos' so other members can easily find it.
Thank you.
Hi @jisaac
Thank you for reaching out to the Microsoft Fabric community forum.
May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will help other community members with similar problems solve them faster.
Thank you.
Add a `%%configure` magic cell as the very first cell in your notebook. This instructs the system to restart the Spark session with the desired settings, so properties like the ones handling ancient timestamps are properly applied.
%%configure
{
    "conf": {
        "spark.sql.legacy.timeParserPolicy": "LEGACY",
        "spark.sql.parquet.int96RebaseModeInRead": "LEGACY",
        "spark.sql.parquet.int96RebaseModeInWrite": "LEGACY"
    }
}
If this is helpful, please accept the answer.
Thanks for your reply, but this is what I am hoping to avoid.
The last line in my post says
> How do I ensure these properties are set without explicitly setting them at the top of my notebook every time?
I was not aware of the %%configure magic, but I was already setting these configs at the top of my notebook. I want to know why the "Spark Properties" tab exists if it does not apply to sessions run by pipelines.
Could you please check if high concurrency mode for pipelines is enabled in the workspace settings?
If it is, that is why the Spark properties are not applied when running through pipelines:
When you run a notebook interactively, a new Spark session is typically created for that notebook. During this process, the Spark Properties defined in the Spark Properties tab (or Default Spark Environment) are applied to the session at initialization. This ensures that any custom configurations you set are available for your interactive session.
In high concurrency mode, notebooks executed via pipelines often share an existing Spark session rather than creating a new one. If a session is already running, it will use the configurations it was initialized with, and the Spark Properties from the environment will not be reapplied. This is an optimization to avoid the overhead of starting new sessions for every pipeline activity.
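One way to see this is to print the effective values at the top of the notebook when the pipeline runs it (the property names here are assumed from the earlier posts):

# Quick check of what the reused high-concurrency session actually carries;
# "<not set>" is printed when a property was never applied.
for key in [
    "spark.sql.legacy.timeParserPolicy",
    "spark.sql.parquet.int96RebaseModeInRead",
    "spark.sql.parquet.int96RebaseModeInWrite",
]:
    print(key, "=", spark.conf.get(key, "<not set>"))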
Please accept the answer if this is helpful.
Two things I observe are inconsistent with what you've said, and they confuse me:
- Whether the session starts from the first or second notebook run, it should still be using the same default properties, no? Both notebooks are set to run with the same environment. Are you saying that high concurrency sessions always start with no environment or properties?
- The settings are clearly not carrying over from one notebook to another. Observe the screenshots of two notebooks run under one HC session.
First notebook run:
Second notebook run; the settings from the previous run did not carry over:
The settings reverted in the second notebook, so they are clearly being reset between notebooks.
Thanks for coming back.
So you have disabled high concurrency mode, and the properties are still not set in the notebook when it runs through a pipeline.
The point I was trying to make about high concurrency was that if the first session started without those properties set, it stays that way even if you later change the Spark initial properties in the UI.