fAnissa
New Member

spark NameError when using an imported function needing SparkContext

Hello,

 

 

I have developed some functions in Python, which I packaged as a .whl and then imported into my Fabric environment.

These functions use 'spark' to perform read and write actions.

 

This is a simplified version of the function in the package:

import spark  # the package's own spark.py module (see below)

def get_file(outputFilePath: str):
    df = spark.read.option("multiline", "true").json(outputFilePath)
    return df

 

In the package, spark is defined like this in a spark.py file:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .appName("GlobalSpark") \
        .master("local[*]") \
        .getOrCreate()

 

This code works locally because getOrCreate() notices there is no existing SparkSession and therefore creates one.
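As a side note, getOrCreate() also reuses an existing session rather than always building a new one; a quick local check (illustrative only):

from pyspark.sql import SparkSession

s1 = SparkSession.builder.appName("GlobalSpark").master("local[*]").getOrCreate()
s2 = SparkSession.builder.getOrCreate()
print(s1 is s2)  # True: the second call returned the already-running session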

 

However, when calling this function within a Fabric Notebook I get a 'NameError: name 'spark' is not defined'

 

As I will be executing this code with the Fabric runtime, I expected the SparkSession (= spark) to also be available within these functions. That is why I did not pass spark explicitly to the function.

 

(I tried both import methods: through custom libraries in the environment, as well as direct installation via the built-in resources.)

 

I know one option is to refactor my whole codebase to pass spark explicitly.

Before doing that, I wanted to check whether this is the intended behavior, or whether I am configuring something wrong.

 

Kind regards,

Anissa

 

1 ACCEPTED SOLUTION
andrewsommer
Memorable Member

When you develop locally, your spark.py file explicitly creates a SparkSession using SparkSession.builder. This works because you control the full Python environment and are expected to instantiate Spark manually.

However, in Microsoft Fabric notebooks, a SparkSession is already created and provided implicitly by the runtime: it is injected into the notebook's global namespace as the spark variable. You can use it directly in notebook cells, but imported modules do not automatically see that name, and you should not re-instantiate or create a new SparkSession.
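For example, any notebook cell can use it directly (the file path below is just a hypothetical placeholder):

# 'spark' is pre-defined by the Fabric runtime; no builder call is needed
df = spark.read.option("multiline", "true").json("Files/sample.json")  # hypothetical path
df.printSchema()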


Don't define your own spark.py module

Rename your module to something else like spark_utils.py or file_io.py. This avoids shadowing the built-in spark object.
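For instance, the package layout might end up like this (names are illustrative):

my_package/
    __init__.py
    file_io.py   # formerly spark.py; renamed so it no longer collides with the spark name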

 

Use the implicit spark from Fabric

Remove the following from your code entirely:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .appName("GlobalSpark") \
        .master("local[*]") \
        .getOrCreate()

 

Instead, rely on the pre-provided spark. Note that a bare reference to spark only resolves when the function is defined in the notebook itself, where spark is a global:

def get_file(outputFilePath: str):
    # 'spark' here resolves to the session Fabric injects into the notebook's globals
    df = spark.read.option("multiline", "true").json(outputFilePath)
    return df


Avoid import spark altogether

If you must package utilities, structure it like this:

# file_io.py
def get_file(outputFilePath: str):
    from pyspark.sql import SparkSession
    # Attach to whatever session the host environment (here, Fabric) has already started
    spark = SparkSession.getActiveSession()
    if spark is None:
        raise RuntimeError("No active SparkSession found. This function must be run within a Spark environment.")
    df = spark.read.option("multiline", "true").json(outputFilePath)
    return df

 

But in Fabric, SparkSession.getActiveSession() should return the running session just fine.
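For example, calling the packaged function from a notebook cell would then look like this (package and file names are illustrative):

from my_package.file_io import get_file

df = get_file("Files/sample.json")
df.show()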

 

Please mark this post as solution if it helps you. Appreciate Kudos.

3 REPLIES
v-prasare
Community Support

Hi @fAnissa,

As we haven't heard back from you, we wanted to kindly follow up and check whether the solution provided for your issue worked. Please let us know if you need any further assistance.

Thanks,

Prashanth Are

MS Fabric community support

 

If this post helps, please consider accepting it as the solution to help other members find it more quickly, and give Kudos if it helped you resolve your query.

fAnissa
New Member

Hello Andrew,

 

Thank you for your reply.

 

Indeed, I performed the following actions:

- I removed the spark.py utility from my package

- I added this to my function:

spark = SparkSession.getActiveSession()

Once these two actions were performed, Fabric recognized spark and no longer threw a NameError.
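For reference, the updated function now looks roughly like this:

from pyspark.sql import SparkSession

def get_file(outputFilePath: str):
    spark = SparkSession.getActiveSession()  # picks up the session Fabric already started
    df = spark.read.option("multiline", "true").json(outputFilePath)
    return df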

 

However...

If I don't explicitly use the 'getActiveSession' function, the problem still persists.

How come we have no direct access to the SparkSession (spark) without defining it first?

 

Thanks so much for your help so far !

 

 

Kind regards,

Anissa
