fAnissa
New Member

spark NameError when using an imported function needing SparkContext

Hello,

 

 

I have developed some functions in Python, which I packaged as a .whl and then imported into my Fabric environment.

These functions use 'spark' to perform read and write actions.

 

This is a simplified version of the function in the package:

import spark  # the package's own spark.py module (see below)

def get_file(outputFilePath: str):
    df = spark.read.option("multiline", "true").json(outputFilePath)
    return df

 

In the package, spark is defined like this in a spark.py file:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .appName("GlobalSpark") \
        .master("local[*]") \
        .getOrCreate()

 

This code works locally because getOrCreate() notices there is no existing SparkSession and therefore creates one.
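As a side note, getOrCreate() also reuses an existing session rather than always building a new one; a quick local check (illustrative only):

from pyspark.sql import SparkSession

s1 = SparkSession.builder.appName("GlobalSpark").master("local[*]").getOrCreate()
s2 = SparkSession.builder.getOrCreate()
print(s1 is s2)  # True: the second call returned the already-running session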

 

However, when calling this function within a Fabric Notebook I get a 'NameError: name 'spark' is not defined'

 

As I will be executing this code with the Fabric runtime, I expected the SparkSession (= spark) to also be available within these functions. That is why I did not pass spark explicitly to the function.

 

(I tried both import methods: through custom libraries in the environment, as well as direct installation via the built-in resources.)

 

I know one option is to refactor my whole codebase to pass spark explicitly.

Before doing that, I wanted to check whether this is the intended behavior, or whether I am configuring something wrong.

 

Kind regards,

Anissa

 

1 ACCEPTED SOLUTION
andrewsommer
Memorable Member

When you develop locally, your spark.py file explicitly creates a SparkSession using SparkSession.builder. This works because you control the full Python environment and are expected to instantiate Spark manually.

However, in Microsoft Fabric notebooks, a SparkSession is already created and provided implicitly by the runtime: it is injected into the notebook's global namespace as the spark variable. You can use it directly in notebook cells, but imported modules do not automatically see that name, and you should not re-instantiate or create a new SparkSession.
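For example, any notebook cell can use it directly (the file path below is just a hypothetical placeholder):

# 'spark' is pre-defined by the Fabric runtime; no builder call is needed
df = spark.read.option("multiline", "true").json("Files/sample.json")  # hypothetical path
df.printSchema()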


Don't define your own spark.py module

Rename your module to something else like spark_utils.py or file_io.py. This avoids shadowing the built-in spark object.
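For instance, the package layout might end up like this (names are illustrative):

my_package/
    __init__.py
    file_io.py   # formerly spark.py; renamed so it no longer collides with the spark name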

 

Use the implicit spark from Fabric

Remove the following from your code entirely:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .appName("GlobalSpark") \
        .master("local[*]") \
        .getOrCreate()

 

Instead, rely on the pre-provided spark. Note that a bare reference to spark only resolves when the function is defined in the notebook itself, where spark is a global:

def get_file(outputFilePath: str):
    # 'spark' here resolves to the session Fabric injects into the notebook's globals
    df = spark.read.option("multiline", "true").json(outputFilePath)
    return df


Avoid import spark altogether

If you must package utilities, structure it like this:

# file_io.py
def get_file(outputFilePath: str):
    from pyspark.sql import SparkSession
    # Attach to whatever session the host environment (here, Fabric) has already started
    spark = SparkSession.getActiveSession()
    if spark is None:
        raise RuntimeError("No active SparkSession found. This function must be run within a Spark environment.")
    df = spark.read.option("multiline", "true").json(outputFilePath)
    return df

 

But in Fabric, SparkSession.getActiveSession() should return the running session just fine.
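For example, calling the packaged function from a notebook cell would then look like this (package and file names are illustrative):

from my_package.file_io import get_file

df = get_file("Files/sample.json")
df.show()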

 

Please mark this post as solution if it helps you. Appreciate Kudos.

3 REPLIES
v-prasare
Community Support

Hi @fAnissa,

As we haven't heard back from you, we wanted to kindly follow up and check whether the solution provided for your issue worked. Please let us know if you need any further assistance.

Thanks,

Prashanth Are

MS Fabric community support

 

If this post helps, please consider accepting it as the solution to help other members find it more quickly, and give Kudos if it helped you resolve your query.

fAnissa
New Member

Hello Andrew,

 

Thank you for your reply.

 

Indeed, I performed the following actions:

- I removed the spark.py utility from my package

- I added this to my function:

spark = SparkSession.getActiveSession()

Once these two actions were performed, Fabric recognized spark and no longer threw a NameError.
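For reference, the updated function now looks roughly like this:

from pyspark.sql import SparkSession

def get_file(outputFilePath: str):
    spark = SparkSession.getActiveSession()  # picks up the session Fabric already started
    df = spark.read.option("multiline", "true").json(outputFilePath)
    return df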

 

However...

If I don't explicitly use the 'getActiveSession' function, the problem still persists.

How come we have no direct access to the SparkSession (spark) without defining it first?

 

Thanks so much for your help so far !

 

 

Kind regards,

Anissa
