<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: User Data Functions and Spark in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/User-Data-Functions-and-Spark/m-p/4725159#M10011</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1228334"&gt;@Bharath_Kumar_S&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you for reaching out to Microsoft Fabric Community.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Spark is not supported inside User Data Functions, even though you may see pyspark in the library section. UDFs run in a restricted Python environment that does not include a Spark runtime, which is why you are getting the Java gateway error.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Use a Notebook step or Data Pipeline in the same Translytical Task Flow to refresh the Delta table. For example:&lt;BR /&gt;df = spark.read.format("parquet").load("Files/&amp;lt;lakehouse_name&amp;gt;/file_path")&lt;BR /&gt;df.write.format("delta").mode("overwrite").save("Tables/&amp;lt;lakehouse_name&amp;gt;/delta_table")&lt;/LI&gt;
&lt;LI&gt;If your Power BI report is connected to this table via Direct Lake mode, it will reflect the updates automatically, with no manual refresh needed.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If this post&amp;nbsp;helps, please consider&lt;SPAN&gt;&amp;nbsp;&lt;STRONG&gt;Accepting it as a solution&amp;nbsp;&lt;/STRONG&gt;to help other members find it more quickly, and don't forget to give a&amp;nbsp;&lt;STRONG&gt;"Kudos"&lt;/STRONG&gt;&amp;nbsp;– I’d truly appreciate it!&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks and regards,&lt;/P&gt;
&lt;P&gt;Anjan Kumar Chippa&lt;/P&gt;</description>
    <pubDate>Mon, 09 Jun 2025 07:02:25 GMT</pubDate>
    <dc:creator>v-achippa</dc:creator>
    <dc:date>2025-06-09T07:02:25Z</dc:date>
    <item>
      <title>User Data Functions and Spark</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/User-Data-Functions-and-Spark/m-p/4722627#M9959</link>
      <description>&lt;P&gt;Use case: use Translytical Task Flows to update a file in a lakehouse, then refresh the Delta table built on that file, which backs a Power BI report.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I wanted to refresh the Delta table after updating the file using Spark, but if I include any Spark code I get the invocation error below.&lt;/P&gt;&lt;PRE&gt;{
"functionName": "test_spark_basic",
"invocationId": "xxxxxxx-b9bb-422e-8a80-xxxxxxxxx",
"status": "Failed",
"output": "",
"errors": [
{
"errorCode": "InternalError",
"message": "An internal execution error occured during function execution",
"properties": {
"error_type": "PySparkRuntimeError",
"error_message": "Java gateway process exited before sending its port number."
}
}
]
}&lt;/PRE&gt;&lt;P&gt;Code I used for Spark testing:&lt;/P&gt;&lt;PRE&gt;import fabric.functions as fn
from pyspark.sql import SparkSession

udf = fn.UserDataFunctions()

@udf.function()
def test_spark_basic() -&amp;gt; str:
    # Create Spark session
    spark = SparkSession.builder.appName("TestSparkInUDF").getOrCreate()

    # Create sample Spark DataFrame
    data = [("Bharath", 25), ("Anita", 30)]
    columns = ["Name", "Age"]
    df = spark.createDataFrame(data, columns)

    # Collect result and convert to string
    result = df.collect()
    return "\n".join([f"{row['Name']}, {row['Age']}" for row in result])&lt;/PRE&gt;&lt;P&gt;Questions:&lt;/P&gt;&lt;P&gt;1. Can we use Spark inside User Data Functions? If yes, please provide a guide. (I can see the pyspark module in the library section.)&lt;/P&gt;&lt;P&gt;2. Is there any other way to refresh the Delta table after modifying the file, from the UDF itself?&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jun 2025 07:51:12 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/User-Data-Functions-and-Spark/m-p/4722627#M9959</guid>
      <dc:creator>Bharath_Kumar_S</dc:creator>
      <dc:date>2025-06-06T07:51:12Z</dc:date>
    </item>
    <item>
      <title>Re: User Data Functions and Spark</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/User-Data-Functions-and-Spark/m-p/4725159#M10011</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1228334"&gt;@Bharath_Kumar_S&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you for reaching out to Microsoft Fabric Community.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Spark is not supported inside User Data Functions, even though you may see pyspark in the library section. UDFs run in a restricted Python environment that does not include a Spark runtime, which is why you are getting the Java gateway error.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Use a Notebook step or Data Pipeline in the same Translytical Task Flow to refresh the Delta table. For example:&lt;BR /&gt;df = spark.read.format("parquet").load("Files/&amp;lt;lakehouse_name&amp;gt;/file_path")&lt;BR /&gt;df.write.format("delta").mode("overwrite").save("Tables/&amp;lt;lakehouse_name&amp;gt;/delta_table")&lt;/LI&gt;
&lt;LI&gt;If your Power BI report is connected to this table via Direct Lake mode, it will reflect the updates automatically, with no manual refresh needed.&lt;/LI&gt;
&lt;/UL&gt;
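The Notebook step described above can be sketched as below. This is a minimal illustration, not a definitive implementation: the file path "uploads/sales.parquet" and table name "sales_delta" are placeholder names, and the `spark` session is assumed to be the one a Fabric notebook pre-defines (the guard lets the snippet run harmlessly outside that environment):

```python
# Hypothetical Fabric notebook cell: re-read the updated file and overwrite the Delta table.
# "uploads/sales.parquet" and "sales_delta" are placeholder names for illustration.

def lakehouse_paths(file_rel_path: str, table_name: str) -> tuple[str, str]:
    """Build the Files/ source path and the Tables/ Delta destination path."""
    return f"Files/{file_rel_path}", f"Tables/{table_name}"

src, dest = lakehouse_paths("uploads/sales.parquet", "sales_delta")

try:
    spark  # in a Fabric notebook, this session is pre-defined by the runtime
except NameError:
    spark = None  # running outside a Spark environment; skip the refresh below

if spark is not None:
    df = spark.read.format("parquet").load(src)            # read the updated file
    df.write.format("delta").mode("overwrite").save(dest)  # refresh the Delta table
```

Because mode("overwrite") replaces the whole table, this suits a full refresh from the file; an incremental pattern (append or merge) would need different write logic.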
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If this post&amp;nbsp;helps, please consider&lt;SPAN&gt;&amp;nbsp;&lt;STRONG&gt;Accepting it as a solution&amp;nbsp;&lt;/STRONG&gt;to help other members find it more quickly, and don't forget to give a&amp;nbsp;&lt;STRONG&gt;"Kudos"&lt;/STRONG&gt;&amp;nbsp;– I’d truly appreciate it!&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks and regards,&lt;/P&gt;
&lt;P&gt;Anjan Kumar Chippa&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jun 2025 07:02:25 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/User-Data-Functions-and-Spark/m-p/4725159#M10011</guid>
      <dc:creator>v-achippa</dc:creator>
      <dc:date>2025-06-09T07:02:25Z</dc:date>
    </item>
  </channel>
</rss>

