I retrieve data from JDE as the source, and the tables contain date fields where the information is stored as Julian dates. Currently there are 8 different sources, with one notebook per source, all located in the same workspace, so the same function to convert Julian dates to standard dates is defined in all 8 notebooks. Since all notebooks use the same code, is it possible within the Fabric framework to create a reusable component, like a user-defined function, that contains the Julian-to-date transformation code? This function could then be called from all the notebooks, making the process more efficient to maintain.
Thanks
Solved! Go to Solution.
Hi @v-ssriganesh ,
Thanks for your response. I created a User data function as suggested; however, it did not work. Here is the code and scenario:
User data function:
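The function body itself did not survive the forum formatting. A minimal sketch of what it plausibly looks like, assuming the standard fabric.functions pattern and JDE's CYYDDD Julian layout (consistent with the sample call below, where '123241' yields 2023-08-29):

```python
# Hypothetical reconstruction of the User data function item.
# JDE CYYDDD layout: C = centuries past 1900, YY = two-digit year, DDD = day of year.
import datetime
import fabric.functions as fn

udf = fn.UserDataFunctions()

@udf.function()
def convert_julian_to_date(julian_date: str) -> str:
    julian = int(julian_date)
    year = 1900 + (julian // 100000) * 100 + (julian // 1000) % 100
    day_of_year = julian % 1000
    result = datetime.datetime(year, 1, 1) + datetime.timedelta(days=day_of_year - 1)
    return str(result)  # '123241' -> '2023-08-29 00:00:00'
```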
Code in the notebook:

```python
# Instantiate the function
data_functions = notebookutils.udf.getFunctions('data_functions')

# Test the function with a single value
data_functions.convert_julian_to_date('123241')
# Output: '2023-08-29 00:00:00'

# Call the function for a DataFrame column, and it fails
# with: TypeError: Column is not iterable
df_silver = df_silver.withColumn('request_date', data_functions.convert_julian_to_date(df_silver['request_date']))
```
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[46], line 1
----> 1 df_silver = df_silver.withColumn('request_date', data_functions.convert_julian_to_date(df_silver['request_date']))

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/notebookutils/mssparkutils/handlers/udfHandler.py:95, in UDF.__create_dynamic_function.<locals>.dynamic_function(*args, **kwargs)
---> 95 result = self.__udf_handler.run(artifact_id, name, parameters, workspace_id, capacity_id)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/notebookutils/mssparkutils/handlers/udfHandler.py:27, in UdfHandler.run(self, artifact_id, function_name, parameters, workspace_id, capacity_id)
---> 27 return self.jvm.notebookutils.udf.run(artifact_id, function_name, parameters, workspace_id, capacity_id)

[py4j conversion frames elided: JavaMember._build_args / MapConverter / ListConverter try to iterate the Column argument]

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/column.py:710, in Column.__iter__(self)
    709 def __iter__(self) -> None:
--> 710     raise TypeError("Column is not iterable")

TypeError: Column is not iterable
```
Hi @tinbaj,
Thank you for sharing the details and code. The TypeError: Column is not iterable occurs because the User Data Function (UDF) is being applied directly to a Spark DataFrame column, while the function expects a single string input. To fix this, you need to register the UDF with Spark so it can handle DataFrame columns.
Here’s how to resolve it: in your notebook, after instantiating the UDF, register it with Spark and apply it in your DataFrame code, as sketched below.
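The suggested snippet was lost in the thread formatting. Reconstructed from the register call quoted in the follow-up post (the withColumn application via expr is an assumption):

```python
# Register the Fabric function as a Spark UDF so Spark can apply it per row.
spark.udf.register("convert_julian_to_date", data_functions.convert_julian_to_date)

# Apply the registered function to the column through a SQL expression.
from pyspark.sql import functions as F
df_silver = df_silver.withColumn('request_date', F.expr("convert_julian_to_date(request_date)"))
```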
This way the UDF processes each row’s request_date value individually. Also verify that the request_date column in df_silver contains valid Julian date strings (e.g., '123241'); if the column has mixed or invalid data types, you may need to preprocess it so that all values are strings.
If this helps, please mark it as “Accept as solution” and feel free to give a “Kudos” to help others in the community as well.
Thank you.
Hi @v-ssriganesh ,
Thanks for your response. I am getting this error when I run the command to register the UDF:

```python
# Register the UDF
spark.udf.register("convert_julian_to_date", data_functions.convert_julian_to_date)
```
```
--> 612 self.sparkSession._jsparkSession.udf().registerPython(name, register_udf._judf)
    613 return return_udf

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/udf.py:321, in UserDefinedFunction._judf(self)
--> 321 self._judf_placeholder = self._create_judf(self.func)

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/udf.py:330, in UserDefinedFunction._create_judf(self, func)
--> 330 wrapped_func = _wrap_function(sc, func, self.returnType)

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/udf.py:59, in _wrap_function(sc, func, returnType)
---> 59 pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)

File /opt/spark/python/lib/pyspark.zip/pyspark/rdd.py:5251, in _prepare_for_python_RDD(sc, command)
-> 5251 pickled_command = ser.dumps(command)

File /opt/spark/python/lib/pyspark.zip/pyspark/serializers.py:469, in CloudPickleSerializer.dumps(self, obj)
--> 469 raise pickle.PicklingError(msg)

PicklingError: Could not serialize object: PySparkRuntimeError: [CONTEXT_ONLY_VALID_ON_DRIVER] It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
```
Hi @tinbaj,
Thank you for providing the error details. The PicklingError: [CONTEXT_ONLY_VALID_ON_DRIVER] occurs because the User Data Function (UDF) is being serialized in a way that references the SparkContext, which isn't allowed in Spark's distributed environment. This is likely due to how the UDF is defined or accessed in your notebook.
To resolve this, try the following: instead of directly registering the UDF with spark.udf.register, use the Fabric UDF directly in the DataFrame operation, as Fabric’s UDFs are designed to work with Spark. Update your notebook code as sketched below.
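The suggested code itself was lost in the thread formatting; judging from the description and the error reported in the next post, it presumably applied the Fabric function directly inside withColumn, along these lines:

```python
# Presumed suggestion: call the Fabric UDF directly in the DataFrame operation.
# (The follow-up post shows this still raises PySparkTypeError: [NOT_ITERABLE].)
from pyspark.sql import functions as F

df_silver = df_silver.withColumn(
    'request_date',
    data_functions.convert_julian_to_date(F.col('request_date'))
)
```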
If the error persists, please share the exact code you ran and the full error output.
Please try these steps and let me know the outcome. If it resolves the issue, consider marking it as “Accept as solution” and giving a “Kudos” to help others in the community.
Thank you.
Hi @v-ssriganesh ,
Thanks for your response, but the suggested code did not fix the problem. I can confirm that the UDF uses only standard Python libraries and does not reference the Spark context.
I implemented the suggestion, and it failed again with the PySparkTypeError: [NOT_ITERABLE] Column is not iterable error.
Hello @tinbaj,
Thank you for the update and detailed feedback.
The PySparkTypeError: [NOT_ITERABLE] Column is not iterable error occurs because Fabric User Data Functions (UDFs) expect scalar inputs (e.g., strings, integers), but df_silver.request_date is a Spark DataFrame column, which isn’t directly compatible. The documentation you referenced correctly notes that UDFs don’t accept column objects as inputs, which explains this error.
To resolve this, the UDF would need to be registered with Spark so that each row’s request_date value is processed individually. Since you’ve confirmed the UDF works for a single input ('123241' returns '2023-08-29'), the issue is specific to the DataFrame application.
Additionally, check that the request_date column is a string type, as your UDF expects strings.
If this helps, please “Accept as solution” and give a “kudos” to assist other community members.
Thank you.
Hi @v-ssriganesh ,
Please see message 5 from this thread. We tried this a couple of days ago, and it doesn't work: when we try to register the User Data Function as a Spark UDF, it raises the SparkContext error.
Does this mean that we cannot use User Data Functions for transformations in DataFrames?
Thanks
Hello @tinbaj,
Thank you for your patience and for providing detailed feedback.
We recommend raising a support ticket with Microsoft Fabric support for deeper investigation, as the issue may be specific to your workspace or the UDF’s interaction with your Spark environment. You can explain all the troubleshooting steps you have taken to help them better understand the issue.
You can create a Microsoft support ticket with the help of the link below:
https://learn.microsoft.com/en-us/power-bi/support/create-support-ticket
If this information is helpful, consider marking it as “Accept as solution” and giving a “Kudos” to help others in the community.
Thank you.
Hello @tinbaj,
Could you please confirm if the issue has been resolved after raising a support case? If a solution has been found, it would be greatly appreciated if you could share your insights with the community. This would be helpful for other members who may encounter similar issues.
Thank you for your understanding and assistance.
Hello @tinbaj,
We are following up once again regarding your query. Could you please confirm if the issue has been resolved through the support ticket with Microsoft?
If the issue has been resolved, we kindly request you to share the resolution or key insights here to help others in the community. If we don’t hear back, we’ll go ahead and close this thread.
Should you need further assistance in the future, we encourage you to reach out via the Microsoft Fabric Community Forum and create a new thread. We’ll be happy to help.
Thank you for your understanding and participation.
Hi @v-ssriganesh ,
The ticket I raised with Microsoft did not provide any resolution to this issue. The associate classified this problem as more of a PySpark problem than a UDF issue. Please see below how the conversation with the Microsoft associate ended for this ticket.
As reported, you had a User Data Function (UDF) defined to convert dates from the Oracle database that are stored in Julian format into a date format.
As discussed, you created the Fabric user data function “convert_julian_to_date”. When you used “notebookutils.udf” to get and invoke the function, it processed successfully. This shows that the Fabric user data function “convert_julian_to_date” itself is working without issues.
Just to clarify, Fabric User data functions use the “fabric.functions” library to provide the functionality. And, as you did, you can retrieve and invoke the function via “notebookutils.udf”. To my knowledge, Fabric User data functions (the “fabric.functions” library) basically enable you to create user data functions in Python, without offering other methods (integration with third parties like PySpark) by default.
Apache Spark DataFrames are third-party and not supported by us, so I couldn’t provide the most accurate information for your other questions. I’d assume that you could call / invoke Fabric User data functions from Apache Spark DataFrames, but you might be using those PySpark APIs improperly.
I did some research on “PySparkTypeError: [NOT_ITERABLE] Column is not iterable”; it’d be more about the DataFrame and/or the withColumn() usage.
As for the “PySparkRuntimeError: [CONTEXT_ONLY_VALID_ON_DRIVER] It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063” raised when using spark.udf.register("convert_julian_to_date", data_functions.convert_julian_to_date): unfortunately, all I could do was check the Fabric user data function and give my assumptions about those PySpark errors. I hope this is helpful. To move forward, I’d suggest handling the transformation with PySpark-native methods or raising the PySpark-specific questions through the appropriate support channels.
Hello @tinbaj,
We appreciate your patience and sharing the update on the issue.
From what you've described, it looks like the Fabric User Data Function (UDF) itself is working as expected when used with notebookutils.udf. The issues seem to come up only when trying to use it inside PySpark operations like withColumn() or when attempting to register it with spark.udf.register.
Since PySpark is a third-party tool and isn't fully integrated with Fabric UDFs, this kind of limitation is expected for now. Currently, calling Fabric UDFs directly inside PySpark transformations or registering them as Spark SQL functions isn't supported.
If you still need to apply similar logic to your DataFrame, you might want to rewrite the function as a regular PySpark UDF (pyspark.sql.functions.udf() or pandas_udf) so it works smoothly within the PySpark context, as sketched below.
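As a concrete illustration, here is a minimal sketch of that approach. The conversion logic is an assumption based on JDE's CYYDDD Julian layout, chosen to be consistent with the sample in this thread ('123241' converts to 2023-08-29):

```python
import datetime

from pyspark.sql import functions as F
from pyspark.sql.types import DateType

def convert_julian_to_date(julian_date):
    # JDE CYYDDD layout: C = centuries past 1900, YY = two-digit year, DDD = day of year.
    if julian_date is None:
        return None
    julian = int(julian_date)
    year = 1900 + (julian // 100000) * 100 + (julian // 1000) % 100
    return datetime.date(year, 1, 1) + datetime.timedelta(days=julian % 1000 - 1)

# Register as a native PySpark UDF; the function uses only the standard
# library, so it serializes cleanly to the workers.
convert_julian_to_date_udf = F.udf(convert_julian_to_date, DateType())

df_silver = df_silver.withColumn('request_date', convert_julian_to_date_udf(F.col('request_date')))
```

To keep the logic reusable across the eight notebooks, the plain Python function can live in one shared notebook that each source notebook pulls in, for example via the %run magic.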
I totally understand this might not be the solution you were hoping for, but given the current capabilities of Fabric, using PySpark-native methods or checking with PySpark support channels would be the best way forward.
Thank you for your understanding.
Hello @tinbaj,
We are following up once again regarding your query. Could you please confirm if the issue has been resolved through the support ticket with Microsoft?
If the issue has been resolved, we kindly request you to share the resolution or key insights here to help others in the community. If we don’t hear back, we’ll go ahead and close this thread.
Should you need further assistance in the future, we encourage you to reach out via the Microsoft Fabric Community Forum and create a new thread. We’ll be happy to help.
Thank you for your understanding.
Hi @v-ssriganesh ,
As I explained in my previous response, and based on Microsoft's reply, User Data Functions cannot be used for transformations on DataFrame columns; PySpark's native mechanisms have to be used to transform the data instead. So I need to tweak my solution a bit and not use User Data Functions for Fabric, but instead use pyspark.udf to do the transformation.
I think I know the way ahead now. Thanks for your support and help. We can close the ticket now.
Hello @tinbaj,
Thank you for the update on the issue. Please continue to utilize the Microsoft Fabric Community Forum for further discussions and support.
Hello @tinbaj,
Thank you for reaching out with your query.
To streamline your Julian-to-standard date conversion across all eight notebooks, I recommend using Fabric User Data Functions (UDFs). You can create a single UDF in your Fabric workspace that defines the conversion logic and call it from all notebooks. This eliminates code duplication, simplifies maintenance, and ensures consistency across your JDE data sources. Simply create a UDF item, define the conversion function, and invoke it in each notebook, as illustrated below. For more details, check the Fabric User Data Functions documentation: Overview - Fabric User data functions (preview) - Microsoft Fabric | Microsoft Learn.
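For illustration, the per-notebook usage then reduces to binding the UDF item once and calling it; the item and function names here follow the examples earlier in this thread:

```python
# Bind the shared User data functions item by its workspace item name,
# then call the conversion function with a scalar value.
data_functions = notebookutils.udf.getFunctions('data_functions')
data_functions.convert_julian_to_date('123241')  # -> '2023-08-29 00:00:00'
```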
If this information is helpful, please “Accept as solution” and give a "kudos" to assist other community members in resolving similar issues more efficiently.
Thank you.