<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: User Data Functions and Spark in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/User-Data-Functions-and-Spark/m-p/4725159#M10011</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1228334"&gt;@Bharath_Kumar_S&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you for reaching out to Microsoft Fabric Community.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Spark is not supported inside User Data Functions, even though you may see pyspark in the library section. UDFs run in a restricted Python environment that does not include a Spark runtime, which is why you are getting the Java gateway error.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Use a Notebook step or Data Pipeline in the same Translytical Task Flow to refresh the Delta table. For example:&lt;BR /&gt;df = spark.read.format("parquet").load("Files/&amp;lt;lakehouse_name&amp;gt;/file_path")&lt;BR /&gt;df.write.format("delta").mode("overwrite").save("Tables/&amp;lt;lakehouse_name&amp;gt;/delta_table")&lt;/LI&gt;
&lt;LI&gt;If your Power BI report is connected to this table via Direct Lake mode, it will reflect the updates automatically, with no manual refresh needed.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If this post&amp;nbsp;helps, please consider&lt;SPAN&gt;&amp;nbsp;&lt;STRONG&gt;Accepting it as a solution&amp;nbsp;&lt;/STRONG&gt;to help other members find it more quickly, and don't forget to give a&amp;nbsp;&lt;STRONG&gt;"Kudos"&lt;/STRONG&gt;&amp;nbsp;– I’d truly appreciate it!&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks and regards,&lt;/P&gt;
&lt;P&gt;Anjan Kumar Chippa&lt;/P&gt;</description>
    <pubDate>Mon, 09 Jun 2025 07:02:25 GMT</pubDate>
    <dc:creator>v-achippa</dc:creator>
    <dc:date>2025-06-09T07:02:25Z</dc:date>
    <item>
      <title>User Data Functions and Spark</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/User-Data-Functions-and-Spark/m-p/4722627#M9959</link>
      <description>&lt;P&gt;Use case: use Translytical Task Flows to update a file in a lakehouse, then refresh the Delta table built on that file, which backs a Power BI report.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I wanted to refresh the Delta table after updating the file using Spark, but if I include any Spark code I get the invocation error below.&lt;/P&gt;&lt;PRE&gt;{
"functionName": "test_spark_basic",
"invocationId": "xxxxxxx-b9bb-422e-8a80-xxxxxxxxx",
"status": "Failed",
"output": "",
"errors": [
{
"errorCode": "InternalError",
"message": "An internal execution error occured during function execution",
"properties": {
"error_type": "PySparkRuntimeError",
"error_message": "Java gateway process exited before sending its port number."
}
}
]
}&lt;/PRE&gt;&lt;P&gt;Code I used for Spark testing:&lt;/P&gt;&lt;PRE&gt;import fabric.functions as fn
from pyspark.sql import SparkSession

udf = fn.UserDataFunctions()

@udf.function()
def test_spark_basic() -&amp;gt; str:
    # Create Spark session
    spark = SparkSession.builder.appName("TestSparkInUDF").getOrCreate()

    # Create sample Spark DataFrame
    data = [("Bharath", 25), ("Anita", 30)]
    columns = ["Name", "Age"]
    df = spark.createDataFrame(data, columns)

    # Collect result and convert to string
    result = df.collect()
    return "\n".join([f"{row['Name']}, {row['Age']}" for row in result])&lt;/PRE&gt;&lt;P&gt;Questions:&lt;/P&gt;&lt;P&gt;1. Can we use Spark inside User Data Functions? If yes, please provide a guide. (I can see the pyspark module in the library section.)&lt;/P&gt;&lt;P&gt;2. Is there any other way to refresh the Delta table after modifying the file, from the UDF itself?&lt;/P&gt;</description>
      <pubDate>Fri, 06 Jun 2025 07:51:12 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/User-Data-Functions-and-Spark/m-p/4722627#M9959</guid>
      <dc:creator>Bharath_Kumar_S</dc:creator>
      <dc:date>2025-06-06T07:51:12Z</dc:date>
    </item>
    <item>
      <title>Re: User Data Functions and Spark</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/User-Data-Functions-and-Spark/m-p/4725159#M10011</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1228334"&gt;@Bharath_Kumar_S&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you for reaching out to Microsoft Fabric Community.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Spark is not supported inside User Data Functions, even though you may see pyspark in the library section. UDFs run in a restricted Python environment that does not include a Spark runtime, which is why you are getting the Java gateway error.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Use a Notebook step or Data Pipeline in the same Translytical Task Flow to refresh the Delta table. For example:&lt;BR /&gt;df = spark.read.format("parquet").load("Files/&amp;lt;lakehouse_name&amp;gt;/file_path")&lt;BR /&gt;df.write.format("delta").mode("overwrite").save("Tables/&amp;lt;lakehouse_name&amp;gt;/delta_table")&lt;/LI&gt;
&lt;LI&gt;If your Power BI report is connected to this table via Direct Lake mode, it will reflect the updates automatically, with no manual refresh needed.&lt;/LI&gt;
&lt;/UL&gt;
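The Notebook step described above can be sketched as below. This is a minimal illustration, not a definitive implementation: the file path "uploads/sales.parquet" and table name "sales_delta" are placeholder names, and the `spark` session is assumed to be the one a Fabric notebook pre-defines (the guard lets the snippet run harmlessly outside that environment):

```python
# Hypothetical Fabric notebook cell: re-read the updated file and overwrite the Delta table.
# "uploads/sales.parquet" and "sales_delta" are placeholder names for illustration.

def lakehouse_paths(file_rel_path: str, table_name: str) -> tuple[str, str]:
    """Build the Files/ source path and the Tables/ Delta destination path."""
    return f"Files/{file_rel_path}", f"Tables/{table_name}"

src, dest = lakehouse_paths("uploads/sales.parquet", "sales_delta")

try:
    spark  # in a Fabric notebook, this session is pre-defined by the runtime
except NameError:
    spark = None  # running outside a Spark environment; skip the refresh below

if spark is not None:
    df = spark.read.format("parquet").load(src)            # read the updated file
    df.write.format("delta").mode("overwrite").save(dest)  # refresh the Delta table
```

Because mode("overwrite") replaces the whole table, this suits a full refresh from the file; an incremental pattern (append or merge) would need different write logic.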
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If this post&amp;nbsp;helps, please consider&lt;SPAN&gt;&amp;nbsp;&lt;STRONG&gt;Accepting it as a solution&amp;nbsp;&lt;/STRONG&gt;to help other members find it more quickly, and don't forget to give a&amp;nbsp;&lt;STRONG&gt;"Kudos"&lt;/STRONG&gt;&amp;nbsp;– I’d truly appreciate it!&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks and regards,&lt;/P&gt;
&lt;P&gt;Anjan Kumar Chippa&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jun 2025 07:02:25 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/User-Data-Functions-and-Spark/m-p/4725159#M10011</guid>
      <dc:creator>v-achippa</dc:creator>
      <dc:date>2025-06-09T07:02:25Z</dc:date>
    </item>
  </channel>
</rss>

