<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Spark Job Definition: unable to access uploaded Reference File in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3613582#M1197</link>
    <description>&lt;P&gt;Hi,&lt;BR /&gt;I am having issues trying to use reference files in a spark job definition.&lt;BR /&gt;As shown in the picture I have a simple main definition file called spark_entry_job.py and under Reference File I uploaded several other python files.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="y0m0_0-1704296128640.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1019430i74445425EFCB7D84/image-size/medium?v=v2&amp;amp;px=400" role="button" title="y0m0_0-1704296128640.png" alt="y0m0_0-1704296128640.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From within the main definition file I am attempting to import and run the files I uploaded under Reference File.&lt;/P&gt;&lt;P&gt;The code looks something like this:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession
import dim_businessunit
import dim_customer

def main():
    spark = SparkSession.builder.appName("spark_entry_job").getOrCreate()
    dim_businessunit.run(spark)
    dim_customer.run(spark)

if __name__ == "__main__":
    main()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;BR /&gt;But I am getting a "module not found" error.&lt;BR /&gt;I tried different things to find out where those Reference Files are mounted, if anywhere, but I can't figure it out.&lt;BR /&gt;&lt;BR /&gt;How can I use the .py files that I uploaded?&lt;/P&gt;</description>
    <pubDate>Wed, 03 Jan 2024 15:44:05 GMT</pubDate>
    <dc:creator>y0m0</dc:creator>
    <dc:date>2024-01-03T15:44:05Z</dc:date>
    <item>
      <title>Spark Job Definition: unable to access uploaded Reference File</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3613582#M1197</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;I am having issues trying to use reference files in a spark job definition.&lt;BR /&gt;As shown in the picture I have a simple main definition file called spark_entry_job.py and under Reference File I uploaded several other python files.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="y0m0_0-1704296128640.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1019430i74445425EFCB7D84/image-size/medium?v=v2&amp;amp;px=400" role="button" title="y0m0_0-1704296128640.png" alt="y0m0_0-1704296128640.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From within the main definition file I am attempting to import and run the files I uploaded under Reference File.&lt;/P&gt;&lt;P&gt;The code looks something like this:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession
import dim_businessunit
import dim_customer

def main():
    spark = SparkSession.builder.appName("spark_entry_job").getOrCreate()
    dim_businessunit.run(spark)
    dim_customer.run(spark)

if __name__ == "__main__":
    main()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;BR /&gt;But I am getting a "module not found" error.&lt;BR /&gt;I tried different things to find out where those Reference Files are mounted, if anywhere, but I can't figure it out.&lt;BR /&gt;&lt;BR /&gt;How can I use the .py files that I uploaded?&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jan 2024 15:44:05 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3613582#M1197</guid>
      <dc:creator>y0m0</dc:creator>
      <dc:date>2024-01-03T15:44:05Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Job Definition: unable to access uploaded Reference File</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3614825#M1198</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/650867"&gt;@y0m0&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;Thanks for using Fabric Community.&lt;BR /&gt;&lt;BR /&gt;As I understand it, you are trying to use the function defined in the dim_businessunit.py file.&lt;BR /&gt;&lt;BR /&gt;Could you please try the code below in your main file and let me know whether it works?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from file import function

function()&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 07:15:23 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3614825#M1198</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-01-04T07:15:23Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Job Definition: unable to access uploaded Reference File</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3615241#M1199</link>
      <description>&lt;P&gt;Here are the files in question. I tried the import syntax as suggested, to no avail.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;spark_entry_job.py&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession
from dim_businessunit import run as dim_bu_run
from dim_customer import run as dim_customer_run

def main():
    spark = SparkSession.builder.appName("spark_entry_job").getOrCreate()
    dim_bu_run(spark)
    dim_customer_run(spark)

if __name__ == "__main__":
    main()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;dim_businessunit.py&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def run(spark):
	print("log from dim_businessunit")&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;dim_customer.py&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def run(spark):
	print("log from dim_customer")&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;And here is the same "No module named ..." error:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;2024-01-04 10:07:03,940 ERROR ApplicationMaster [Driver]: User application exited with status 1, error msg: Traceback (most recent call last):
  File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1704362763961_0001/container_1704362763961_0001_01_000001/spark_entry_job.py", line 2, in &amp;lt;module&amp;gt;
    from dim_businessunit import run as dim_bu_run
ModuleNotFoundError: No module named 'dim_businessunit'
2024-01-04 10:07:03,947 ERROR ApplicationMaster [main]: Uncaught exception: 
org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:322)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:525)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:284)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:967)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:966)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1907)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:966)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.PySparkUserAppException: User application exited with 1 : Traceback (most recent call last):
  File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1704362763961_0001/container_1704362763961_0001_01_000001/spark_entry_job.py", line 2, in &amp;lt;module&amp;gt;
    from dim_businessunit import run as dim_bu_run
ModuleNotFoundError: No module named 'dim_businessunit'
	at org.apache.spark.deploy.PythonRunner$.runPythonProcess(PythonRunner.scala:124)
	at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:103)
	at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:757)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 10:11:01 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3615241#M1199</guid>
      <dc:creator>y0m0</dc:creator>
      <dc:date>2024-01-04T10:11:01Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Job Definition: unable to access uploaded Reference File</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3615318#M1200</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/650867"&gt;@y0m0&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;Apologies for the issue you are facing. When I tried to reproduce your scenario, I was able to run the job successfully without any issues. I am attaching screenshots for reference:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vgchennamsft_1-1704365324167.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1019876iF1894DCC9D7E27B8/image-size/medium?v=v2&amp;amp;px=400" role="button" title="vgchennamsft_1-1704365324167.png" alt="vgchennamsft_1-1704365324167.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vgchennamsft_0-1704365224780.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1019874iE724BA1B7646FDA6/image-size/medium?v=v2&amp;amp;px=400" role="button" title="vgchennamsft_0-1704365224780.png" alt="vgchennamsft_0-1704365224780.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;FYI: I have used the same code that you shared previously.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If you are still facing the issue, it may require a deeper investigation by our engineering team, who can guide you better.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Please raise a support ticket to reach our support team:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;A href="https://support.fabric.microsoft.com/support" target="_blank" rel="noopener nofollow noreferrer"&gt;https://support.fabric.microsoft.com/support&lt;/A&gt;&lt;BR /&gt;Please share the ticket number here so that we can keep an eye on it.&lt;BR /&gt;&lt;BR /&gt;Hope this is helpful. Please let me know in case of further queries.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 10:51:00 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3615318#M1200</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-01-04T10:51:00Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Job Definition: unable to access uploaded Reference File</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3615344#M1201</link>
      <description>&lt;P&gt;@Anonymous&amp;nbsp;&lt;BR /&gt;Thanks for the quick reply.&lt;BR /&gt;I am not sure what the issue is; with the code provided I still get the "No module named ..." error.&lt;BR /&gt;&lt;BR /&gt;I noticed in your SJD repro that you have a notification about updating to Runtime 1.2. Does that mean that the workspace you used for the repro is on Runtime 1.1?&lt;BR /&gt;I am currently testing on Runtime 1.2 on the Microsoft Fabric trial. Could this be the issue?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 11:10:12 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3615344#M1201</guid>
      <dc:creator>y0m0</dc:creator>
      <dc:date>2024-01-04T11:10:12Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Job Definition: unable to access uploaded Reference File</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3615364#M1202</link>
      <description>&lt;P&gt;I opened a ticket with support with case nr: 2401040050001801&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 11:28:05 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3615364#M1202</guid>
      <dc:creator>y0m0</dc:creator>
      <dc:date>2024-01-04T11:28:05Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Job Definition: unable to access uploaded Reference File</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3617149#M1203</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/650867"&gt;@y0m0&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;Thanks for sharing your support ticket number. In the meantime, I also reached out to the internal team for help, and they confirmed that this is a known issue that will be resolved in an upcoming release.&lt;BR /&gt;&lt;BR /&gt;Sorry for the inconvenience.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jan 2024 07:57:55 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3617149#M1203</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2024-01-05T07:57:55Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Job Definition: unable to access uploaded Reference File</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3662593#M1204</link>
      <description>&lt;P&gt;After some back and forth with support, they figured out that the issue was specific to my Fabric region.&lt;BR /&gt;The engineering team deployed a fix, which I tested, and everything now works as it should.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Jan 2024 11:07:01 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/Spark-Job-Definition-unable-to-access-uploaded-Reference-File/m-p/3662593#M1204</guid>
      <dc:creator>y0m0</dc:creator>
      <dc:date>2024-01-26T11:07:01Z</dc:date>
    </item>
  </channel>
</rss>

