y0m0
Frequent Visitor

Spark Job Definition: unable to access uploaded Reference File

Hi,
I am having issues trying to use reference files in a Spark Job Definition.
As shown in the picture, I have a simple main definition file called spark_entry_job.py, and under Reference File I uploaded several other Python files.

y0m0_0-1704296128640.png

 

From within the main definition file I am attempting to import and run the files I uploaded under Reference File.

The code looks something like this:

from pyspark.sql import SparkSession
import dim_businessunit
import dim_customer

def main():
    spark = SparkSession.builder.appName("spark_entry_job").getOrCreate()
    dim_businessunit.run(spark)
    dim_customer.run(spark)

if __name__ == "__main__":
    main()

 
But I am getting a "module not found" error.
I tried different things to figure out where those reference files are mounted, if anywhere, but I could not work it out.

How can I use those .py files that I uploaded?
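
One way to narrow down where the uploaded files end up is to check, from inside the entry script, whether each module is importable and which directories actually contain a matching .py file. The following is only a diagnostic sketch: the helper name describe_module_search is hypothetical, and it assumes nothing about where Fabric places reference files. It simply inspects the working directory and sys.path as they are at runtime.

```python
import importlib.util
import os
import sys


def describe_module_search(module_names, search_dirs=None):
    """Report, per module, whether Python can import it and where a .py file exists.

    search_dirs defaults to the current working directory plus sys.path;
    pass an explicit list to probe other candidate locations.
    """
    if search_dirs is None:
        search_dirs = [os.getcwd(), *sys.path]
    report = {}
    for name in module_names:
        # find_spec returns None when the import system cannot locate the module.
        spec = importlib.util.find_spec(name)
        # Directories that physically contain <name>.py, importable or not.
        found_in = [
            d for d in search_dirs
            if d and os.path.isfile(os.path.join(d, name + ".py"))
        ]
        report[name] = {"importable": spec is not None, "found_in": found_in}
    return report
```

Calling describe_module_search(["dim_businessunit", "dim_customer"]) at the top of spark_entry_job.py and printing the result to the driver log would show whether the files are present but off the import path, or not present at all.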

2 ACCEPTED SOLUTIONS
Anonymous
Not applicable

Hi @y0m0 ,

Thanks for sharing your support ticket number. In the meantime I also reached out to the internal team for help, and they confirmed that this is a known issue that will be resolved in an upcoming release.

Sorry for the inconvenience.


After some back and forth with support, they figured out that the issue was happening in my specific Fabric region.
The engineering team deployed a fix, which I tested, and everything now works as it should.


7 REPLIES 7
Anonymous
Not applicable

Hi @y0m0 ,

Thanks for using Fabric Community.

As I understand it, you are trying to use the function defined in the dim_businessunit.py file.

Can you please try the code below in your main file and let me know whether it works?

 

from file import function

function()

 

 

Here are the files in question, where I tried the import syntax as suggested, to no avail.

 

spark_entry_job.py

from pyspark.sql import SparkSession
from dim_businessunit import run as dim_bu_run
from dim_customer import run as dim_customer_run

def main():
    spark = SparkSession.builder.appName("spark_entry_job").getOrCreate()
    dim_bu_run(spark)
    dim_customer_run(spark)

if __name__ == "__main__":
    main()

 

 

dim_businessunit.py

def run(spark):
    print("log from dim_businessunit")



dim_customer.py

def run(spark):
    print("log from dim_customer")



And here is the same "no module named xxxx" error:

024-01-04 10:07:03,940 ERROR ApplicationMaster [Driver]: User application exited with status 1, error msg: Traceback (most recent call last):
  File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1704362763961_0001/container_1704362763961_0001_01_000001/spark_entry_job.py", line 2, in <module>
    from dim_businessunit import run as dim_bu_run
ModuleNotFoundError: No module named 'dim_businessunit'
2024-01-04 10:07:03,947 ERROR ApplicationMaster [main]: Uncaught exception: 
org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:322)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:525)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:284)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:967)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:966)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1907)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:966)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.PySparkUserAppException: User application exited with 1 : Traceback (most recent call last):
  File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1704362763961_0001/container_1704362763961_0001_01_000001/spark_entry_job.py", line 2, in <module>
    from dim_businessunit import run as dim_bu_run
ModuleNotFoundError: No module named 'dim_businessunit'
	at org.apache.spark.deploy.PythonRunner$.runPythonProcess(PythonRunner.scala:124)
	at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:103)
	at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:757)
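
The traceback shows the plain import failing because the module is not on the import path of the driver container. If the uploaded file can be located somewhere on disk, a stopgap is to load it by explicit path instead of by name. This is a minimal sketch under that assumption: import_from_path is a hypothetical helper, and the thread does not establish where reference files are placed, so the path argument has to come from your own inspection.

```python
import importlib.util
import sys


def import_from_path(module_name, file_path):
    """Load a .py file from an explicit path and register it under module_name."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot build an import spec for {file_path!r}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so later imports of the same name reuse it.
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialised module behind on failure.
        sys.modules.pop(module_name, None)
        raise
    return module
```

With a known location, usage would look like dim_businessunit = import_from_path("dim_businessunit", some_path) followed by dim_businessunit.run(spark), where some_path is a placeholder for wherever the file was actually found.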

 

Anonymous
Not applicable

Hi @y0m0 ,

Apologies for the issue you are facing. When I tried to reproduce your scenario, I was able to run the job successfully without any issues. I am attaching screenshots for reference:

vgchennamsft_1-1704365324167.png

 



vgchennamsft_0-1704365224780.png


FYI: I have used the same code that you shared previously.

If you are still facing the issue, it might require a deeper investigation from our engineering team, who can guide you better.

Please go ahead and raise a support ticket to reach our support team:

https://support.fabric.microsoft.com/support
Please provide the ticket number here so that we can keep an eye on it.

Hope this is helpful. Please let me know in case of further queries.



I opened a ticket with support with case nr: 2401040050001801

@Anonymous 
Thanks for the quick reply.
I am not sure what the issue is; with the code provided I still get the "no module named xxxx" error.

I noticed in your SJD repro that you have a notification about updating to Runtime 1.2, does that mean that the workspace you used to repro is using Runtime 1.1?
I am currently testing on Runtime 1.2 on the Microsoft Fabric trial, could this be the issue?
