y0m0
Frequent Visitor

Spark Job Definition: unable to access uploaded Reference File

Hi,
I am having issues trying to use reference files in a Spark Job Definition.
As shown in the picture, I have a simple main definition file called spark_entry_job.py, and under Reference File I uploaded several other Python files.

y0m0_0-1704296128640.png

 

From within the main definition file I am attempting to import and run the files I uploaded under Reference File.

The code looks something like this:

from pyspark.sql import SparkSession
import dim_businessunit
import dim_customer

def main():
    spark = SparkSession.builder.appName("spark_entry_job").getOrCreate()
    dim_businessunit.run(spark)
    dim_customer.run(spark)

if __name__ == "__main__":
    main()

 
But I am getting a "module not found" error.
I tried different things to see whether I could find where those reference files are mounted, if anywhere, but I can't figure it out.

How can I use those .py files that I uploaded?
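
For anyone debugging the same symptom, one way to see where (or whether) the reference files landed is to print what the Python process can actually see. The following is a minimal sketch using only the standard library. The helper name describe_import_environment is hypothetical, and it assumes the reference files, if they were delivered at all, appear either in the working directory or somewhere on sys.path. It does not rely on any Fabric-specific API.

```python
import importlib.util
import os
import sys


def describe_import_environment(module_names):
    """Collect facts that help explain why an import fails in this process."""
    cwd = os.getcwd()
    report = {
        "cwd": cwd,
        # .py and .zip files in the working directory are the usual import candidates.
        "cwd_candidates": sorted(
            f for f in os.listdir(cwd) if f.endswith((".py", ".zip"))
        ),
        "sys_path": list(sys.path),
        "resolvable": {},
    }
    for name in module_names:
        # find_spec returns None when nothing on sys.path provides the module.
        spec = importlib.util.find_spec(name)
        report["resolvable"][name] = spec.origin if spec is not None else None
    return report


if __name__ == "__main__":
    info = describe_import_environment(["dim_businessunit", "dim_customer"])
    for key, value in info.items():
        print(f"{key}: {value}")
```

Calling this at the top of spark_entry_job.py, before the failing imports, would write the working directory, its candidate files, and the import search path to the driver log. A value of None under "resolvable" means the module is not reachable from this process.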

2 ACCEPTED SOLUTIONS

Hi @y0m0 ,

Thanks for sharing your support ticket number. In the meantime I also reached out to the internal team for help, and they confirmed that this is a known issue that will be resolved in an upcoming release.

Sorry for the inconvenience.


After some back and forth with support, they figured out that the issue was specific to my Fabric region.
The engineering team deployed a fix, which I tested, and everything now works as it should.


7 REPLIES
v-gchenna-msft
Community Support

Hi @y0m0 ,

Thanks for using Fabric Community.

As I understand it, you are trying to use the function defined in the dim_businessunit.py file.

Can you please try the code below in your main file and let me know whether it works?

 

from file import function

function()

 

 

Here are the files in question, where I tried the import syntax as suggested, to no avail.

 

spark_entry_job.py

from pyspark.sql import SparkSession
from dim_businessunit import run as dim_bu_run
from dim_customer import run as dim_customer_run

def main():
    spark = SparkSession.builder.appName("spark_entry_job").getOrCreate()
    dim_bu_run(spark)
    dim_customer_run(spark)

if __name__ == "__main__":
    main()

 

 

dim_businessunit.py

def run(spark):
    print("log from dim_businessunit")



dim_customer.py

def run(spark):
    print("log from dim_customer")



And here is the same "no module named xxxx" error:

2024-01-04 10:07:03,940 ERROR ApplicationMaster [Driver]: User application exited with status 1, error msg: Traceback (most recent call last):
  File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1704362763961_0001/container_1704362763961_0001_01_000001/spark_entry_job.py", line 2, in <module>
    from dim_businessunit import run as dim_bu_run
ModuleNotFoundError: No module named 'dim_businessunit'
2024-01-04 10:07:03,947 ERROR ApplicationMaster [main]: Uncaught exception: 
org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:322)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:525)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:284)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:967)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:966)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1907)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:966)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.PySparkUserAppException: User application exited with 1 : Traceback (most recent call last):
  File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1704362763961_0001/container_1704362763961_0001_01_000001/spark_entry_job.py", line 2, in <module>
    from dim_businessunit import run as dim_bu_run
ModuleNotFoundError: No module named 'dim_businessunit'
	at org.apache.spark.deploy.PythonRunner$.runPythonProcess(PythonRunner.scala:124)
	at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:103)
	at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:757)

 

Hi @y0m0 ,

Apologies for the issue you are facing. When I tried to reproduce your scenario, I was able to run the job successfully without any issues. Screenshots are attached for reference:

vgchennamsft_1-1704365324167.png

 



vgchennamsft_0-1704365224780.png


FYI: I have used the same code that you shared previously.
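
Since the same code runs in one workspace and fails in another, it can help to confirm outside Fabric that the import pattern itself is sound. The sketch below is a local check only, not a reproduction of how Fabric delivers reference files. It writes a stand-in dim_businessunit.py to a temporary directory, puts that directory on sys.path, and calls run(). The function name run_reference_module_locally is hypothetical, and the stand-in returns its message instead of printing it so the result can be checked.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the uploaded reference file; returns the message instead of printing it.
MODULE_SOURCE = 'def run(spark):\n    return "log from dim_businessunit"\n'


def run_reference_module_locally():
    """Write a stand-in reference file to a temp dir, import it, and call run()."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "dim_businessunit.py").write_text(MODULE_SOURCE)
        # Make the directory importable; this stands in for whatever mechanism
        # the platform uses to put reference files on the import path.
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("dim_businessunit")
            # The stand-in ignores its argument, so None replaces a SparkSession.
            return module.run(None)
        finally:
            # Undo the changes so repeated calls start from a clean state.
            sys.path.remove(tmp)
            sys.modules.pop("dim_businessunit", None)


if __name__ == "__main__":
    print(run_reference_module_locally())
```

If this succeeds locally while the job still fails with ModuleNotFoundError, the likely difference is that the files are not reaching the driver's import path, which points to the environment and not the code.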

If you are still facing the issue, it may require a deeper investigation by our engineering team, who can guide you better.

Please go ahead and raise a support ticket to reach our support team:

https://support.fabric.microsoft.com/support
Please provide the ticket number here so that we can keep an eye on it.

Hope this is helpful. Please let me know in case of further queries.



I opened a ticket with support, case number 2401040050001801.

@v-gchenna-msft
Thanks for the quick reply.
I am not sure what the issue is; with the code provided I still get the "no module named xxx" error.

I noticed in your SJD repro that you have a notification about updating to Runtime 1.2. Does that mean that the workspace you used for the repro is on Runtime 1.1?
I am currently testing on Runtime 1.2 on the Microsoft Fabric trial. Could this be the issue?
