Hi,
I am having issues using reference files in a Spark job definition.
As shown in the picture, I have a simple main definition file called spark_entry_job.py, and under Reference File I uploaded several other Python files.
From within the main definition file I am attempting to import and run the files I uploaded under Reference File.
The code looks something like this:
from pyspark.sql import SparkSession
import dim_businessunit
import dim_customer

def main():
    spark = SparkSession.builder.appName("spark_entry_job").getOrCreate()
    dim_businessunit.run(spark)
    dim_customer.run(spark)

if __name__ == "__main__":
    main()
But I am getting a "module not found" error.
I tried different things to find out where those reference files are mounted, if anywhere, but I can't figure it out.
How can I use the .py files that I uploaded?
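One way to investigate where uploaded files end up is to print the import environment from inside the job itself. The following is a minimal sketch using only the Python standard library; the function name describe_import_environment is hypothetical, and it makes no assumption about where Fabric actually places reference files. It only reports the working directory, sys.path, and the .py files visible in those directories.

```python
import os
import sys


def describe_import_environment():
    """Collect the working directory, sys.path, and the .py files visible there."""
    info = {"cwd": os.getcwd(), "sys_path": list(sys.path), "py_files": {}}
    # Check the working directory first, then every non-empty sys.path entry.
    candidates = [info["cwd"]] + [p for p in sys.path if p]
    for directory in candidates:
        if directory in info["py_files"] or not os.path.isdir(directory):
            continue  # skip duplicates and non-directories such as .zip entries
        try:
            names = os.listdir(directory)
        except OSError:
            continue  # unreadable directory: nothing to report
        info["py_files"][directory] = sorted(
            name for name in names if name.endswith(".py")
        )
    return info


if __name__ == "__main__":
    report = describe_import_environment()
    print("cwd:", report["cwd"])
    print("sys.path:", report["sys_path"])
    for directory, files in report["py_files"].items():
        print(directory, "->", files)
```

Running this at the top of the main definition file, before the failing imports, writes the information to the driver log, where it can be compared with the names of the uploaded files.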
Hi @y0m0 ,
Thanks for using Fabric Community.
As I understand it, you are trying to use a function defined in the dim_businessunit.py file.
Can you please try the code below in your main file and let me know whether it works?
from file import function
function()
Here are the files in question. I tried the import syntax you suggested, to no avail.
spark_entry_job.py
from pyspark.sql import SparkSession
from dim_businessunit import run as dim_bu_run
from dim_customer import run as dim_customer_run

def main():
    spark = SparkSession.builder.appName("spark_entry_job").getOrCreate()
    dim_bu_run(spark)
    dim_customer_run(spark)

if __name__ == "__main__":
    main()
dim_businessunit.py
def run(spark):
    print("log from dim_businessunit")
dim_customer.py
def run(spark):
    print("log from dim_customer")
And here is the same "No module named xxxx" error:
2024-01-04 10:07:03,940 ERROR ApplicationMaster [Driver]: User application exited with status 1, error msg: Traceback (most recent call last):
File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1704362763961_0001/container_1704362763961_0001_01_000001/spark_entry_job.py", line 2, in <module>
from dim_businessunit import run as dim_bu_run
ModuleNotFoundError: No module named 'dim_businessunit'
2024-01-04 10:07:03,947 ERROR ApplicationMaster [main]: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:322)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:525)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:284)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:967)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:966)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1907)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:966)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.PySparkUserAppException: User application exited with 1 : Traceback (most recent call last):
File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1704362763961_0001/container_1704362763961_0001_01_000001/spark_entry_job.py", line 2, in <module>
from dim_businessunit import run as dim_bu_run
ModuleNotFoundError: No module named 'dim_businessunit'
at org.apache.spark.deploy.PythonRunner$.runPythonProcess(PythonRunner.scala:124)
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:103)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:757)
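The traceback above shows the entry script running from the YARN container directory and failing at the first import, which means dim_businessunit could not be found on sys.path at that moment. As a rough sketch of how to make that failure more informative, the check below uses importlib.util.find_spec from the standard library to test whether each module can be located before importing it. The helper names missing_modules and require_modules are hypothetical, and this only diagnoses the problem; it does not fix a platform-side issue.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_modules(names):
    """Raise ImportError naming the missing modules and the searched sys.path."""
    missing = missing_modules(names)
    if missing:
        raise ImportError(
            f"Modules not found: {missing}; searched sys.path: {sys.path}"
        )
```

Calling require_modules(["dim_businessunit", "dim_customer"]) at the top of spark_entry_job.py, before the imports, would put the search path into the driver log alongside the error.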
Hi @y0m0 ,
Apologies for the issue you are facing. When I tried to reproduce your scenario, I was able to run the job successfully without any issues. I am attaching a screenshot for reference.
FYI: I used the same code that you shared previously.
If you are still facing the issue, this might require a deeper investigation by our engineering team, who can guide you better.
Please go ahead and raise a support ticket to reach our support team:
https://support.fabric.microsoft.com/support
Please share the ticket number here so that we can keep an eye on it.
Hope this is helpful. Please let me know if you have any further queries.
I opened a ticket with support with case nr: 2401040050001801
Hi @y0m0 ,
Thanks for sharing your support ticket number. In the meantime, I also reached out to the internal team for help, and they confirmed that this is a known issue that will be resolved in an upcoming release.
Sorry for the inconvenience.
After some back and forth, support figured out that the issue was specific to my Fabric region.
The engineering team deployed a fix, which I tested, and everything now works as it should.
@Anonymous
thanks for the quick reply.
I am not sure what the issue is. With the code provided, I still get the "no module named xxx" error.
I noticed in your SJD repro that you have a notification about updating to Runtime 1.2. Does that mean that the workspace you used for the repro is on Runtime 1.1?
I am currently testing on Runtime 1.2 on the Microsoft Fabric trial. Could this be the issue?