March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Hello,
I've been working on getting the Spark job definition running locally, and although I've come a long way, I'm not quite there yet.
I've set up VS Code based on the following YouTube video and documentation:
https://www.youtube.com/watch?v=A9SjAyZ_JSc
https://learn.microsoft.com/en-us/fabric/data-engineering/setup-vs-code-extension
https://learn.microsoft.com/en-us/fabric/data-engineering/author-sjd-with-vs-code
One error I haven't been able to fix is this one:
[ERROR] 2024-07-26 22:06:57.430 [Thread-3] SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@HKCHG14_IL.home:56020
at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:140) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:36) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.executor.Executor.<init>(Executor.scala:250) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:235) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.SparkContext.<init>(SparkContext.scala:649) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_422]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_422]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_422]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_422]
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) ~[py4j-0.10.9.7.jar:?]
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) ~[py4j-0.10.9.7.jar:?]
at py4j.Gateway.invoke(Gateway.java:238) ~[py4j-0.10.9.7.jar:?]
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) ~[py4j-0.10.9.7.jar:?]
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) ~[py4j-0.10.9.7.jar:?]
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) ~[py4j-0.10.9.7.jar:?]
at py4j.ClientServerConnection.run(ClientServerConnection.java:106) ~[py4j-0.10.9.7.jar:?]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_422]
Can anyone help me fix this error?
Kind regards,
Martijn
Hi @Broeks,
AFAIK, you still need to import the libraries even if they already exist on your local machine. I'd suggest importing the libraries you use in the notebook before executing the code.
When you use the online notebook with a default Lakehouse, some variables, configuration, and libraries are initialized automatically, so you can use them directly without importing them. This does not happen when you work in a local environment.
In addition, I'd suggest checking the limitations section of the documentation to see whether it applies to your scenario:
VS Code extension overview - Microsoft Fabric | Microsoft Learn
The extension in desktop mode doesn't support the Microsoft Spark Utilities yet.
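Because desktop mode lacks the Microsoft Spark Utilities, notebook code that should also run locally can guard that import instead of failing. This is a minimal sketch: it assumes the `from notebookutils import mssparkutils` import used in Fabric notebooks, and `HAS_MSSPARKUTILS` and `runtime_label` are hypothetical names chosen for illustration.

```python
# Microsoft Spark Utilities are provided inside Fabric, but not by the
# VS Code extension in desktop mode, so guard the import.
try:
    from notebookutils import mssparkutils  # assumed Fabric notebook import
    HAS_MSSPARKUTILS = True
except ImportError:
    mssparkutils = None  # running locally: fall back to plain Python/PySpark
    HAS_MSSPARKUTILS = False


def runtime_label() -> str:
    """Hypothetical helper: describe where the code appears to be running."""
    return "fabric" if HAS_MSSPARKUTILS else "local"


if __name__ == "__main__":
    print(f"Running in {runtime_label()} mode")
```

Code paths that call `mssparkutils` can then check `HAS_MSSPARKUTILS` first and use a local alternative when it is `False`.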
Regards,
Xiaoxin Sheng
I've started again from scratch, and one thing I noticed is that I need to install a Spark version manually. It seems obvious, but the documentation doesn't mention it, so this might be a good addition to it (VS Code extension overview - Microsoft Fabric | Microsoft Learn). So I've installed Spark 3.4.1 to match Runtime 1.2 (Apache Spark runtime in Fabric - Microsoft Fabric | Microsoft Learn).
Still the same error though.
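To double-check that the manually installed Spark really is 3.4.1, the version can be read from the installation itself. A sketch under two assumptions: the manual install is referenced through the conventional `SPARK_HOME` environment variable, and it is an Apache Spark binary distribution whose `RELEASE` file starts with a line like "Spark 3.4.1 built for Hadoop ...". `spark_home_version` is a hypothetical helper.

```python
import os
import re
from pathlib import Path
from typing import Optional

# Runtime 1.2 corresponds to Spark 3.4.1, per the runtime documentation linked in this thread.
EXPECTED_SPARK = "3.4.1"


def spark_home_version(spark_home: Optional[str] = None) -> Optional[str]:
    """Return the Spark version found in SPARK_HOME/RELEASE, or None if unknown."""
    home = spark_home or os.environ.get("SPARK_HOME")
    if not home:
        return None  # SPARK_HOME is not set and no path was given
    release = Path(home) / "RELEASE"
    if not release.is_file():
        return None  # not a binary distribution layout (assumption)
    text = release.read_text(encoding="utf-8", errors="replace")
    match = re.search(r"Spark (\d+\.\d+\.\d+)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    found = spark_home_version()
    status = "OK" if found == EXPECTED_SPARK else "MISMATCH"
    print(f"SPARK_HOME Spark version: {found} (expected {EXPECTED_SPARK}): {status}")
```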
I'm running on a trial, with a capacity I don't have access to.
Could this be related to the errors I'm having? I'm able to run notebooks in the browser, but not in VS Code.
Thanks for your suggestion. I've got it running!
These are the things I did to make it work:
- Ran VS Code as administrator: this enables VS Code to install the required packages while building the environment.
- Created a Spark context with the config provided for the Spark job definitions.
- Added the code:
conf.set("spark.driver.host", "localhost")
This resulted in the following code to create the Spark context:
from pyspark.sql import SparkSession
from pyspark.conf import SparkConf
# Spark session builder, using the Spark Lighter settings from the Spark job definition config
conf = SparkConf()
conf.set("spark.lighter.client.plugin", "org.apache.spark.lighter.DefaultLighterClientPlugin")
conf.set("spark.sql.catalogImplementation", "lighter")
conf.set("spark.lighter.sessionState.implementation", "org.apache.spark.sql.lighter.client.SparkLighterSessionStateBuilder")
conf.set("spark.lighter.externalCatalog.implementation", "org.apache.spark.sql.lighter.client.ConnectCatalogClient")
# Bind the driver to localhost instead of the machine name, which fixed the "Invalid Spark URL" error
conf.set("spark.driver.host", "localhost")
spark = SparkSession.builder.config(conf=conf).getOrCreate()
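For the first step (running VS Code as administrator), it can help to confirm that the process really is elevated, for example from the notebook kernel launched by VS Code, which normally inherits the elevation. This sketch uses the Windows `shell32.IsUserAnAdmin` call via `ctypes` and treats an effective UID of 0 as the equivalent on other platforms; `running_as_admin` is a hypothetical helper.

```python
import ctypes
import os
import sys


def running_as_admin() -> bool:
    """Return True if the current process has administrator (or root) rights."""
    if sys.platform == "win32":
        # IsUserAnAdmin returns a non-zero value when the process is elevated.
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    # On POSIX systems, root (effective UID 0) is the closest equivalent.
    return os.geteuid() == 0


if __name__ == "__main__":
    print("Elevated:", running_as_admin())
```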
My suggestion would be to add a code snippet to the built-in folder, similar to the Spark job definitions.
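Such a built-in snippet could wrap the same settings in a reusable helper. This is only a sketch, not something the extension ships: `LIGHTER_SETTINGS` and `get_local_spark_session` are hypothetical names, and the settings are copied from the code in this post. PySpark is imported inside the function so the settings can be inspected without it.

```python
from typing import Dict, Optional

# Settings copied from the working local configuration in this thread.
LIGHTER_SETTINGS: Dict[str, str] = {
    "spark.lighter.client.plugin": "org.apache.spark.lighter.DefaultLighterClientPlugin",
    "spark.sql.catalogImplementation": "lighter",
    "spark.lighter.sessionState.implementation": "org.apache.spark.sql.lighter.client.SparkLighterSessionStateBuilder",
    "spark.lighter.externalCatalog.implementation": "org.apache.spark.sql.lighter.client.ConnectCatalogClient",
    # Workaround for "Invalid Spark URL" when the machine name is not a usable host.
    "spark.driver.host": "localhost",
}


def get_local_spark_session(extra_settings: Optional[Dict[str, str]] = None):
    """Build (or reuse) a SparkSession with the local Spark Lighter settings."""
    # Lazy imports: only needed when a session is actually created.
    from pyspark.conf import SparkConf
    from pyspark.sql import SparkSession

    settings = {**LIGHTER_SETTINGS, **(extra_settings or {})}
    conf = SparkConf().setAll(list(settings.items()))
    return SparkSession.builder.config(conf=conf).getOrCreate()
```

In a local notebook, `spark = get_local_spark_session()` would then replace the individual `conf.set(...)` calls, and `extra_settings` lets a specific notebook override or add keys.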
Thanks for the support!
Hi @Broeks,
SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@HKCHG14_IL.home:56020
The error message indicates that Spark context initialization failed because the Spark URL is invalid.
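A likely reason this particular URL is rejected: the machine name `HKCHG14_IL` contains an underscore, which is not a legal hostname character. As far as I can tell from the Spark source, `RpcEndpointAddress` parses the URL with `java.net.URI`, which cannot extract a host from `HKCHG14_IL.home` and therefore reports the URL as invalid; that would also explain why setting `spark.driver.host` to `localhost` helps. The Python check below only approximates the hostname rule (RFC 1123 labels) and is not Spark's own code; `is_valid_hostname` is a hypothetical helper.

```python
import re

# One RFC 1123 label: 1-63 letters, digits or hyphens, not starting or
# ending with a hyphen. Underscores are not allowed.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_hostname(host: str) -> bool:
    """Return True if `host` is a syntactically valid RFC 1123 hostname."""
    if not host or len(host) > 253:
        return False
    # A single trailing dot (fully qualified form) is allowed.
    return all(_LABEL.match(label) for label in host.rstrip(".").split("."))


if __name__ == "__main__":
    for name in ("HKCHG14_IL.home", "localhost"):
        print(name, is_valid_hostname(name))
```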
Have you checked whether the required libraries are installed correctly, and whether the ports they use are available?
Regards,
Xiaoxin Sheng
Thanks for your response. I'm running the notebook using the SynapseVSCode extension, which created two conda environments for me: fabric-synapse-runtime-1-1 & fabric-synapse-runtime-1-2.
I've checked the installed packages in this environment, which are:
(fabric-synapse-runtime-1-2) C:\Windows\System32>conda list
# packages in environment at C:\ProgramData\miniconda3\envs\fabric-synapse-runtime-1-2:
#
# Name Version Build Channel
asttokens 2.4.1 pypi_0 pypi
bzip2 1.0.8 h2bbff1b_6
ca-certificates 2024.7.2 haa95532_0
certifi 2024.7.4 pypi_0 pypi
cffi 1.16.0 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
comm 0.2.2 pypi_0 pypi
contourpy 1.2.1 pypi_0 pypi
cryptography 43.0.0 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
debugpy 1.8.2 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
exceptiongroup 1.2.2 pypi_0 pypi
executing 2.0.1 pypi_0 pypi
fonttools 4.53.1 pypi_0 pypi
idna 3.7 pypi_0 pypi
ipykernel 6.29.5 pypi_0 pypi
ipython 8.26.0 pypi_0 pypi
jedi 0.19.1 pypi_0 pypi
jupyter-client 8.6.2 pypi_0 pypi
jupyter-core 5.7.2 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
libffi 3.4.4 hd77b12b_1
matplotlib 3.9.1 pypi_0 pypi
matplotlib-inline 0.1.7 pypi_0 pypi
msal 1.30.0 pypi_0 pypi
nest-asyncio 1.6.0 pypi_0 pypi
numpy 2.0.1 pypi_0 pypi
openssl 3.0.14 h827c3e9_0
packaging 24.1 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
parso 0.8.4 pypi_0 pypi
pillow 10.4.0 pypi_0 pypi
pip 24.0 py310haa95532_0
platformdirs 4.2.2 pypi_0 pypi
prompt-toolkit 3.0.47 pypi_0 pypi
psutil 6.0.0 pypi_0 pypi
pure-eval 0.2.3 pypi_0 pypi
py4j 0.10.9.5 pypi_0 pypi
pyarrow 17.0.0 pypi_0 pypi
pycparser 2.22 pypi_0 pypi
pygments 2.18.0 pypi_0 pypi
pyjwt 2.8.0 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
python 3.10.12 he1021f5_0
python-dateutil 2.9.0.post0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pywin32 306 pypi_0 pypi
pyzmq 26.0.3 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
setuptools 69.5.1 py310haa95532_0
six 1.16.0 pypi_0 pypi
spark-lighter-lib 34.0.0.6 pypi_0 pypi
sqlite 3.45.3 h2bbff1b_0
stack-data 0.6.3 pypi_0 pypi
tk 8.6.14 h0416ee5_0
tornado 6.4.1 pypi_0 pypi
traitlets 5.14.3 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
urllib3 2.2.2 pypi_0 pypi
vc 14.2 h2eaa2aa_4
vs2015_runtime 14.29.30133 h43f2093_4
wcwidth 0.2.13 pypi_0 pypi
wheel 0.43.0 py310haa95532_0
xz 5.4.6 h8cc25b3_1
zlib 1.2.13 h8cc25b3_1
Even installing pyspark 3.4.1 in this environment didn't resolve the problem for me. I installed pyspark through:
(fabric-synapse-runtime-1-2) C:\Windows\System32>conda install -c conda-forge pyspark=3.4.1
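To confirm that the notebook kernel really runs in `fabric-synapse-runtime-1-2` and sees the intended package versions, a quick check from inside the notebook can help. A sketch under assumptions: the expected versions come from this thread (Spark 3.4.1 for Runtime 1.2, and the `py4j-0.10.9.7.jar` visible in the stack trace), whereas the package listing above shows `py4j 0.10.9.5`. `installed_versions` and `mismatches` are hypothetical helpers.

```python
import os
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional, Tuple

# Expected versions, taken from this thread (assumption, adjust as needed).
EXPECTED: Dict[str, str] = {"pyspark": "3.4.1", "py4j": "0.10.9.7"}


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(found: Dict[str, Optional[str]],
               expected: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {package: (installed, expected)} for every package that differs."""
    return {
        name: (found.get(name), want)
        for name, want in expected.items()
        if found.get(name) != want
    }


if __name__ == "__main__":
    # CONDA_DEFAULT_ENV is set by `conda activate`; sys.prefix is the interpreter's environment.
    print("Conda env:", os.environ.get("CONDA_DEFAULT_ENV"), "| prefix:", sys.prefix)
    for name, (have, want) in mismatches(installed_versions(EXPECTED), EXPECTED).items():
        print(f"{name}: installed {have}, expected {want}")
```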
This is what I'm doing in my notebook:
And this is the log:
20:09:38,171 root INFO current_directory c:\dev\fabric_vscode\12c424cc-eaf9-4bf7-b6cc-2cc8d5bd0a86\SynapseNotebook\3724d86c-32d8-48fd-a6af-1fe92d751fd0\Notebook 1
20:09:38,171 root INFO workspace_path c:\dev\fabric_vscode\12c424cc-eaf9-4bf7-b6cc-2cc8d5bd0a86
20:09:38,171 root INFO log_path c:\dev\fabric_vscode\12c424cc-eaf9-4bf7-b6cc-2cc8d5bd0a86\logs\3724d86c-32d8-48fd-a6af-1fe92d751fd0
20:09:38,171 root INFO Using synapse remote kernel ...
20:09:38,171 root INFO Should attach session in dev mode False
20:09:39,575 root INFO Starting session 140c336b-d93a-4873-94f3-0856c44c5b25...
20:09:39,700 root INFO Getting refresh token...
20:09:39,938 root INFO https://pbipweu10-westeurope.pbidedicated.windows.net/webapi/capacities/04BE6C77-D18C-404E-B87F-1794800275F7/workloads/Notebook/Data/Direct/sparklighter/api/tjs/versions/2022-04-30/artifacts/3724d86c-32d8-48fd-a6af-1fe92d751fd0/sessions/140c336b-d93a-4873-94f3-0856c44c5b25
20:09:40,292 root INFO <session_management.SessionStatus object at 0x000001452719A4A0>
20:09:40,769 root INFO Trident session states: not_started
20:09:43,229 root INFO Trident session states: starting
20:09:45,738 root INFO Trident session states: idle
20:09:45,738 root INFO Starting session 140c336b-d93a-4873-94f3-0856c44c5b25 finished...
20:09:45,738 root INFO Attaching spark session 140c336b-d93a-4873-94f3-0856c44c5b25 for Spark Lighter ...
20:09:46,300 root INFO log4j.properties variable LOG_FILE_PATH c:\dev\fabric_vscode\12c424cc-eaf9-4bf7-b6cc-2cc8d5bd0a86\logs\3724d86c-32d8-48fd-a6af-1fe92d751fd0\SparkLighter.log
20:09:55,752 root ERROR Failed to initialize Spark Lighter variables. An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@HKCHG14_IL.home:62066
at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:140)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110)
at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:36)
at org.apache.spark.executor.Executor.<init>(Executor.scala:250)
at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:235)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:590)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
20:09:55,754 root INFO Registering Spark Lighter magics for IPython...
20:09:55,755 root INFO Registered Spark Lighter magics for IPython.
I've followed the steps in the instructions on the Microsoft website:
VS Code extension overview - Microsoft Fabric | Microsoft Learn
So I'm not quite sure what to do to get it working in VS Code.
Any suggestions on what I can do to get my notebooks running locally?
Kind regards
Martijn
As stated in the other comment, the config:
conf.set("spark.driver.host", "localhost")
fixed this issue for me.