Hello,
I've been working on getting the Spark job definition running locally, and although I've come a long way, I'm not quite there yet.
I've set up VS Code based on the following YouTube video and documentation:
https://www.youtube.com/watch?v=A9SjAyZ_JSc
https://learn.microsoft.com/en-us/fabric/data-engineering/setup-vs-code-extension
https://learn.microsoft.com/en-us/fabric/data-engineering/author-sjd-with-vs-code
One error I've run into and haven't been able to fix is:
[ERROR] 2024-07-26 22:06:57.430 [Thread-3] SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@HKCHG14_IL.home:56020
at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:140) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:36) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.executor.Executor.<init>(Executor.scala:250) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:235) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.SparkContext.<init>(SparkContext.scala:649) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_422]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_422]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_422]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_422]
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247) ~[py4j-0.10.9.7.jar:?]
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) ~[py4j-0.10.9.7.jar:?]
at py4j.Gateway.invoke(Gateway.java:238) ~[py4j-0.10.9.7.jar:?]
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) ~[py4j-0.10.9.7.jar:?]
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) ~[py4j-0.10.9.7.jar:?]
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182) ~[py4j-0.10.9.7.jar:?]
at py4j.ClientServerConnection.run(ClientServerConnection.java:106) ~[py4j-0.10.9.7.jar:?]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_422]
Can anyone help me fix this error?
Kind regards,
Martijn
Hi @Broeks,
AFAIK, you still need to import the libraries even if they already exist on your local machine. I'd suggest importing the libraries you use in the notebook before executing the code, as sketched below.
When you use the online notebook with a default Lakehouse, it auto-initializes some variables, configuration, and libraries so that you can use them directly without importing them. (This does not happen when you work in the local environment.)
In addition, I'd suggest checking the limitations section of the documentation to see whether it applies to your scenario:
VS Code extension overview - Microsoft Fabric | Microsoft Learn
"The extension under the desktop mode doesn't support the Microsoft Spark Utilities yet."
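For example, a minimal sketch of what a local notebook cell might need to import and create explicitly (this assumes a plain local PySpark kernel and is only an illustration, not taken from the documentation):

from pyspark.sql import SparkSession, functions as F

# In a hosted Fabric notebook the `spark` session is pre-initialized; in a local
# VS Code kernel it is not, so build it yourself (the accepted solution below
# shows the extra config that made this work in this thread).
spark = SparkSession.builder.appName("local-test").getOrCreate()

# Imports you rely on must also be explicit when running locally.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.select(F.upper("value")).show()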
Regards,
Xiaoxin Sheng
I've started again from scratch, and one thing I noticed is that I need to install a Spark version manually. It seems obvious, but the manual doesn't mention it, so this might be a good addition to the manual (VS Code extension overview - Microsoft Fabric | Microsoft Learn). I've installed Spark version 3.4.1 to match Runtime 1.2 (Apache Spark runtime in Fabric - Microsoft Fabric | Microsoft Learn).
Still the same error though.
I'm running on a trial with a capacity I don't have access to:
Could this be related to the errors I'm having? I'm able to run notebooks in the browser, but not in VS Code.
Thanks for your suggestion. I've got it running!
The things I've done to make it work:
- Run VS Code as admin: this enables VS Code to install the required packages while building the environment.
- Create a Spark context with the config provided for the Spark job definitions.
- Add the line:
conf.set("spark.driver.host", "localhost")
This resulted in the following code to create the Spark context:
from pyspark.sql import SparkSession
from pyspark.conf import SparkConf

# Spark session builder: reuse the Spark Lighter config provided for the Spark job definitions
conf = SparkConf()
conf.set("spark.lighter.client.plugin", "org.apache.spark.lighter.DefaultLighterClientPlugin")
conf.set("spark.sql.catalogImplementation", "lighter")
conf.set("spark.lighter.sessionState.implementation", "org.apache.spark.sql.lighter.client.SparkLighterSessionStateBuilder")
conf.set("spark.lighter.externalCatalog.implementation", "org.apache.spark.sql.lighter.client.ConnectCatalogClient")
conf.set("spark.driver.host", "localhost")  # avoids the "Invalid Spark URL" error caused by the machine hostname
spark = SparkSession.builder.config(conf=conf).getOrCreate()
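For completeness, a small sanity check that can be run against the resulting session (my own addition, assuming the cell above executed without errors):

# Verify the locally created session actually works.
print(spark.version)   # should match the Fabric runtime's Spark version, e.g. 3.4.x
spark.range(5).show()  # runs a trivial job to confirm the context is healthy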
My suggestion would be to add a code snippet like this to the built-in folder, similar to the Spark job definitions.
Thanks for the support!
Hi @Broeks,
SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@HKCHG14_IL.home:56020
The error message seems to be related to the Spark context initialization failing and the Spark URL being invalid.
Have you checked whether the required libraries are installed correctly and whether the defined ports are available?
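For example, something like the following could be used to check the hostname and local port availability (a rough sketch assuming plain Python on the machine, not an official troubleshooting step):

import socket

hostname = socket.gethostname()
print("hostname:", hostname)                      # an underscore here makes spark:// RPC URLs invalid
print("resolves to:", socket.gethostbyname(hostname))

# Bind an ephemeral port to confirm ports can be opened locally.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(("", 0))
    print("bound test port:", s.getsockname()[1])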
Regards,
Xiaoxin Sheng
Thanks for your response. I'm running the notebook using the SynapseVSCode extension, which created two conda environments for me: fabric-synapse-runtime-1-1 & fabric-synapse-runtime-1-2.
I've checked the installed packages in this environment, which are:
(fabric-synapse-runtime-1-2) C:\Windows\System32>conda list
# packages in environment at C:\ProgramData\miniconda3\envs\fabric-synapse-runtime-1-2:
#
# Name Version Build Channel
asttokens 2.4.1 pypi_0 pypi
bzip2 1.0.8 h2bbff1b_6
ca-certificates 2024.7.2 haa95532_0
certifi 2024.7.4 pypi_0 pypi
cffi 1.16.0 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
comm 0.2.2 pypi_0 pypi
contourpy 1.2.1 pypi_0 pypi
cryptography 43.0.0 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
debugpy 1.8.2 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
exceptiongroup 1.2.2 pypi_0 pypi
executing 2.0.1 pypi_0 pypi
fonttools 4.53.1 pypi_0 pypi
idna 3.7 pypi_0 pypi
ipykernel 6.29.5 pypi_0 pypi
ipython 8.26.0 pypi_0 pypi
jedi 0.19.1 pypi_0 pypi
jupyter-client 8.6.2 pypi_0 pypi
jupyter-core 5.7.2 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
libffi 3.4.4 hd77b12b_1
matplotlib 3.9.1 pypi_0 pypi
matplotlib-inline 0.1.7 pypi_0 pypi
msal 1.30.0 pypi_0 pypi
nest-asyncio 1.6.0 pypi_0 pypi
numpy 2.0.1 pypi_0 pypi
openssl 3.0.14 h827c3e9_0
packaging 24.1 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
parso 0.8.4 pypi_0 pypi
pillow 10.4.0 pypi_0 pypi
pip 24.0 py310haa95532_0
platformdirs 4.2.2 pypi_0 pypi
prompt-toolkit 3.0.47 pypi_0 pypi
psutil 6.0.0 pypi_0 pypi
pure-eval 0.2.3 pypi_0 pypi
py4j 0.10.9.5 pypi_0 pypi
pyarrow 17.0.0 pypi_0 pypi
pycparser 2.22 pypi_0 pypi
pygments 2.18.0 pypi_0 pypi
pyjwt 2.8.0 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
python 3.10.12 he1021f5_0
python-dateutil 2.9.0.post0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pywin32 306 pypi_0 pypi
pyzmq 26.0.3 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
setuptools 69.5.1 py310haa95532_0
six 1.16.0 pypi_0 pypi
spark-lighter-lib 34.0.0.6 pypi_0 pypi
sqlite 3.45.3 h2bbff1b_0
stack-data 0.6.3 pypi_0 pypi
tk 8.6.14 h0416ee5_0
tornado 6.4.1 pypi_0 pypi
traitlets 5.14.3 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
urllib3 2.2.2 pypi_0 pypi
vc 14.2 h2eaa2aa_4
vs2015_runtime 14.29.30133 h43f2093_4
wcwidth 0.2.13 pypi_0 pypi
wheel 0.43.0 py310haa95532_0
xz 5.4.6 h8cc25b3_1
zlib 1.2.13 h8cc25b3_1
Even installing PySpark 3.4.1 in this environment didn't resolve the problem for me. I've installed PySpark through:
(fabric-synapse-runtime-1-2) C:\Windows\System32>conda install -c conda-forge pyspark=3.4.1
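A quick way to double-check what the kernel actually picks up, assuming the conda environment above is active (my own sketch):

import pyspark, py4j
print("pyspark:", pyspark.__version__)  # expecting 3.4.1 to match Fabric Runtime 1.2
print("py4j:", py4j.__version__)        # PySpark 3.4.1 normally pairs with py4j 0.10.9.7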
This is what I'm doing in my notebook:
And this is the log:
20:09:38,171 root INFO current_directory c:\dev\fabric_vscode\12c424cc-eaf9-4bf7-b6cc-2cc8d5bd0a86\SynapseNotebook\3724d86c-32d8-48fd-a6af-1fe92d751fd0\Notebook 1
20:09:38,171 root INFO workspace_path c:\dev\fabric_vscode\12c424cc-eaf9-4bf7-b6cc-2cc8d5bd0a86
20:09:38,171 root INFO log_path c:\dev\fabric_vscode\12c424cc-eaf9-4bf7-b6cc-2cc8d5bd0a86\logs\3724d86c-32d8-48fd-a6af-1fe92d751fd0
20:09:38,171 root INFO Using synapse remote kernel ...
20:09:38,171 root INFO Should attach session in dev mode False
20:09:39,575 root INFO Starting session 140c336b-d93a-4873-94f3-0856c44c5b25...
20:09:39,700 root INFO Getting refresh token...
20:09:39,938 root INFO https://pbipweu10-westeurope.pbidedicated.windows.net/webapi/capacities/04BE6C77-D18C-404E-B87F-1794800275F7/workloads/Notebook/Data/Direct/sparklighter/api/tjs/versions/2022-04-30/artifacts/3724d86c-32d8-48fd-a6af-1fe92d751fd0/sessions/140c336b-d93a-4873-94f3-0856c44c5b25
20:09:40,292 root INFO <session_management.SessionStatus object at 0x000001452719A4A0>
20:09:40,769 root INFO Trident session states: not_started
20:09:43,229 root INFO Trident session states: starting
20:09:45,738 root INFO Trident session states: idle
20:09:45,738 root INFO Starting session 140c336b-d93a-4873-94f3-0856c44c5b25 finished...
20:09:45,738 root INFO Attaching spark session 140c336b-d93a-4873-94f3-0856c44c5b25 for Spark Lighter ...
20:09:46,300 root INFO log4j.properties variable LOG_FILE_PATH c:\dev\fabric_vscode\12c424cc-eaf9-4bf7-b6cc-2cc8d5bd0a86\logs\3724d86c-32d8-48fd-a6af-1fe92d751fd0\SparkLighter.log
20:09:55,752 root ERROR Failed to initialize Spark Lighter variables. An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@HKCHG14_IL.home:62066
at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:140)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110)
at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:36)
at org.apache.spark.executor.Executor.<init>(Executor.scala:250)
at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:235)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:590)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
20:09:55,754 root INFO Registering Spark Lighter magics for IPython...
20:09:55,755 root INFO Registered Spark Lighter magics for IPython.
I've followed the steps provided in the instructions on the Microsoft website:
VS Code extension overview - Microsoft Fabric | Microsoft Learn
So I'm not quite sure what to do to get it working in my VS Code application.
Any suggestions on what I can do to get my notebooks running locally?
Kind regards
Martijn
As stated in the other comment, the config:
conf.set("spark.driver.host", "localhost")
fixed this issue for me.
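The underlying issue appears to be that the machine's hostname (HKCHG14_IL) contains an underscore, which Spark's RPC layer rejects as an invalid host in the spark:// URL, so pointing the driver at localhost sidesteps it. The same idea expressed directly on the builder (a sketch assuming a plain local PySpark session, without the Lighter config):

from pyspark.sql import SparkSession

# Equivalent fix on the builder: force the driver host so the RPC URL
# does not use the machine's underscore hostname.
spark = (
    SparkSession.builder
    .config("spark.driver.host", "localhost")
    .getOrCreate()
)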