I have a notebook that is throwing the following error:
Py4JJavaError: An error occurred while calling o4515.execute.
: org.apache.spark.SparkException: Cannot broadcast the table that is larger than 8.0 GiB: 8.5 GiB.
at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotBroadcastTableOverMaxTableBytesError(QueryExecutionErrors.scala:2366)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:231)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
I have the following Spark config set in the notebook, which I believe is supposed to override the 8 GiB limit, but it doesn't appear to be working.
spark.conf.set('spark.sql.autoBroadcastJoinThreshold', '-1')
Does anyone have any ideas for getting around this error?
Thanks.
Hi @ce87,
Have you tried passing the -1 value without the single quotes? Could you test whether that makes a difference and disables it?
spark.conf.set('spark.sql.autoBroadcastJoinThreshold', -1)
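If the threshold is being applied but the join still broadcasts, the broadcast may be coming from a path that setting doesn't control, such as an explicit broadcast() hint in the query, or adaptive query execution, which has its own threshold in Spark 3.2+. A rough sketch of what I would check (the join at the end uses placeholder DataFrame and column names, not anything from your notebook):
# Disable size-based broadcasts for both the static planner and AQE
# (spark.sql.adaptive.autoBroadcastJoinThreshold falls back to
# spark.sql.autoBroadcastJoinThreshold when not set explicitly)
spark.conf.set('spark.sql.autoBroadcastJoinThreshold', -1)
spark.conf.set('spark.sql.adaptive.autoBroadcastJoinThreshold', -1)

# Verify both values took effect
print(spark.conf.get('spark.sql.autoBroadcastJoinThreshold'))
print(spark.conf.get('spark.sql.adaptive.autoBroadcastJoinThreshold'))

# If it still broadcasts, search the notebook for broadcast(df) or
# df.hint('broadcast'), which override the threshold, and try forcing
# a sort-merge join instead:
result = large_df.join(other_df.hint('merge'), 'join_key')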
Removing the quotes didn't help. I was able to confirm that the Spark config settings are taking effect using spark.conf.get(). It just doesn't seem to make any difference.
I have opened a ticket with Microsoft Support.
Hi @ce87
Thanks for using Microsoft Fabric Community.
If you have opened a support ticket, a reference to the ticket number would be greatly appreciated. This will allow us to track the progress of your request and ensure you receive the most efficient support possible.
Thank you.
Hi @ce87
We haven't heard back from you since the last response and are just checking in to see whether you have found a resolution yet. If you have, please share it with the community, as it can be helpful to others.
Otherwise, please respond back with more details and we will try to help.
Thank you.
Well, Microsoft support has been completely useless. All I've gotten is a bunch of emails from an operations manager asking if everything is OK. I probably wouldn't be opening a support ticket if everything was OK. Microsoft support is becoming a joke.
Hi @ce87,
Thanks for double-checking that. Let us know how you get on with Microsoft Support 👍
Thanks for the suggestions. No dice. I tried
spark.conf.set('spark.ms.autotune.enabled', 'true')
and confirmed it was properly set with:
spark.conf.get('spark.ms.autotune.enabled')
and I still run into the same issue.
Hi @ce87,
Maybe you can use Autotune for your Spark configuration. That feature also manages the spark.sql.autoBroadcastJoinThreshold setting:
https://learn.microsoft.com/en-us/fabric/data-engineering/autotune?tabs=pyspark
I believe you can enable Autotune via the PySpark code below:
%%pyspark
spark.conf.set('spark.ms.autotune.enabled', 'true')
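If Autotune doesn't change the outcome, another option may be to steer Spark away from broadcasting the large table altogether with a join strategy hint, since the 8 GiB broadcast cap is a hard limit in Spark rather than a configurable threshold. A minimal sketch (the DataFrame and column names are placeholders):
%%pyspark
# Force a shuffle-based sort-merge join so neither side is broadcast,
# avoiding the 8 GiB broadcast limit entirely
result = fact_df.join(dim_df.hint('merge'), 'join_key')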