I have a notebook that is throwing the following error:
Py4JJavaError: An error occurred while calling o4515.execute.
: org.apache.spark.SparkException: Cannot broadcast the table that is larger than 8.0 GiB: 8.5 GiB.
at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotBroadcastTableOverMaxTableBytesError(QueryExecutionErrors.scala:2366)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:231)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
I have the following Spark config set in the notebook, which I believe is supposed to override the 8 GiB limit, but it doesn't appear to be working.
spark.conf.set('spark.sql.autoBroadcastJoinThreshold', '-1')
Does anyone have any ideas for getting around this error?
Thanks.
Hi @ce87,
Have you tried passing the -1 value without the single quotes? Could you test whether that makes a difference and disables it?
spark.conf.set('spark.sql.autoBroadcastJoinThreshold', -1)
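If the threshold is being applied but the join still broadcasts, the broadcast may be coming from a path that setting doesn't control, such as an explicit broadcast() hint in the query, or adaptive query execution, which has its own threshold in Spark 3.2+. A rough sketch of what I would check (the join at the end uses placeholder DataFrame and column names, not anything from your notebook):
# Disable size-based broadcasts for both the static planner and AQE
# (spark.sql.adaptive.autoBroadcastJoinThreshold falls back to
# spark.sql.autoBroadcastJoinThreshold when not set explicitly)
spark.conf.set('spark.sql.autoBroadcastJoinThreshold', -1)
spark.conf.set('spark.sql.adaptive.autoBroadcastJoinThreshold', -1)

# Verify both values took effect
print(spark.conf.get('spark.sql.autoBroadcastJoinThreshold'))
print(spark.conf.get('spark.sql.adaptive.autoBroadcastJoinThreshold'))

# If it still broadcasts, search the notebook for broadcast(df) or
# df.hint('broadcast'), which override the threshold, and try forcing
# a sort-merge join instead:
result = large_df.join(other_df.hint('merge'), 'join_key')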
Removing the quotes didn't help. I was able to confirm that the Spark config settings are taking effect using spark.conf.get(). It just doesn't seem to make any difference.
I have opened a ticket with Microsoft Support.
Hi @ce87
Thanks for using Microsoft Fabric Community.
If you have opened a support ticket, a reference to the ticket number would be greatly appreciated. This will allow us to track the progress of your request and ensure you receive the most efficient support possible.
Thank you.
Hi @ce87
We haven't heard back from you since the last response and are just checking in to see whether you have found a resolution yet. If you have, please share it with the community, as it can be helpful to others.
Otherwise, please respond back with more details and we will try to help.
Thank you.
Well, Microsoft support has been completely useless. All I've gotten is a bunch of emails from an operations manager asking if everything is OK. I probably wouldn't be opening a support ticket if everything was OK. Microsoft support is becoming a joke.
Hi @ce87,
Thanks for double-checking that. Let us know how you get on with Microsoft Support 👍
Thanks for the suggestions. No dice. I tried
spark.conf.set('spark.ms.autotune.enabled', 'true')
and confirmed it was properly set with:
spark.conf.get('spark.ms.autotune.enabled')
and I still run into the same issue.
Hi @ce87,
Maybe you can use Autotune for your Spark configuration. That feature also manages the spark.sql.autoBroadcastJoinThreshold setting:
https://learn.microsoft.com/en-us/fabric/data-engineering/autotune?tabs=pyspark
I believe you can enable Autotune via the PySpark code below:
%%pyspark
spark.conf.set('spark.ms.autotune.enabled', 'true')
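If Autotune doesn't change the outcome, another option may be to steer Spark away from broadcasting the large table altogether with a join strategy hint, since the 8 GiB broadcast cap is a hard limit in Spark rather than a configurable threshold. A minimal sketch (the DataFrame and column names are placeholders):
%%pyspark
# Force a shuffle-based sort-merge join so neither side is broadcast,
# avoiding the 8 GiB broadcast limit entirely
result = fact_df.join(dim_df.hint('merge'), 'join_key')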