ce87
Helper I

Notebook Error: Cannot broadcast the table that is larger than 8.0 GiB: 8.5 GiB.

I have a notebook that is throwing the following error:

Py4JJavaError: An error occurred while calling o4515.execute.
: org.apache.spark.SparkException: Cannot broadcast the table that is larger than 8.0 GiB: 8.5 GiB.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotBroadcastTableOverMaxTableBytesError(QueryExecutionErrors.scala:2366)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:231)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

I have the following Spark config set in the notebook, which I believe is supposed to override the 8 GB limit, but it doesn't appear to be working:

spark.conf.set('spark.sql.autoBroadcastJoinThreshold', '-1')
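
For reference, reading the value back with spark.conf.get in the same session shows what Spark actually sees for the threshold, e.g.:

# Should return '-1' if the setting above was applied to this session
spark.conf.get('spark.sql.autoBroadcastJoinThreshold')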

 

 

Does anyone have any ideas for getting around this error?

 

Thanks.

8 REPLIES
Expiscornovus
Resolver III

Hi @ce87,

 

Have you tried passing the -1 value without the single quotes? Test whether that makes a difference and disables the threshold:

spark.conf.set('spark.sql.autoBroadcastJoinThreshold', -1)
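
If that still doesn't change anything, it may be worth checking whether the broadcast is coming from somewhere the threshold doesn't control. As far as I know, an explicit broadcast() hint in the join is honoured regardless of the threshold, and adaptive query execution has its own spark.sql.adaptive.autoBroadcastJoinThreshold that can convert a join to a broadcast join at runtime. A rough sketch of disabling both (these are standard Spark settings, not Fabric-specific):

# Disable automatic broadcast joins in the static planner
spark.conf.set('spark.sql.autoBroadcastJoinThreshold', -1)

# Disable AQE's runtime conversion of joins to broadcast joins as well
# (this setting falls back to the one above when it is not set explicitly)
spark.conf.set('spark.sql.adaptive.autoBroadcastJoinThreshold', -1)

Also check the query itself for a broadcast() hint on the large table, since a hint bypasses the thresholds entirely.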

 

 

Removing the quotes didn't help. I was able to confirm that the Spark config settings are taking effect using spark.conf.get(). It just doesn't seem to make any difference.

 

I have opened a ticket with Microsoft Support. 

Hi @ce87 

 

Thanks for using Microsoft Fabric Community.

If you have opened a support ticket, a reference to the ticket number would be greatly appreciated. This will allow us to track the progress of your request and ensure you receive the most efficient support possible.

 

Thank you.

Hi @ce87 


We haven't heard back from you on the last response and were just checking in to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others.
Otherwise, reply back and we will follow up with more details and try to help.


Thank you.

Well, Microsoft support has been completely useless. All I've gotten is a bunch of emails from an operations manager asking if everything is OK. I probably wouldn't have opened a support ticket if everything was OK. Microsoft support is becoming a joke.

Hi @ce87,

 

Thanks for double-checking that. Let us know how it goes with Microsoft Support 👍

ce87
Helper I

Thanks for the suggestions. No dice. I tried:

spark.conf.set('spark.ms.autotune.enabled', 'true')

and confirmed it was properly set with:

spark.conf.get('spark.ms.autotune.enabled')   

 

and I still run into the same issue.

Expiscornovus
Resolver III

Hi @ce87,

 

Maybe you can use Autotune for your Spark configuration. That feature also manages the spark.sql.autoBroadcastJoinThreshold setting:

https://learn.microsoft.com/en-us/fabric/data-engineering/autotune?tabs=pyspark

 

I believe you can enable Autotune via the PySpark code below:

%%pyspark
spark.conf.set('spark.ms.autotune.enabled', 'true')
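
After running that, spark.conf.get should confirm the setting took effect in the session, for example:

# Should return 'true' once Autotune is enabled for the session
spark.conf.get('spark.ms.autotune.enabled')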

 
