mjohannesson
Advocate I

Refresh from Spark data source fails

I created a report in Power BI Desktop that loads data from a view in Azure Data Lake Storage Gen2. The view is backed by a Databricks Delta table with 500M rows, and I'm using the Spark connector. While loading the data locally, I had limited the view to return only approx. 38M rows; before publishing the report to the Power BI service, I removed that limit from the view. The dataset automatically started refreshing after I published it, and after approx. 30 minutes the refresh ended with the error below. Why does this error occur?
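For reference, the limit was applied in the view definition on the Databricks side; a minimal sketch of that kind of view is below (database, table and view names are placeholders, run in a Databricks notebook where spark is the provided SparkSession):

# Placeholder names; the real view reads a Delta table stored in ADLS Gen2.
spark.sql("""
    CREATE OR REPLACE VIEW analytics.report_view AS
    SELECT *
    FROM analytics.big_delta_table   -- ~500M rows
    LIMIT 38000000                   -- this LIMIT was removed before publishing
""")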

 

Something went wrong
There was an error when processing the data in the dataset.
Please try again later or contact support. If you contact support, please provide these details.
Data source error: ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code: '0' error message: 'Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 1278.0 failed 4 times, most recent failure: Lost task 15.3 in stage 1278.0 (TID 25587, 10.139.64.31, executor 21): java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1853)
at java.io.ObjectOutputStream.write(ObjectOutputStream.java:709)
at org.apache.spark.util.Utils$.writeByteBuffer(Utils.scala:287)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply$mcV$sp(TaskResult.scala:50)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1507)
at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:48)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:646)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:'.. The exception was raised by the IDbCommand interface.
Cluster URI: WABI-WEST-EUROPE-redirect.analysis.windows.net
Activity ID: 182ca04a-2da8-4078-9a94-6e399488d316
Request ID: 0ac4bc69-9e78-4941-948c-ff763d878722
Time: 2020-03-19 09:25:29Z
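In case it helps with the diagnosis: the trace fails inside DirectTaskResult.writeExternal, which is where an executor serializes the rows a single task is returning to the driver. A rough way to check how many rows each Spark partition of the (now unlimited) view produces would be something like the snippet below, run in a Databricks notebook on the same cluster (the view name is a placeholder, spark is the notebook's SparkSession):

from pyspark.sql.functions import spark_partition_id

df = spark.table("analytics.report_view")   # placeholder view name

# One row per Spark partition with its row count; very large counts mean
# correspondingly large task results that have to fit in executor heap.
per_partition = df.groupBy(spark_partition_id().alias("partition_id")).count()
per_partition.orderBy("count", ascending=False).show(10)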

ChayDommeti
Frequent Visitor

Were you able to figure out the issue? Can you suggest a solution? I'm stuck in the same situation.

mjohannesson
Advocate I

This was almost a year ago, so I don't remember exactly when we did what, or what finally made the issue disappear, but we don't experience it anymore. Initially, we scaled up the cluster every time we needed to do a full load on a large report (as opposed to just an incremental refresh), but now we no longer need to scale up the cluster.

 

What we've done to improve the situation (a sketch of the resulting cluster setup follows the list):

- We're using Power BI Premium

- We've upgraded to a newer version of the cluster's Databricks runtime (7.3 LTS)

- We're using slightly bigger nodes for the cluster (Standard_L4s instead of Standard_DS3_v2)
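Roughly what that cluster definition looks like now, as a sketch against the Databricks Clusters REST API (the workspace URL, token and worker count below are placeholders; "7.3.x-scala2.12" is the usual version string for Databricks Runtime 7.3 LTS):

import requests

# Placeholders: adjust the workspace URL, personal access token and worker count.
WORKSPACE = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapi..."  # Databricks personal access token (placeholder)

cluster_spec = {
    "cluster_name": "powerbi-refresh",
    "spark_version": "7.3.x-scala2.12",   # Databricks Runtime 7.3 LTS
    "node_type_id": "Standard_L4s",       # instead of Standard_DS3_v2
    "num_workers": 4,                     # placeholder worker count
    "autotermination_minutes": 60,
}

# Create the cluster and print the response, which includes the new cluster_id.
resp = requests.post(
    f"{WORKSPACE}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json())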

v-alq-msft
Community Support

Hi, @mjohannesson 

 

When you have a problem with the connector, you may want to take a look at the following video. Hope it helps.

https://www.youtube.com/watch?v=JJfjI3stnsA 

 

Best Regards

Allan

 

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

mjohannesson
Advocate I

Thanks. I removed the BatchSize argument as the video suggested, but got the same error as before (see below) after approx. 40 minutes of refreshing the dataset.

 

Something went wrong
There was an error when processing the data in the dataset.
Please try again later or contact support. If you contact support, please provide these details.
Data source error: ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code: '0' error message: 'Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 1860.0 failed 4 times, most recent failure: Lost task 10.3 in stage 1860.0 (TID 34338, 10.139.64.31, executor 84): ExecutorLostFailure (executor 84 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:'.. The exception was raised by the IDbCommand interface.
Cluster URI: WABI-WEST-EUROPE-redirect.analysis.windows.net
Activity ID: b862ad9c-69ab-4ea8-84ff-6086b1b543b8
Request ID: 2d3ab0b0-c70b-42d2-af78-2d71314ed264
Time: 2020-03-20 09:10:49Z

mjohannesson
Advocate I

I tried again and got another error:

 

Something went wrong
There was an error when processing the data in the dataset.
Please try again later or contact support. If you contact support, please provide these details.
Data source error: ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code: '0' error message: 'Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 1325.0 failed 4 times, most recent failure: Lost task 15.3 in stage 1325.0 (TID 26141, 10.139.64.31, executor 42): ExecutorLostFailure (executor 42 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:'.. The exception was raised by the IDbCommand interface.
Cluster URI: WABI-WEST-EUROPE-redirect.analysis.windows.net
Activity ID: a63a8ce5-988d-4651-ae76-45ae685cd476
Request ID: 7bb522c5-a245-1f25-5ce0-d1dd56a210e6
Time: 2020-03-19 10:03:33Z
