mjohannesson
Advocate I

Refresh from Spark data source fails

I created a report in Power BI Desktop that loads data from a view in Azure Data Lake Storage Gen2. The view is backed by a Databricks Delta table with 500M rows, and I'm using the Spark connector. While loading the data locally, I had limited the view to return only approx. 38M rows; before publishing the report to the Power BI service, I removed that limit from the view. The dataset automatically started refreshing after I published it, and after approx. 30 minutes the refresh ended with the error below. Why does this error occur?
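For reference, the limit was applied in the view definition on the Databricks side; a minimal sketch of that kind of view is below (database, table and view names are placeholders, run in a Databricks notebook where spark is the provided SparkSession):

# Placeholder names; the real view reads a Delta table stored in ADLS Gen2.
spark.sql("""
    CREATE OR REPLACE VIEW analytics.report_view AS
    SELECT *
    FROM analytics.big_delta_table   -- ~500M rows
    LIMIT 38000000                   -- this LIMIT was removed before publishing
""")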

 

Something went wrong
There was an error when processing the data in the dataset.
Please try again later or contact support. If you contact support, please provide these details.
Data source error: ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code: '0' error message: 'Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 1278.0 failed 4 times, most recent failure: Lost task 15.3 in stage 1278.0 (TID 25587, 10.139.64.31, executor 21): java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1853)
at java.io.ObjectOutputStream.write(ObjectOutputStream.java:709)
at org.apache.spark.util.Utils$.writeByteBuffer(Utils.scala:287)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply$mcV$sp(TaskResult.scala:50)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1507)
at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:48)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:646)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:'.. The exception was raised by the IDbCommand interface.
Cluster URI: WABI-WEST-EUROPE-redirect.analysis.windows.net
Activity ID: 182ca04a-2da8-4078-9a94-6e399488d316
Request ID: 0ac4bc69-9e78-4941-948c-ff763d878722
Time: 2020-03-19 09:25:29Z
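In case it helps with the diagnosis: the trace fails inside DirectTaskResult.writeExternal, which is where an executor serializes the rows a single task is returning to the driver. A rough way to check how many rows each Spark partition of the (now unlimited) view produces would be something like the snippet below, run in a Databricks notebook on the same cluster (the view name is a placeholder, spark is the notebook's SparkSession):

from pyspark.sql.functions import spark_partition_id

df = spark.table("analytics.report_view")   # placeholder view name

# One row per Spark partition with its row count; very large counts mean
# correspondingly large task results that have to fit in executor heap.
per_partition = df.groupBy(spark_partition_id().alias("partition_id")).count()
per_partition.orderBy("count", ascending=False).show(10)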

ChayDommeti
Frequent Visitor

Were you able to figure out the issue? Can you suggest a solution? I'm stuck in the same situation.

mjohannesson
Advocate I

This was almost a year ago, so I don't remember exactly when we did what, or what finally made the issue disappear, but we don't experience it anymore. Initially, we scaled up the cluster every time we needed to do a full load on a large report (as opposed to just an incremental refresh), but now we no longer need to scale up the cluster.

 

What we've done to improve the situation (a sketch of the resulting cluster setup follows the list):

- We're using Power BI Premium

- We've upgraded to a newer version of the cluster's Databricks runtime (7.3 LTS)

- We're using slightly bigger nodes for the cluster (Standard_L4s instead of Standard_DS3_v2)
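Roughly what that cluster definition looks like now, as a sketch against the Databricks Clusters REST API (the workspace URL, token and worker count below are placeholders; "7.3.x-scala2.12" is the usual version string for Databricks Runtime 7.3 LTS):

import requests

# Placeholders: adjust the workspace URL, personal access token and worker count.
WORKSPACE = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapi..."  # Databricks personal access token (placeholder)

cluster_spec = {
    "cluster_name": "powerbi-refresh",
    "spark_version": "7.3.x-scala2.12",   # Databricks Runtime 7.3 LTS
    "node_type_id": "Standard_L4s",       # instead of Standard_DS3_v2
    "num_workers": 4,                     # placeholder worker count
    "autotermination_minutes": 60,
}

# Create the cluster and print the response, which includes the new cluster_id.
resp = requests.post(
    f"{WORKSPACE}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json())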

v-alq-msft
Community Support

Hi, @mjohannesson 

 

When you have a problem with the connector, you may want to take a look at the following video. Hope it helps.

https://www.youtube.com/watch?v=JJfjI3stnsA 

 

Best Regards

Allan

 

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

mjohannesson
Advocate I

Thanks. I removed the BatchSize argument as the video suggested, but got the same error as before (see below) after approx. 40 minutes of refreshing the dataset.

 

Something went wrong
There was an error when processing the data in the dataset.
Please try again later or contact support. If you contact support, please provide these details.
Data source error: ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code: '0' error message: 'Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 1860.0 failed 4 times, most recent failure: Lost task 10.3 in stage 1860.0 (TID 34338, 10.139.64.31, executor 84): ExecutorLostFailure (executor 84 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:'.. The exception was raised by the IDbCommand interface.
Cluster URI: WABI-WEST-EUROPE-redirect.analysis.windows.net
Activity ID: b862ad9c-69ab-4ea8-84ff-6086b1b543b8
Request ID: 2d3ab0b0-c70b-42d2-af78-2d71314ed264
Time: 2020-03-20 09:10:49Z

mjohannesson
Advocate I

I tried again and got another error:

 

Something went wrong
There was an error when processing the data in the dataset.
Please try again later or contact support. If you contact support, please provide these details.
Data source error: ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code: '0' error message: 'Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 1325.0 failed 4 times, most recent failure: Lost task 15.3 in stage 1325.0 (TID 26141, 10.139.64.31, executor 42): ExecutorLostFailure (executor 42 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:'.. The exception was raised by the IDbCommand interface.
Cluster URI: WABI-WEST-EUROPE-redirect.analysis.windows.net
Activity ID: a63a8ce5-988d-4651-ae76-45ae685cd476
Request ID: 7bb522c5-a245-1f25-5ce0-d1dd56a210e6
Time: 2020-03-19 10:03:33Z
