I created a report in Power BI Desktop that loads data from a view in Azure Data Lake Storage Gen2. The view is connected to a Databricks Delta table with 500M rows, and I'm using the Spark connector. When I loaded the data locally, I had limited the view to return only approx. 38M rows, but by the time I published the report to the Power BI service, I had removed the limit from the view. The dataset automatically started refreshing after I published, and after approx. 30 minutes the refresh failed with the error below. Why does this error occur?
Something went wrong
There was an error when processing the data in the dataset.
Please try again later or contact support. If you contact support, please provide these details.
Data source error: ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code: '0' error message: 'Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 1278.0 failed 4 times, most recent failure: Lost task 15.3 in stage 1278.0 (TID 25587, 10.139.64.31, executor 21): java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1853)
at java.io.ObjectOutputStream.write(ObjectOutputStream.java:709)
at org.apache.spark.util.Utils$.writeByteBuffer(Utils.scala:287)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply$mcV$sp(TaskResult.scala:50)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.scheduler.DirectTaskResult$$anonfun$writeExternal$1.apply(TaskResult.scala:48)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1507)
at org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:48)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:646)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:'.. The exception was raised by the IDbCommand interface.
Cluster URI: WABI-WEST-EUROPE-redirect.analysis.windows.net
Activity ID: 182ca04a-2da8-4078-9a94-6e399488d316
Request ID: 0ac4bc69-9e78-4941-948c-ff763d878722
Time: 2020-03-19 09:25:29Z
Were you able to figure out the issue? Can you suggest a solution? I am also stuck in the same situation.
This was almost a year ago, so I don't remember exactly when we did what or what made the issue finally disappear, but we don't experience this issue anymore. Initially, we scaled up the cluster every time we needed to do a full load on a large report, as opposed to just incremental refresh, but now we don't need to scale up the cluster anymore.
What we've done to improve the situation:
- We're using Power BI Premium
- We've upgraded to a newer version of the cluster's Databricks runtime (7.3 LTS)
- We're using slightly bigger nodes for the cluster (Standard_L4s instead of Standard_DS3_v2)
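Since the failures in this thread were executor-side (java.lang.OutOfMemoryError while serializing task results, and ExecutorLostFailure), besides using bigger nodes it may also help to give the cluster more memory headroom via its Spark config. A sketch only, using standard Spark settings that are not taken from this thread; the right values depend on your node type, and note that Databricks normally sizes executor memory for you, so override it with care:

```
spark.driver.maxResultSize 8g
spark.executor.memory 12g
```

On a Databricks cluster these go under Advanced options > Spark > Spark config.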
Hi, @mjohannesson
When you have the connector problem, you may take a look at the following video. Hope it helps.
https://www.youtube.com/watch?v=JJfjI3stnsA
Best Regards
Allan
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
Thanks. I removed the BatchSize argument as the video suggested, but got the same error as before (see below) after approx. 40 minutes of refreshing the dataset.
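For anyone following along: the BatchSize the video refers to is an option passed to the Spark connector's source call in the Power Query (M) script. A hedged sketch of the change, assuming the usual `Spark.Tables(host, protocol, options)` shape; the host URL here is a placeholder, and your actual call is whatever Power Query generated for your workspace:

```
// before: a fixed batch size forced in the options record
Source = Spark.Tables("https://<region>.azuredatabricks.net:443/sql/protocolv1/o/<org-id>/<cluster-id>", 2, [BatchSize = 10000])

// after: drop the option and let the connector pick its own batch size
Source = Spark.Tables("https://<region>.azuredatabricks.net:443/sql/protocolv1/o/<org-id>/<cluster-id>", 2)
```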
Something went wrong
There was an error when processing the data in the dataset.
Please try again later or contact support. If you contact support, please provide these details.
Data source error: ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code: '0' error message: 'Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 1860.0 failed 4 times, most recent failure: Lost task 10.3 in stage 1860.0 (TID 34338, 10.139.64.31, executor 84): ExecutorLostFailure (executor 84 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:'.. The exception was raised by the IDbCommand interface.
Cluster URI: WABI-WEST-EUROPE-redirect.analysis.windows.net
Activity ID: b862ad9c-69ab-4ea8-84ff-6086b1b543b8
Request ID: 2d3ab0b0-c70b-42d2-af78-2d71314ed264
Time: 2020-03-20 09:10:49Z
I tried again and got another error:
Something went wrong
There was an error when processing the data in the dataset.
Please try again later or contact support. If you contact support, please provide these details.
Data source error: ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from server: error code: '0' error message: 'Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 1325.0 failed 4 times, most recent failure: Lost task 15.3 in stage 1325.0 (TID 26141, 10.139.64.31, executor 42): ExecutorLostFailure (executor 42 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:'.. The exception was raised by the IDbCommand interface.
Cluster URI: WABI-WEST-EUROPE-redirect.analysis.windows.net
Activity ID: a63a8ce5-988d-4651-ae76-45ae685cd476
Request ID: 7bb522c5-a245-1f25-5ce0-d1dd56a210e6
Time: 2020-03-19 10:03:33Z