Re: Spooling failed with error: The operation fai...

dbeavon3 · ‎08-20-2024

Hi all,
I'm looking for help with spooling failures. This type of problem seems to be increasing in frequency. I can start by describing our environment. We have a dedicated Power BI capacity that is basically idle. We also have very beefy gateway servers. Given the amount of hardware we are committing to this, we shouldn't have these severe spooling errors.

In order to find this error, we look for the files named like so:

Report\QueryExecutionReport_200123-DESKTOP_20240523T220000.log

... they can be found in a directory like so:

systemprofile\AppData\Local\Microsoft\On-premises data gateway\

The error is pretty confusing, as you can see below. I'm guessing that the most relevant details are hidden from us. They are probably logged on the Microsoft side and are not found anywhere on our gateway server:

"[""{\""kind\"":\""Web\"",\""path\"":\
""https://power-reporting.ufpi.com/power-reporting-api\""}""]",

Spooling failed with error: The operation failed due to an explicit cancellation. Exception: System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.PowerBI.DataMovement.Pipeline.Dataflow.TDFHelpers.<>c__DisplayClass7_0`1.<<GetNextResponseAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.PowerBI.DataMovement.Pipeline.Dataflow.TDFHelpers.<>c__DisplayClass12_0.<<ExecuteBlockOperationAsync>b__0>d.MoveNext()
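For anyone who wants to check how often this shows up on their own gateway, here is a minimal Python sketch that scans the QueryExecutionReport files for the error text. The function names are made up for illustration, the report directory is passed in by the caller, and the sketch assumes the logs are readable as UTF-8 text, which I have not verified for every gateway version.

```python
from pathlib import Path

# Text taken from the error shown above.
SPOOLING_MARKER = "Spooling failed with error"


def find_spooling_failures(report_dir, pattern="QueryExecutionReport_*.log"):
    """Return (file name, line number, line text) for each matching log line.

    report_dir is the gateway's Report folder; the caller supplies the path.
    """
    hits = []
    for log_file in sorted(Path(report_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes.
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if SPOOLING_MARKER in line:
                    hits.append((log_file.name, line_number, line.strip()))
    return hits


def count_by_file(hits):
    """Summarize the hits as {file name: number of spooling failures}."""
    counts = {}
    for name, _, _ in hits:
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Calling `count_by_file(find_spooling_failures(report_dir))` gives a per-file tally, which makes it easier to see whether the failures cluster on particular days.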

I've already contacted support and asked for the logs on the server/Kusto side to explain these frequent failures. But as of now, nothing has been shared.

How does this impact us? Oddly enough we do NOT normally observe this error from Power BI in a first-hand way. Instead, we find that there is some type of an "implicit retry" happening as a result of these spooling failures. The failures are observed in a second-hand way. These spooling failures result in retries that are a bigger problem than if Power BI would simply fail on the first attempt. Instead of telling us there was a failure, Power BI appears to suppress these failure messages, and it proceeds to keep initiating one or more retries. The retries will trigger the same mashup container to be launched. The duplicate mashups can run concurrently and they swamp our back-end environment, since they each can take up to an hour to complete.
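To put numbers on the duplicate mashups, one option is to collect the start and end time of each run for a given query and flag the ones that overlap. The sketch below is only an illustration: the `(query_key, start, end)` records are a hypothetical format that you would have to build yourself from the mashup logs or from your back-end query logs.

```python
from collections import defaultdict
from datetime import datetime


def find_concurrent_duplicates(runs):
    """Find runs of the same query whose time intervals overlap.

    runs: iterable of (query_key, start, end) tuples with datetime values.
    This record shape is hypothetical, not something the gateway emits.
    Returns a list of (query_key, (start_a, end_a), (start_b, end_b)).
    """
    by_key = defaultdict(list)
    for key, start, end in runs:
        by_key[key].append((start, end))

    overlaps = []
    for key, intervals in by_key.items():
        intervals.sort()  # order by start time
        for i, (start_a, end_a) in enumerate(intervals):
            for start_b, end_b in intervals[i + 1:]:
                if start_b >= end_a:
                    break  # sorted by start, so no later run overlaps run A
                overlaps.append((key, (start_a, end_a), (start_b, end_b)))
    return overlaps
```

Any pair returned here is a case where two copies of the same query were hitting the back end at the same time.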

Furthermore, the cycle of failure and retry is causing our scheduled refresh operations to take longer to complete (users are not getting data when they expect it).

As mentioned, the mashups themselves are succeeding. It is the "spooling" operation that is failing. This is very frustrating because when we look in the normal mashup logs, we never find any problems that we can fix on our own.

There doesn't appear to be any place to configure the retry mechanism that is responsible for these duplicate mashups. These mashups should NOT be repeatedly launched on the same gateway without a custom configuration to that effect.

If anyone recognizes this error message, please let me know how to investigate. I'd like to uncover the reason for the spooling failures, and/or I'd like to be able to configure retries to stop after the first attempt (thereby avoiding the creation of another type of problem on our back-end servers).

dbeavon3 · ‎08-25-2024

Hi @SaiTejaTalasila

Thanks for the reply.

This problem happens frequently, but not 100% of the time.

Are you familiar with the error?
Spooling failed with error: The operation failed due to an explicit cancellation

... where do we find supporting information to explain the reason for this?

Our gateway and datasources run locally (on prem). The infrastructure is about as simple as you can imagine.

I really don't want to spend lots of time tinkering around by trial and error, or by using Fiddler. I just want to know where I can get more information about a "spooling" failure in the gateway. There has to be another log with more details. I'm guessing it is available on the Microsoft/Mindtree side. Do you have tricks for getting them to share the logs on their side?

As a last resort, I might hook up a debugger to the gateway and monitor for the so-called TaskCanceledException.

I'm guessing that if the debugger could trap this first-chance exception, I might be able to find out what circumstances are causing it to happen. It is really unfortunate that customers are left to fend for themselves and that Microsoft won't give us better logs to diagnose these problems.

lbendlin · ‎08-25-2024

Lastly - keep raising pro tickets, lots of them. That's the only way to eventually get some traction.

lbendlin · ‎08-20-2024

Welcome to the club.

The error may be caused by the data source cutting off the request when it becomes impatient - so check for any server side timeout settings.

Next you will want to optimize your gateway configuration. In particular, you want to allow spooling before the request completes, and you want to increase the number of permitted mashup containers and the memory and compute they are allowed to consume. Do not run anything else on the gateway cluster VMs, only the gateway service.
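Before changing anything, it helps to see what the gateway is currently set to. The Python sketch below reads a few settings out of the gateway's .config XML. The setting names (`StreamBeforeRequestCompletes`, `MashupDefaultPoolContainerMaxCount`, `MashupDefaultPoolContainerMaxWorkingSetInMB`) and the `<setting name="..."><value>` layout are what I recall from Microsoft's gateway performance documentation, so treat them as assumptions and check them against the config file on your own gateway.

```python
import xml.etree.ElementTree as ET

# Assumed setting names; verify against your gateway's config file.
SETTINGS_OF_INTEREST = (
    "StreamBeforeRequestCompletes",
    "MashupDefaultPoolContainerMaxCount",
    "MashupDefaultPoolContainerMaxWorkingSetInMB",
)


def read_gateway_settings(config_xml, names=SETTINGS_OF_INTEREST):
    """Return {setting name: value text} for the requested settings.

    config_xml is the content of the gateway .config file (str or bytes).
    Settings that are missing from the file are simply left out.
    """
    root = ET.fromstring(config_xml)
    found = {}
    for setting in root.iter("setting"):
        name = setting.get("name")
        if name in names:
            found[name] = (setting.findtext("value") or "").strip()
    return found
```

This only reads the file and never writes to it. Take a backup copy before editing the config by hand.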

Lastly - keep raising pro tickets, lots of them. That's the only way to eventually get some traction.

dbeavon3 · ‎08-23-2024

Hi @lbendlin

Even if we don't know the reason for the spooling error itself, perhaps we can focus on the subsequent retries. Have you ever worked with Microsoft to explain the implicit retries?

Those retries can do more harm than good. We do NOT want repeated attempts to retrieve data from our data sources. This applies to 95% of our PBI dataflows and datasets.

... If ever the repeated attempts become necessary, we'd like to be able to configure those ourselves in a manual way.

lbendlin · ‎08-23-2024

You'll need to distinguish between Power BI and Fabric refreshes.

Power BI:

Power BI automatic refresh retries (crossjoin.co.uk)

You can see in the refresh history whether any retries were attempted.

Fabric:

Fabric Data Factory Spotlight: Semantic model refresh activity – The Data Architect's Desk (thedata...
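For the Power BI case, the refresh history can also be pulled as JSON through the Power BI REST API and checked for repeated attempts. The Python sketch below works on an already-downloaded response and makes no network calls. The field names (`value`, `requestId`, `status`, `refreshAttempts`, and an attempt `type` of `Data`) are what I recall from the dataset refresh history response, so treat them as assumptions; dataflows use a different endpoint and may not look like this.

```python
def summarize_refresh_attempts(history):
    """Count data-load attempts per refresh in a parsed refresh-history payload.

    history: dict parsed from the refresh history JSON (assumed shape:
    {"value": [{"requestId": ..., "status": ..., "refreshAttempts": [...]}]}).
    """
    summary = []
    for entry in history.get("value", []):
        attempts = entry.get("refreshAttempts") or []
        # Only count attempts that load data; other attempt types are ignored.
        data_attempts = [a for a in attempts if a.get("type") == "Data"]
        summary.append({
            "requestId": entry.get("requestId"),
            "status": entry.get("status"),
            "data_attempts": len(data_attempts),
            "retried": len(data_attempts) > 1,
        })
    return summary
```

Entries with `retried` set to `True` are the ones to line up against the spooling failures in the gateway logs.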

dbeavon3 · ‎08-20-2024

Hi @lbendlin

Do you have any tickets yet? I am hoping there is a configuration somewhere to prevent the implicit retries when the spooling errors happen. That would at least avoid the unintended impact on the back-end systems. I don't want our back-end systems to suffer, because of some bugs in the Power BI dataflow refresh operations.

I'm guessing that the Microsoft product team will want to blame this on random network glitches (i.e., a solar flare from outer space). But I don't really buy it. The TCP protocol is extremely fault-tolerant and is able to recover from lots of networking problems. The failures in this service seem more likely to be related to software bugs and/or design flaws.

It happens a lot more frequently than you would expect. In fact, the Mindtree engineer asked me to turn on enhanced logging, and we captured one of these happening on the first day, during the first three refresh operations.

I suspect the Microsoft team already has a way to monitor my gateway for the health of the network, and can rule out any obvious problems with network health. Aside from Microsoft, I suppose the only other component where I might put the blame is the Zscaler team. Perhaps there is an outbound NAT issue which only affects Power BI.

The main problem is that there are virtually no meaningful errors in my logs, and Microsoft hasn't been willing to share anything from their side yet.

SaiTejaTalasila · ‎08-25-2024

Hi @dbeavon3 ,

Create the same dataflow in another workspace (export the JSON file) and see whether the same issue occurs. If it still does not work, restart your gateway, check your gateway version, and do some analysis with tools like Fiddler.

Are your data sources, gateway, and Fabric capacity running in the same country or region?

Thanks,

Sai Teja
