Notebook Error: CANNOT_OPEN_SOCKET in collect()

navakanth_DE — Thu, 09 Apr 2026 05:34:36 GMT

Hi Fabric community,

I’m running code in a Fabric notebook that performs a few .collect() and .first() operations on a very small dataset (fewer than 5 columns). Intermittently, the job fails with a socket error, but a subsequent run often succeeds.

Retrying or avoiding collect operations helps, but I wanted to understand the root cause of this error.

I found a Databricks Community post suggesting this can be caused by a Databricks runtime upgrade (Solved: [CANNOT_OPEN_SOCKET] Can not open socket — Databricks Community #134032)

Error message (representative):

PySparkRuntimeError: pyspark.errors.exceptions.base.PySparkRuntimeError: [CANNOT_OPEN_SOCKET] Can not open socket: ["tried to connect to ('123.4.5.67), but an error occurred: [Errno 104] Connection reset by peer"]

Has anyone seen this behavior in Fabric notebooks and can suggest likely causes or mitigations?

Re: [CANNOT_OPEN_SOCKET] Can not open socket error

tayloramy — Wed, 08 Apr 2026 13:21:04 GMT

Hi @navakanth_DE,

I have not encountered this in my Fabric environments before. Can you tell us what your data sources are?

Does this happen when reading the data from the source, when writing it to a Fabric datastore as a target, or when processing the data once it has already been loaded into a Spark DataFrame?

Re: [CANNOT_OPEN_SOCKET] Can not open socket error

navakanth_DE — Wed, 08 Apr 2026 14:33:22 GMT

Hey @tayloramy,

I get this error while processing data that is already in a DataFrame. For example, I read data from Delta Lake and then try to fetch a record from the DataFrame using collect() or first().

Re: Notebook Error: CANNOT_OPEN_SOCKET in collect()

deborshi_nag — Thu, 09 Apr 2026 19:12:05 GMT

Hello @navakanth_DE

The underlying reason for this error is that a Spark action such as .collect() or .first() forces Microsoft Fabric to send results back from the Spark driver and executors into the Python process, and that network connection is being reset while the action is completing. This is not related to dataset size or faulty code logic, and it can happen even when you are working with very small, simple datasets.

One contributing cause is Fabric’s use of managed, ephemeral compute with autosuspend and background rebalancing. Spark drivers and executors can be paused, recycled, or restarted due to capacity throttling or internal health checks, sometimes right in the middle of returning results to Python. When that happens, the driver‑to‑Python socket is dropped, which surfaces as a “connection reset by peer” error.

Another factor is running multiple small Spark actions in the same notebook. Each .collect() or .first() triggers a separate Spark job and opens a new result channel back to Python, increasing the number of round trips across that fragile boundary. Even though each action is cheap, the cumulative effect makes it more likely that one of those result transfers gets interrupted under capacity pressure.

You can reduce the likelihood of this issue by minimising round trips to Python and restructuring your code so fewer actions are executed overall. Push as much logic as possible into Spark transformations, and when you only need a small sample, prefer .take(n) or .limit(n) instead of repeated .collect() calls. If you do need data in Python, aim to do a single, controlled .collect() at the end rather than many small ones throughout the notebook.

This information came from Microsoft Copilot.
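The suggestion above to do a single, controlled .collect() could look roughly like the sketch below. It is only an illustration: fetch_small_result is a hypothetical helper name, and the code assumes df is a PySpark DataFrame (it relies only on DataFrame.limit(), DataFrame.collect() and Row.asDict()). It reduces the number of result transfers into Python; it does not address whatever is resetting the connection.

```python
def fetch_small_result(df, n=10):
    """Pull at most n rows into Python with a single collect() call.

    Hypothetical helper: `df` is assumed to be a PySpark DataFrame.
    limit() is a transformation, so collect() is the only action here
    and therefore the only result transfer back into Python.
    """
    rows = df.limit(n).collect()
    # Convert Row objects to plain dicts so later lookups happen in
    # local Python memory and trigger no further Spark actions.
    return [row.asDict() for row in rows]
```

After that one call, code that would otherwise call .first() or .collect() again can index into the returned list instead, for example records[0] if records else None.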

Re: Notebook Error: CANNOT_OPEN_SOCKET in collect()

tayloramy — Thu, 09 Apr 2026 13:42:33 GMT

Hi @navakanth_DE,

@deborshi_nag - I don't think accepting that "this just happens" and advising users to "do less transformations" is an acceptable answer to this problem.

@navakanth_DE, I've never encountered this myself, but if it is happening, it is very much an issue. I'd recommend opening a support ticket with Microsoft so that they can dig into the telemetry from your tenant and get to the bottom of exactly what is going on. If there is a bug in the platform, they can get it on the product team's roadmap to fix.

Re: Notebook Error: CANNOT_OPEN_SOCKET in collect()

deborshi_nag — Thu, 09 Apr 2026 15:02:20 GMT

Hello @tayloramy

To clarify, I’m not suggesting this behaviour is “acceptable” or expected from a user perspective, nor that the solution is simply to “do less work”.

The point I was making is about where the instability is introduced. In Fabric (and other managed Spark services), instability tends to surface specifically at action boundaries, where results are marshalled from the Spark driver back into the Python process over a socket.

Using more Spark transformations and fewer actions is not a general performance tip, but a way to reduce exposure to that driver‑to‑Python boundary. Each collect(), first(), or count() opens a new result channel; under capacity pressure or executor recycling, that channel can be reset even for very small datasets.

So the mitigation is not “do less transformations”, but batching result materialisation and being deliberate about when data is pulled into Python, until the underlying platform behaviour is improved.

Re: Notebook Error: CANNOT_OPEN_SOCKET in collect()

navakanth_DE — Fri, 10 Apr 2026 05:28:37 GMT

Hey @tayloramy, I already raised a ticket with Microsoft about 4 months ago, but even they have not been able to provide a proper solution for this.

Their only recommendation was to add a retry mechanism to the code for when a socket error occurs.
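For anyone who wants to try the retry approach, here is a minimal sketch of what it might look like. The names run_with_retry and is_socket_error are hypothetical, and the check assumes the exception text contains CANNOT_OPEN_SOCKET, as in the error message quoted at the top of this thread. It is a workaround, not a fix for the root cause.

```python
import time


def is_socket_error(exc):
    # Assumption: the error text includes the CANNOT_OPEN_SOCKET tag,
    # as it does in the message quoted earlier in this thread.
    return "CANNOT_OPEN_SOCKET" in str(exc)


def run_with_retry(action, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Call action() and retry only on CANNOT_OPEN_SOCKET failures.

    Waits base_delay_s, then 2x, 4x, ... between attempts. Any other
    exception, or a failure on the last attempt, is re-raised unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as exc:
            if not is_socket_error(exc) or attempt == max_attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
```

Usage would be something like row = run_with_retry(lambda: df.first()), where df is the DataFrame in question. The sleep parameter is there only so the delay can be swapped out in tests.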

Re: Notebook Error: CANNOT_OPEN_SOCKET in collect()

ati_puri — Thu, 16 Apr 2026 05:57:32 GMT

Hi,

Generally, using .collect() is not advised even when working with a small subset of data, because this function brings the whole result back into driver memory. The interaction between the driver and the workers while transferring the result makes it slower than other functions such as .take(). It is advisable to use .take(), restrict the rows with .limit(), cache or persist the results, and clear the Spark cache to release driver memory.

Thanks

@navakanth_DE

Re: Notebook Error: CANNOT_OPEN_SOCKET in collect()

v-aatheeque — Tue, 14 Apr 2026 10:10:19 GMT

Hi @navakanth_DE

Following up to confirm if the earlier responses addressed your query. If not, please share your questions and we’ll assist further.

Re: Notebook Error: CANNOT_OPEN_SOCKET in collect()

v-aatheeque — Fri, 17 Apr 2026 10:48:56 GMT

Hi @navakanth_DE

We wanted to follow up to check if you’ve had an opportunity to review the previous responses. If you require further assistance, please don’t hesitate to let us know.