um4ndr
Advocate I

Unable to start spark session

Hi,
 
Fabric setup:
um4ndr_0-1715008920067.png
 
Adding details of the problem:
I have three workspaces: Dev, QA, and Prod. Since yesterday afternoon I have been having problems starting the PySpark session when I try to run notebooks in Fabric. The issue only occurs in the Dev workspace; the other workspaces are working fine.
 
This is the error information I get:
 
{
  "timestamp": "2024-05-06T15:23:20.837Z",
  "transientCorrelation": "ba82de8b-049c-4947-b026-a1885b4a9a60",
  "aznb": {
    "version": "1.6.63"
  },
  "notebook": {
    "notebookName": "etl_extract_bo_datasource_to_bronze",
    "instanceId": "34d2dab6-e73b-45d7-8d5a-b82c49257053",
    "documentId": "trident-w-6601e969-d5e2-4bf3-8c41-2e21eedd8d17-a-b0e79991-d8c4-4058-95ec-53a9ff0cb3b4",
    "workspaceId": "6601e969-d5e2-4bf3-8c41-2e21eedd8d17",
    "kernelId": "844bd533-3f1c-4b31-92ba-66ef127e6950",
    "clientSessionId": "dcd38904-b4b9-4ec3-afae-381424ecb4ec",
    "kernelState": "not connected",
    "computeUrl": "https://3fcc80ec4f884247b36979635b077693.pbidedicated.windows.net/webapi/capacities/3FCC80EC-4F88-42...",
    "computeState": "connected",
    "collaborationStatus": "online / joined",
    "isSaveLeader": true
  },
  "synapseController": {
    "id": "34d2dab6-e73b-45d7-8d5a-b82c49257053:snc1",
    "enabled": true,
    "activeKernelHandler": "sparkLivy",
    "language": "python",
    "state": "error",
    "sessionId": "6784eee3-85d1-46d7-a581-db9529a27bfb",
    "applicationId": null,
    "applicationName": "",
    "sessionErrors": [
      "Livy session has failed. Error code: SparkCoreError/Other. SessionInfo.State from SparkCore is Error: Error while trying to establish a connection through the managed network. ErrorCode UnknownError. Please retry. Source: User."
    ]
  }
}
 
I have no additional information.
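
For anyone triaging a payload like the one above, here is a minimal Python sketch (not part of the original post) that pulls out the fields that matter: the kernel state, the session state, and the "Error code:" value inside sessionErrors. The summarize helper and the trimmed payload string are illustrative assumptions; only the field names and the error message come from the error output itself.

```python
import json
import re

# Trimmed copy of the diagnostic payload from the post (only the fields used here).
payload = """
{
  "notebook": {"kernelState": "not connected", "computeState": "connected"},
  "synapseController": {
    "state": "error",
    "sessionErrors": [
      "Livy session has failed. Error code: SparkCoreError/Other. SessionInfo.State from SparkCore is Error: Error while trying to establish a connection through the managed network. ErrorCode UnknownError. Please retry. Source: User."
    ]
  }
}
"""


def summarize(diagnostics: str) -> dict:
    """Extract the kernel state, session state, and 'Error code: X/Y' values."""
    data = json.loads(diagnostics)
    controller = data.get("synapseController", {})
    errors = controller.get("sessionErrors") or []
    codes = []
    for message in errors:
        # Matches e.g. "Error code: SparkCoreError/Other" and stops at the period.
        codes.extend(re.findall(r"Error code: ([\w/]+)", message))
    return {
        "kernel_state": data.get("notebook", {}).get("kernelState"),
        "session_state": controller.get("state"),
        "error_codes": codes,
    }


summary = summarize(payload)
print(summary)
```

Quoting the extracted error code in a support ticket may make it easier to match against similar reports.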
um4ndr_1-1715009168367.png

 

Can any expert help ASAP?

 
1 ACCEPTED SOLUTION
um4ndr
Advocate I

Magic!!! For some reason the problem no longer appears 

um4ndr_0-1715146943291.png No comments ...


17 REPLIES
um4ndr
Advocate I

 It's happened again 🙁

This is what I received as a response from support:

Can you please clear your browser’s cache and try again? To clear your browser cache and cookies in Microsoft Edge, go to Settings > Privacy > Clear browsing data or click Ctrl+Shift+Del.

 

um4ndr_2-1715269012015.png

 

The problem has been resolved for now.

 

 



 

Are you using Spark 1.2 or 1.3? I haven't been able to make it work all day.

Dronec
Advocate II

The response I got from the MS rep:

 


Below is the RCA of the issue which you were facing.

Issue: There was a change deployed to delete the inbound PE associated with the workspace VNet on moving across capacities. During this process, if there are no endpoints in the VNet, the VNet is deleted. However, after deleting the VNet in Synapse, the VNet state is not updated in the Power BI store. The VNet status is then out of sync with Synapse. This blocked migration of folders to a new capacity in all those cases.

Workaround: To fix this problem and unblock users, we have reverted the change.

I also received an answer:

 

Checking internally for any issues I can confirm our Product Group (PG) have identified an issue within the latest deployment. The deployment has been stopped and reverted for now. Until the issue is fixed completely, PG team will check daily and mitigate the impacted tenants manually if the issue will still occur.

 

Meanwhile, let's monitor how this goes for a few more days before deciding the next steps.

Okay, looks like we were right and that was just a f* up from MS. Nothing to see here, moving along.


um4ndr
Advocate I

Restarting the Fabric capacity did not help; it even caused more problems across workspaces 😞

 

um4ndr_0-1715145261686.png


I solved this problem by resizing the capacity up and down.
At this point I'm back to the original problem.
I have opened an MS support ticket.

Also having the same issue. It has affected all notebooks in a given workspace, seemingly at random. Today I migrated code to a new workspace that didn't have the issue to do edits and was able to work with it all day... up until it suddenly decided to generate the same error as of the EOD. I've yet to resolve it in any workspaces that 

 

This is kind of a showstopper for me -- the only saving grace is that for me it's not interrupting a business-dependent process. But it's nearly completely stopped my development on a project that's dependent on Fabric notebooks (and migrating work to different workspaces is difficult due to deployment pipelines not supporting all objects). I submitted a ticket and apparently the product group is working on it, but this is a really, really bad one in that it's stopped work and doesn't seem to have a functional workaround other than make a new workspace, migrate, and pray.

 

Out of Preview, eh? Just let me have a dedicated compute instance like in AML pretty please. At least as a backup.

I guess it might be useful if you give them the number of a similar ticket they have already resolved (like mine, 2405070030000651).

My issue magically disappeared overnight and all notebooks are working now, however my org can't tolerate 24 hours downtime. I've also opened a ticket.

I've found that a very similar problem in Azure Synapse notebooks can be fixed by re-creating the Spark pool, and it looks like that's what resizing does.

um4ndr
Advocate I

Here's what I found in the browser (Edge) inspector while researching network tunnel issues.

um4ndr_0-1715059377491.png

 

Very strange ... All my workspaces are on the same capacity, located in West US.




Yep. 404 - resource not found. Seems like a dead node/driver. Just checked that I have the same:

Dronec_0-1715060120314.png

 

Dronec
Advocate II

I've just tried to run the same notebook with the same parameters multiple times from the pipeline. Funnily enough, it ran successfully only about 3 times out of 10:

Dronec_0-1715055205169.png

My guess is that some compute nodes that run Spark sessions went down and Microsoft tech support is sleeping behind the wheel
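
Given a success rate of roughly 3 in 10, one stopgap is to retry the run a few times with a growing delay. The sketch below is only an illustration under that assumption: run_once is a hypothetical callable standing in for whatever actually triggers the notebook, and chance_of_success assumes attempts are independent, which is a simplification and not something measured in this thread.

```python
import time


def run_with_retries(run_once, max_attempts=5, base_delay_s=1.0, sleep=time.sleep):
    """Call run_once() until it succeeds or max_attempts is reached.

    run_once is a hypothetical callable that raises on failure.
    The delay doubles after each failed attempt (1s, 2s, 4s, ...).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_once()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


def chance_of_success(per_attempt_rate, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - per_attempt_rate) ** attempts


# With a 30% per-attempt success rate, estimate the benefit of 5 attempts.
print(round(chance_of_success(0.3, 5), 3))
```

Under that simplification, five attempts succeed at least once about 83% of the time, so retries reduce the pain but do not replace a fix on the service side.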

Dronec
Advocate II

Same for me. Notebooks stopped working with the same error

I'm experiencing the same issue now [SEP. 2024]. 

 

SparkCoreError/SessionDidNotEnterIdle: Livy session has failed. Error code: SparkCoreError/SessionDidNotEnterIdle. SessionInfo.State from SparkCore is Error: Session did not enter idle state after 10 minutes
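
The "did not enter idle state after 10 minutes" message describes a startup timeout. As a rough client-side illustration of that kind of check (not Fabric's actual implementation), here is a polling loop with a deadline. get_state is a hypothetical callable, and the state names idle, error, and dead follow the Apache Livy session states, which is an assumption about what a caller would see here.

```python
import time


def wait_for_idle(get_state, timeout_s=600.0, poll_s=5.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it returns 'idle', fails, or the deadline passes.

    get_state is a hypothetical callable returning the session state as a string.
    clock and sleep are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "idle":
            return state
        if state in ("error", "dead"):
            raise RuntimeError(f"session entered state {state!r}")
        if clock() >= deadline:
            raise TimeoutError(
                f"session did not enter idle state after {timeout_s:.0f} seconds"
            )
        sleep(poll_s)
```

A caller could wrap this in a retry so that a session stuck in a starting state is abandoned and recreated instead of being waited on indefinitely.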

I have had the same problem since yesterday, September 4th, 2024.
