um4ndr
Regular Visitor

Unable to start spark session

Hi,
 
Fabric setup:
um4ndr_0-1715008920067.png
 
Adding details of the problem:
I have three workspaces: Dev, QA, and Prod. Since yesterday afternoon I have been having problems launching the PySpark session when I try to run notebooks on Fabric. This issue only exists in the Dev workspace. The other workspaces are working fine.
 
This is the error information I get:
 
{
  "timestamp": "2024-05-06T15:23:20.837Z",
  "transientCorrelation": "ba82de8b-049c-4947-b026-a1885b4a9a60",
  "aznb": {
    "version": "1.6.63"
  },
  "notebook": {
    "notebookName": "etl_extract_bo_datasource_to_bronze",
    "instanceId": "34d2dab6-e73b-45d7-8d5a-b82c49257053",
    "documentId": "trident-w-6601e969-d5e2-4bf3-8c41-2e21eedd8d17-a-b0e79991-d8c4-4058-95ec-53a9ff0cb3b4",
    "workspaceId": "6601e969-d5e2-4bf3-8c41-2e21eedd8d17",
    "kernelId": "844bd533-3f1c-4b31-92ba-66ef127e6950",
    "clientSessionId": "dcd38904-b4b9-4ec3-afae-381424ecb4ec",
    "kernelState": "not connected",
    "computeUrl": "https://3fcc80ec4f884247b36979635b077693.pbidedicated.windows.net/webapi/capacities/3FCC80EC-4F88-42...",
    "computeState": "connected",
    "collaborationStatus": "online / joined",
    "isSaveLeader": true
  },
  "synapseController": {
    "id": "34d2dab6-e73b-45d7-8d5a-b82c49257053:snc1",
    "enabled": true,
    "activeKernelHandler": "sparkLivy",
    "language": "python",
    "state": "error",
    "sessionId": "6784eee3-85d1-46d7-a581-db9529a27bfb",
    "applicationId": null,
    "applicationName": "",
    "sessionErrors": [
      "Livy session has failed. Error code: SparkCoreError/Other. SessionInfo.State from SparkCore is Error: Error while trying to establish a connection through the managed network. ErrorCode UnknownError. Please retry. Source: User."
    ]
  }
}
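For anyone collecting these dumps for a support ticket, here is a minimal sketch of pulling out the fields that matter (kernel state, compute state, session state, and the Livy error text). The function name `summarize_session_error` is made up for illustration, and the sample is a trimmed copy of the dump above; it only assumes the dump is valid JSON with the same key names.

```python
import json


def summarize_session_error(diagnostic_json: str) -> dict:
    """Return the fields of a notebook diagnostic dump most useful for triage.

    Hypothetical helper: the key names are taken from the dump posted above,
    not from any documented schema.
    """
    data = json.loads(diagnostic_json)
    notebook = data.get("notebook", {})
    controller = data.get("synapseController", {})
    return {
        "timestamp": data.get("timestamp"),
        "workspaceId": notebook.get("workspaceId"),
        "kernelState": notebook.get("kernelState"),
        "computeState": notebook.get("computeState"),
        "sessionState": controller.get("state"),
        "sessionId": controller.get("sessionId"),
        "errors": controller.get("sessionErrors", []),
    }


# Trimmed copy of the dump from the post above.
sample = """
{
  "timestamp": "2024-05-06T15:23:20.837Z",
  "notebook": {
    "workspaceId": "6601e969-d5e2-4bf3-8c41-2e21eedd8d17",
    "kernelState": "not connected",
    "computeState": "connected"
  },
  "synapseController": {
    "state": "error",
    "sessionId": "6784eee3-85d1-46d7-a581-db9529a27bfb",
    "sessionErrors": [
      "Livy session has failed. Error code: SparkCoreError/Other. SessionInfo.State from SparkCore is Error: Error while trying to establish a connection through the managed network. ErrorCode UnknownError. Please retry. Source: User."
    ]
  }
}
"""

summary = summarize_session_error(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The combination worth noting in this dump is that the compute is reported as connected while the kernel is not connected and the Livy session is in the error state.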
 
I have no additional information.
um4ndr_1-1715009168367.png

 

Can any expert help ASAP?

 
1 ACCEPTED SOLUTION
um4ndr
Regular Visitor

Magic!!! For some reason the problem no longer appears 

um4ndr_0-1715146943291.png No comments ...


14 REPLIES
um4ndr
Regular Visitor

 It's happened again 🙁

This is what I received as a response from support:

Can you please clear your browser’s cache and try again? To clear your browser cache and cookies in Microsoft Edge, go to Settings > Privacy > Clear browsing data or click Ctrl+Shift+Del.

 

um4ndr_2-1715269012015.png

 

The problem has been resolved for now.

 

 



 

Dronec
Helper II

The response I got from the MS rep:

 


Below is the RCA of the issue which you were facing.

Issue: There was a change deployed to delete inbound PE associated with workspace VNet on moving across capacities. During this process, if there are no end points in the VNet, the VNet is deleted. However, after deleting the VNet in Synapse, the Vnet state is not updated in the Power BI store. The VNet status is now out of sync with Synapse. This blocked migration of folders to a new capacity in all those cases.
 
Workaround: To fix this problem and unblock users, we have reverted the change.

I also received an answer:

 

Checking internally for any issues I can confirm our Product Group (PG) have identified an issue within the latest deployment. The deployment has been stopped and reverted for now. Until the issue is fixed completely, PG team will check daily and mitigate the impacted tenants manually if the issue will still occur.

 

Meanwhile, let's monitor how this goes for a few more days before deciding the next steps.

Okay, looks like we were right and that was just a f* up from MS. Nothing to see here, moving along.


um4ndr
Regular Visitor

Restarting the Fabric capacity did not help; it even caused more problems across workspaces 😞

 

um4ndr_0-1715145261686.png


I solved that problem by resizing the capacity up and then back down.
At this point I'm back to the original problem.
I have opened an MS support ticket.

Also having the same issue. It has affected all notebooks in a given workspace, seemingly at random. Today I migrated code to a new workspace that didn't have the issue so I could make edits, and I was able to work with it all day... up until it suddenly started generating the same error at EOD. I've yet to resolve it in any workspaces that 

 

This is kind of a showstopper for me -- the only saving grace is that it's not interrupting a business-dependent process. But it has almost completely stopped my development on a project that depends on Fabric notebooks (and migrating work to different workspaces is difficult because deployment pipelines don't support all objects). I submitted a ticket and apparently the product group is working on it, but this is a really, really bad one in that it has stopped work and doesn't seem to have a functional workaround other than make a new workspace, migrate, and pray.

 

Out of Preview, eh? Just let me have a dedicated compute instance like in AML pretty please. At least as a backup.

I guess it might be useful if you give them the number of a similar ticket they already resolved (like mine, 2405070030000651).

My issue magically disappeared overnight and all notebooks are working now, however my org can't tolerate 24 hours downtime. I've also opened a ticket.

I've found that a very similar problem in Azure Synapse notebooks can be fixed by re-creating the Spark pool, and it looks like that's what resizing does.

um4ndr
Regular Visitor

Here's what I found while investigating network tunnel issues with the browser (Edge) inspection tools.

um4ndr_0-1715059377491.png

 

Very strange... All my workspaces are assigned to the same capacity, located in West US.




Yep. 404 - resource not found. Seems like a dead node/driver. Just checked, and I have the same:

Dronec_0-1715060120314.png

 

Dronec
Helper II

I've just tried to run the same notebook with the same parameters multiple times from the pipeline. Funny enough, it ran approximately 3 times out of 10:

Dronec_0-1715055205169.png

My guess is that some compute nodes that run Spark sessions went down and Microsoft tech support is asleep at the wheel.
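Since the error text says "Please retry" and roughly 3 runs out of 10 went through, a stopgap is to retry with backoff while the real fix is pending. Below is a minimal sketch in plain Python. `retry_with_backoff` and `attempts_needed` are made-up helper names, and `action` stands in for whatever call starts your notebook run (not a Fabric API). The estimate assumes each attempt succeeds independently with the same probability, which may well not hold if a dead node is the cause.

```python
import math
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def attempts_needed(success_rate: float, target: float) -> int:
    """Smallest n with 1 - (1 - success_rate)**n >= target.

    Assumes attempts are independent, which is only an approximation here.
    """
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - success_rate))


def retry_with_backoff(
    action: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds, doubling the wait after each failure."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as exc:  # in real use, catch only the session-start error
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


# Demo with a stand-in action that fails twice and then succeeds.
calls = {"count": 0}


def flaky_start() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Livy session has failed")
    return "session started"


delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky_start, sleep=delays.append)
print(result, "after", calls["count"], "attempts; waits:", delays)
print("attempts for 95% at a 30% success rate:", attempts_needed(0.3, 0.95))
```

At a 30% per-attempt success rate it takes about nine attempts to reach 95% overall, so this is a way to limp along, not a fix.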

Dronec
Helper II

Same for me. Notebooks stopped working with the same error
