Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Python Notebook checkpoints + a second connection to a given notebook from the same user should disconnect the first one.

Idea: Fabric notebooks should support checkpointing and autosave checkpoints. Also, there should be a maximum of one connection with cell editing permission to a notebook from a given user at a time.


Short version: Most implementations of Jupyter Notebooks autosave multiple (five?) checkpointed versions of your file as you work to help with recovery, as well as have a manual "save check point" menu option to save a dedicated quick reversion point for your file. This behaviour/feature is in Azure Machine Learning notebooks, but isn't in Fabric. It would be nice to have it for consistency of experience. The existing Fabric autosave is an unrecoverable hammer and has caused me more issues than it's solved. Sure, you can revert back to the last thing you pushed to a Git repo --- but unless you're comitting every time you write a line of code, there's a chance that you might accidentally wipe some cells that you wrote between commits and have the default autosave make it not revertable.


As for the second connection thing: the same user logged into a notebook set to autosave on two different machines interacts with the existing autosave in a destructive and nonfunctional way. Any edits made by the second connection may be immediately undone after they close the notebook if the first connection is still active. The first connection hasn't necessarily updated the state of the notebook from the changes made by the second, and it autosaves its (old) version of the content over all changes that were made, which is then unrecoverable as far as I know.


The long version:

I just lost a day's work, so I'm grumpy. It's mostly my fault due to bad practices (I should have just turned off autosave, I shouldn't leave notebooks open etc.), but if Fabric notebooks were checkpointed like AML notebooks literally already are, and/or if Fabric logged out existing connections from editing notebooks upon a second connnection from the same user it could have been avoided.


The situation: sometimes I work from home, sometimes I work from the office, each on a different machine. Unfortunately for me, sometimes I forget to close my browser window editing a Fabric notebook on one machine at the end of a workday, and continue working on the same notebook the next day on the other machine. I wrote a bunch of edits/new cells to the notebook at the office PC (Notebook Connection #2) March 4th, unaware that I had left the same notebook open on my home PC (Notebook Connection #1). Today, March 5th, I logged into my home PC, saw an open browser window with notebook connection 1 on my home PC, and found all my March 4th edits overwritten and unrecoverable.


To add insult to injury, I even thought I had commited all of my March 4th changes to a git repo at the end of the day from my office computer. However, in the 5 minute span between me closing Notebook Connection 2 on my office computer and then using the Fabric Workspace GUI to commit changes, it appears that Notebook Connection 1 autosaved and overwrote all of the days edits in the notebook, and the commit simply registered what was effectively the March 3rd version of the file. I didn't notice the discrepancy until digging into the repo today as the GUI commit doesn't show any details while it's being applied other than which files are being synced.


I don't know what scenario there could be where having multiple connections simultaneously connected autosaving over each other is a benefit, as all it does is immediately erase any edits. I know I shouldn't have put myself in a position where this could have happened, but it seems silly that it's even a possibility. It's a bit of a rare scenario, but with the number of hybrid workers these days I'm sure I'm not the only one who's ever run into it.

Status: New
Comments
Patrick_Dornian
New Member

Still thinking about this -- here's the Microsoft docs on AML notebook checkpointing.


https://learn.microsoft.com/en-us/azure/machine-learning/how-to-run-jupyter-notebooks?view=azureml-api-2#save-and-checkpoint-a-notebook


I don't see any reason why this wouldn't be implemented in Fabric notebooks as well, other than it wasn't in the MVP spec.

fbcideas_migusr
New Member

Databricks treats it very similar to how Google Docs or Google Sheets. There is a running auto-save with full restore at any point, including after you restore to an older version you can still restore to a point after that older version. This seems like a pretty important feature given the current state of flakiness in its save and multi-user access capabilities. Even if you aren't using proper version control, there should still be modern safety net like is common now for web-based apps.

fbcideas_migusr
New Member

(We also just had an instance of the notebook contents just disappearing ... which, like Patrick, occurred when more than one user had the notebook open for collaboration.)

fbcideas_migusr
New Member
Status changed to: New