dbeavon3
Memorable Member

File directory not synchronized - synfs after mssparkutils.fs.mount

I had been struggling for about 10 minutes to read a file from the lakehouse.

I mounted like so:


 
fspath = mssparkutils.fs.mount(my_files_path, "/LakehouseFiles")
 

... then I added new directories and a new file to the lakehouse "Files" area.

The file was called "Partition.xml".

 

 

Then I tried to read the Partition.xml file (below) and it kept failing:

 


FileNotFoundError: [Errno 2] No such file or directory: '/synfs/notebook/xxxxxxx-cf8e-45a8-9097-15f0db4f3883/LakehouseFiles/InventoryManagement/InventoryAgings/Partition.xml'

 

Finally, after about ten minutes, it worked. What a nightmare... Can someone tell me if this is expected behavior for the "synfs" mounts? This crappy behavior sounds like par for the course, based on my other experiences on this platform, but ten minutes is way too long. Is there any way to force the mssparkutils.fs mounts to behave better? E.g., should I sync on demand?

 

 

 

 

1 ACCEPTED SOLUTION
tayloramy
Super User

Hi @dbeavon3,

 

You’re not imagining it - the /synfs path you get back from mssparkutils.fs.getMountPath(...) is an eventually-consistent projection of OneLake into the notebook’s file system. When you add or rename files outside the running Spark session (e.g., via the Lakehouse UI, another job, or upload), it can take a while before the mount’s directory listing catches up. That’s why your open("/synfs/notebook/.../Partition.xml") failed for several minutes and then “magically” started working.

 

Two workarounds:

  1. Refresh the mounts before you read
    from notebookutils import mssparkutils  # optional: already preloaded in Fabric notebooks
    mssparkutils.fs.refreshMounts()
    This forces the workspace/job mount metadata to resync so new files show up faster. See: Microsoft Spark Utilities docs and the refreshMounts reference notes here: Synapse mount API (same APIs).
  2. (Better) Skip /synfs for reads and use an ABFSS path
    Read the file directly from OneLake instead of the mounted filesystem. With Python, use fsspec so you can call open() on ABFSS:
    import fsspec
    
    abfss_path = "abfss://<workspaceId>@onelake.dfs.fabric.microsoft.com/<LakehouseName>/Files/InventoryManagement/InventoryAgings/Partition.xml"
    with fsspec.open(abfss_path, "rb") as f:
        full_doc = f.read()
    This avoids the /synfs cache entirely. Nice walkthrough: Using fsspec with OneLake.
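
If you would rather keep plain os/open calls on the mounted path, workaround 1 can be wrapped in a small wait-and-retry helper. This is only a sketch under stated assumptions: wait_for_file, its parameters, and the timings are hypothetical names picked for illustration, not part of mssparkutils. The refresh callback is optional so the helper also runs outside Fabric; inside a notebook you would presumably pass mssparkutils.fs.refreshMounts.

```python
import os
import time


def wait_for_file(path, refresh=None, timeout_s=120.0, interval_s=5.0):
    """Poll until `path` exists, calling `refresh()` before each check.

    Hypothetical helper for illustration only. `refresh` is any
    zero-argument callable; in a Fabric notebook that could be
    mssparkutils.fs.refreshMounts (assumption: it is in scope there).
    Returns True if the file appeared, False if the timeout elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if refresh is not None:
            refresh()  # ask the mount to resync before looking
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)  # back off before the next check
```

A call such as wait_for_file(local_path, refresh=mssparkutils.fs.refreshMounts) before open(local_path) keeps the rest of the code on standard library calls. The timeout gives you a clear failure instead of a FileNotFoundError at an arbitrary point.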

If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.






2 REPLIES 2
Thanks. I wanted to use more conventional Python libraries (os). Sometimes that makes it easier to get AI assistance and to migrate solutions to other platforms as needed (or simply to copy/paste sample PySpark scripts into other standard Python solutions).

 

I like the tip about refreshMounts.  I will do that more frequently.  Only about 1% of my operations read directly from files.  Most of the file operations are for reading/writing blob data from abfss via pyspark.
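
For the small share of code that reads files directly, one way to keep it portable is to hide the path style behind a single opener. The sketch below is an illustration, not an established API: open_any is a hypothetical name, and the URL branch assumes fsspec and an ABFSS backend are installed and authenticated, as in the accepted solution. Local and mounted paths go through the built-in open(), so the same code runs unchanged on other platforms.

```python
def open_any(path, mode="rb"):
    """Return a context manager that yields a file-like object.

    Hypothetical helper for illustration only.
    - URL-style paths (e.g. abfss://...) are delegated to fsspec
      (assumption: fsspec and a matching backend are available).
    - Anything else is treated as a local or mounted path and opened
      with the built-in open().
    Always use the result in a `with` statement.
    """
    if "://" in path:
        import fsspec  # lazy import: local-only code needs no extra dependency
        return fsspec.open(path, mode)
    return open(path, mode)
```

Callers then write `with open_any(some_path) as f:` and read from f, whether some_path is a /synfs mount path, an ordinary local file, or an abfss:// URL.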

 
