I'm reading JSON files from an S3 shortcut.
It's 78,112 small files with a total size of 313 MB. The largest file is 20 KB and the smallest is 2 KB.
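For scale, here is a quick back-of-the-envelope check of the figures above. It is only a sketch: it assumes "313MB" means 313 MiB, and it uses Spark's documented default for `spark.sql.files.openCostInBytes` (4 MiB), which Spark adds per file when packing files into read partitions. It says nothing about the hang itself, only about how tiny each file is relative to Spark's per-file overhead estimate:

```python
# Back-of-the-envelope figures for the dataset described above.
# Assumption: "313MB" is taken as 313 MiB (313 * 1024 * 1024 bytes).
FILE_COUNT = 78_112
TOTAL_BYTES = 313 * 1024 * 1024

# Spark's documented default for spark.sql.files.openCostInBytes (4 MiB):
# the notional cost it adds per file when packing files into read partitions.
OPEN_COST_BYTES = 4 * 1024 * 1024

avg_bytes = TOTAL_BYTES / FILE_COUNT
open_cost_ratio = OPEN_COST_BYTES / avg_bytes

print(f"average file size: {avg_bytes / 1024:.1f} KiB")  # about 4.1 KiB
print(f"open cost vs. average file: ~{open_cost_ratio:.0f}x")  # about 998x
```

In other words, Spark's planner assumes opening each file costs roughly a thousand times more than reading its contents, which is why workloads like this are dominated by per-file work (listing, opening, metadata calls) rather than by data volume.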
The Spark job shows as completed (after 9m16s), but the cell stays in a Running state and never finishes (I waited over 60 minutes).
This is the code in the cell:
df_raw = spark.read.option("multiline", "true").json("Files/source/S3/text/2024-01-20/*.json")
df_raw.printSchema()  # printSchema() prints the schema itself and returns None, so wrapping it in display() is unnecessary
df_raw.show(10, False)
The "Code Snippets" entry under Job -> Stage says: "Listing leaf files and directories for 78115 paths: abfss://<GUID>@onelake.dfs.fabric.microsoft.com/<GUID>/Files/source/S3/text/2024-01-20/<filename>.json, ..."
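A common general workaround when Spark has to list tens of thousands of tiny files (not necessarily the route taken in this thread) is to compact them once into a handful of larger JSON Lines files, so that later reads list and open far fewer paths. Below is a minimal pure-Python sketch. It assumes each source file holds a single JSON object and that the files are reachable through a local path (for example a lakehouse mount); the function name `compact_json_files` and the default batch size are hypothetical choices, not part of any Fabric API:

```python
import json
from pathlib import Path


def compact_json_files(src_dir, dst_dir, files_per_output=10_000):
    """Merge many small single-document JSON files into a few JSON Lines files.

    Hypothetical helper. Each source file is assumed to hold one JSON object,
    possibly spread over several lines (hence multiline=true in the original
    read). Each output line holds one object, so Spark can read the result
    without the multiline option. Returns the list of files written.
    """
    if files_per_output <= 0:
        raise ValueError("files_per_output must be positive")
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    sources = sorted(src.glob("*.json"))
    written = []
    for start in range(0, len(sources), files_per_output):
        batch = sources[start:start + files_per_output]
        out_path = dst / f"part-{start // files_per_output:05d}.jsonl"
        with out_path.open("w", encoding="utf-8") as out:
            for path in batch:
                # Parse the (possibly pretty-printed) document, re-emit it on one line.
                doc = json.loads(path.read_text(encoding="utf-8"))
                out.write(json.dumps(doc, ensure_ascii=False) + "\n")
        written.append(out_path)
    return written
```

The compaction step still has to open every source file once, so it is a one-off cost rather than a fix for the listing behaviour itself. Afterwards the output could be read with something like `spark.read.json("Files/<staging folder>/*.jsonl")` without `multiline`, where `<staging folder>` is a placeholder for wherever the compacted files are written.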
The log shows the same ERROR being generated every few milliseconds:
ERROR AzureBlobFileSystemStore [Thread-54]: getUnixTimeInMillisFromVersion has Exceptionjava.lang.NumberFormatException
The timestamps on these error messages seem to lag behind by a growing window of 10-30 minutes.
The Spark cluster configuration is 3 nodes, Runtime 1.2 (Spark 3.4, Delta 2.4), Compute Small, Memory Optimized.
All Spark logs show as completed successfully, apart from the error output above.
Is anyone else experiencing this, and if so, what was the resolution?
Hi @dphugo - have they fixed it?
I am getting exactly the same error with shortcuts and Parquet files. Is there any fix for this?
Hi @mkulikowski, unfortunately not.
There were a few email exchanges between me and the support team, in which I sent them logs and metadata over the course of a week.
They then reached out to arrange a meeting to go through the issue, but by then we had deemed the approach infeasible and gone a different route.
Hi @dphugo ,
Thanks for using Fabric Community.
Apologies for the issue you have been facing. I would like to check whether you are still facing this issue.
It's difficult to tell what could be causing this performance issue. Please wait for some time and try again.
If the issue still persists, please reach out to our support team so they can do a more thorough investigation into why this is happening: Link
After creating a support ticket, please share the ticket number here, as it will help us track the issue.
Hope this helps. Please let us know if you have any other queries.
Hi @Anonymous,
I'm still experiencing the issue, even after creating a new workspace, notebook, lakehouse and S3 shortcut.
I've logged a ticket:
Support request number: 2402050050000341
Hi @dphugo ,
Thanks for sharing the support ticket number.
The support team will reach out to you and try to resolve the issue.
Please continue using Fabric Community for your further queries.