Hi Community,
I’ve been working on a data pipeline where I extract Parquet files from an S3 shortcut connection into a PySpark notebook and transform them into Delta tables. The pipeline processes 12 Parquet files, each under 1.5GB in size, and everything was running smoothly (around 8 minutes for the entire process).
However, I recently encountered an issue after one of the Parquet files grew to 2.2GB. The process now crashes, even when I try processing this particular file separately as a test. I’m wondering whether Microsoft Fabric has a size limit for Parquet files or if this could be a timeout issue.
Interestingly, the problematic Parquet file can still be loaded into a Delta table manually using the "Load To Tables" option in the Amazon S3 shortcut, so the file itself doesn’t appear to be corrupted.
The error I am getting is the following:
Py4JJavaError: An error occurred while calling o5084.parquet. : Operation failed: "Internal Server Error", 500, HEAD, "path"/Order_Line_Item.parquet?upn=false&action=getStatus&timeout=90
When I click the link included in the error message, I see this additional error:
{"error":{"code":"Unauthorized","message":"Authentication Failed with Bearer token is not present in the request"}}
I really appreciate any help here.
Here is the code:
Hi @zunigaw, thank you for reaching out to the Microsoft Community Forum.
This is likely not because of a size limit, but because of how Spark handles execution and authentication. Microsoft Fabric uses time-limited tokens to access shortcut-linked storage (such as S3), and Spark defers file access until an action is triggered. If that delay exceeds the token's validity, the job fails with an authentication error, which is exactly what you're seeing, and a larger file makes that window easier to exceed.
In notebooks, that token is scoped to the session and doesn't auto-refresh once it expires. You should force Spark to read the file immediately after loading it, while the token is still valid. You can do this by caching and counting the DataFrame before any further transformations, or with a short Python snippet that triggers an action right away so the file is read before the token expires and deferred failures are avoided; see the sketch below.
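A minimal sketch of that pattern, assuming the S3 shortcut is surfaced under the Lakehouse Files area; the path and table name below are placeholders, not taken from your pipeline:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder path to the Parquet file exposed through the S3 shortcut.
source_path = "Files/s3_shortcut/Order_Line_Item.parquet"

# Read the Parquet file from the shortcut.
df = spark.read.parquet(source_path)

# Force Spark to materialize the data immediately, while the shortcut's
# access token is still valid: cache() keeps the data, count() triggers
# the actual read instead of leaving it deferred.
df.cache()
df.count()

# Later transformations and the Delta write now run against the cached
# data rather than re-reading from the shortcut after the token may
# have expired. Table name is a placeholder.
df.write.format("delta").mode("overwrite").saveAsTable("Order_Line_Item")
```

The key point is that cache() plus count() pulls the full file up front, so the expensive read happens inside the token's validity window instead of being delayed until the final write.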
If this helped solve the issue, please consider marking it "Accept as Solution" and giving a 'Kudos' so others with similar queries can find it more easily. If not, please share more details; I'm always happy to help.
Thank you.