
jakemercer
New Member

PySpark Query on Fabric Fails with StreamConstraintsException: String Length Exceeds Maximum

I'm encountering an error when executing a PySpark query on Fabric. The error message is:

 

`com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20026295) exceeds the maximum length (20000000)`

 

I've attempted several approaches to resolve this issue, including:
- Configuring the string length limit (unsuccessfully)
- Reducing the query size and complexity
- Removing long text columns from load to avoid large string objects (roughly as in the sketch below)
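For context, the column-pruning attempt looked roughly like this (the table path and column names are placeholders):

python
# Placeholder path/column names: load only the columns that are actually needed
# so the very long text fields never enter the DataFrame.
df = (
    spark.read.format("delta")
    .load("Tables/my_table")
    .select("id", "short_description")
)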

 

However, none of these solutions have worked so far. 

 

Environment:
- Platform: Microsoft Fabric
- Library: PySpark
- Data Source: Lakehouse (Delta format)

 

Questions:
1. Is there a way to configure Fabric to allow larger string lengths in queries?
2. Could this error be related to serialization limits in PySpark, and if so, are there workarounds?
3. Has anyone else faced a similar issue on Fabric, and what solutions have worked for you?

 

Any insights or workarounds would be greatly appreciated!
1 ACCEPTED SOLUTION

Hi @girishthimmaiah 

 

Thank you for your answer. Microsoft Support sent me a very similar answer. Unfortunately, neither trimming the text column nor chunking the data made the error disappear.

 

I got the notebook running by saving an intermediate result to a temporary table and loading it from there again. That's not ideal, but for the moment I can live with it.
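For anyone hitting the same wall, a minimal sketch of that workaround (the DataFrame and table names are placeholders):

python
# Placeholder names: materialize the intermediate result to a temporary Delta table...
intermediate_df.write.format("delta").mode("overwrite").saveAsTable("tmp_intermediate")

# ...then continue the notebook from the persisted copy instead of the original lineage.
intermediate_df = spark.read.table("tmp_intermediate")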

 

Thanks for your help.


5 REPLIES
v-vpabbu
Community Support

Hi @jakemercer,

 

Thanks @girishthimmaiah for addressing the issue.

 

We would like to follow up to see whether the solution provided by the super user resolved your issue. Please let us know if you need any further assistance.
If the super user's response resolved your issue, please mark it as "Accept as solution" and click "Yes" if you found it helpful.

 

Regards,
Vinay Pabbu

Hi @jakemercer,

 

May I ask if you have gotten this issue resolved?

If it is solved, please mark the helpful reply or share your own solution and accept it as the solution. This will help other community members with similar problems find the answer faster.

 

Regards,
Vinay Pabbu

girishthimmaiah
Resolver I

The error you're seeing (StreamConstraintsException: String length exceeds maximum) happens because PySpark is trying to work with a string longer than Fabric's 20,000,000-character limit for a single value. This usually comes from big text columns, JSON, or poorly optimized queries.

Possible Solutions:

Truncate or Substring Large Text Fields

If you have large text fields in your data, limit their size before querying:

python
from pyspark.sql.functions import expr

# Keep only the first 5,000 characters of the long text column.
df = df.withColumn("truncatedcolumn", expr("substring(longtextcolumn, 1, 5000)"))

Then, drop the original long column before running queries:


python
# Drop the original, oversized column so it is never serialized.
df = df.drop("longtextcolumn")


Increase String/Result Size Limits in PySpark (Partial Success)

Try increasing the max result size or adjusting spark.sql.broadcastTimeout:

python
# Raise the driver result-size cap and the broadcast timeout.
spark.conf.set("spark.driver.maxResultSize", "4g")
spark.conf.set("spark.sql.broadcastTimeout", "600")

But note: Fabric might still enforce a hard limit.

Convert Data to a More Efficient Format

When working with JSON or large text files, consider:

Converting the data to Parquet/Delta, which stores strings in a native columnar format rather than keeping the raw text as-is.

Splitting long text into multiple smaller chunks (see the sketch below).
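For the chunking idea, a rough sketch (column names, table name, and chunk size are illustrative) that breaks one long column into fixed-size pieces before writing to Delta:

python
from pyspark.sql.functions import expr

# Illustrative names: split "longtextcolumn" into 1,000,000-character pieces so that
# no single string value approaches the 20,000,000-character limit.
chunk_size = 1_000_000
df_chunked = df.select(
    "id",
    expr(f"substring(longtextcolumn, 1, {chunk_size})").alias("chunk_1"),
    expr(f"substring(longtextcolumn, {chunk_size} + 1, {chunk_size})").alias("chunk_2"),
)
df_chunked.write.format("delta").mode("overwrite").saveAsTable("mytable_chunked")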

Process Data in Chunks Rather Than One Big Query

If your query retrieves a very large amount of data, split it up using filtering or pagination:

python
# Read only a slice of the table at a time instead of the full dataset.
df = spark.read.format("delta").load("yourpath").filter("id BETWEEN 1 AND 1000")

 

Use UDFs to Handle Large Text Processing

Use a UDF (user-defined function) to split or condense the text before it is serialized.
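For example, a simple UDF along these lines (the 5,000-character cap and column name are placeholders) can condense the text before it is written or collected:

python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Placeholder logic: collapse whitespace and keep only the first 5,000 characters.
@udf(returnType=StringType())
def condense_text(value):
    if value is None:
        return None
    return " ".join(value.split())[:5000]

df = df.withColumn("longtextcolumn", condense_text("longtextcolumn"))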

Answers to Your Questions:

1. Is it possible to configure Fabric so strings can be much longer?
Not directly. There is a 20,000,000-character limit at the system level, so it is better to shorten strings before querying.

2. Is this related to PySpark serialization limits?

Yes, most likely; it comes from JSON serialization limits when PySpark transfers data between nodes. Optimizing your query and reducing large strings should help.

3. Has anyone else faced this issue?

Yes! A lot of Fabric users have seen this problem and solved it by shortening text, processing data in chunks, or rewriting queries.

Hi @girishthimmaiah 

 

Thank you for your answer. Microsoft Support sent me a very similar answer. Unfortunately, neither trimming the text column nor chunking the data made the error disappear.

 

I got the notebook running by saving an intermediate result to a temporary table and loading it from there again. That's not ideal, but for the moment I can live with it.

 

Thanks for your help.

Hi @jakemercer,

 

Thank you for sharing your update and confirming that the issue is resolved. Please accept your own post as the solution; this will help other community members who might face a similar issue.

Thanks again for your contribution!

 

Regards,
Vinay Pabbu
