
jakemercer
New Member

PySpark Query on Fabric Fails with StreamConstraintsException: String Length Exceeds Maximum

I'm encountering an error when executing a PySpark query on Fabric. The error message is:

 

`com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20026295) exceeds the maximum length (20000000)`

 

I've attempted several approaches to resolve this issue, including:
- Configuring the string length limit (unsuccessfully)
- Reducing the query size and complexity
- Removing long text columns from load to avoid large string objects (roughly as in the sketch below)
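For context, the column-pruning attempt looked roughly like this (the table path and column names are placeholders):

python
# Placeholder path/column names: load only the columns that are actually needed
# so the very long text fields never enter the DataFrame.
df = (
    spark.read.format("delta")
    .load("Tables/my_table")
    .select("id", "short_description")
)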

 

However, none of these solutions have worked so far. 

 

Environment:
- Platform: Microsoft Fabric
- Library: PySpark
- Data Source: Lakehouse (Delta format)

 

Questions:
1. Is there a way to configure Fabric to allow larger string lengths in queries?
2. Could this error be related to serialization limits in PySpark, and if so, are there workarounds?
3. Has anyone else faced a similar issue on Fabric, and what solutions have worked for you?

 

Any insights or workarounds would be greatly appreciated!
1 ACCEPTED SOLUTION

Hi @girishthimmaiah 

 

Thank you for your answer. Microsoft Support sent me a very similar answer. Unfortunately, neither trimming the text column nor chunking the data made the error disappear.

 

I got the notebook running by saving an intermediate result to a temporary table and loading it from there again. That's not ideal, but for the moment I can live with it.
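For anyone hitting the same wall, a minimal sketch of that workaround (the DataFrame and table names are placeholders):

python
# Placeholder names: materialize the intermediate result to a temporary Delta table...
intermediate_df.write.format("delta").mode("overwrite").saveAsTable("tmp_intermediate")

# ...then continue the notebook from the persisted copy instead of the original lineage.
intermediate_df = spark.read.table("tmp_intermediate")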

 

Thanks for your help.


5 REPLIES
v-vpabbu
Community Support

Hi @jakemercer,

 

Thanks @girishthimmaiah for addressing the issue.

 

We would like to follow up to see whether the solution provided by the super user resolved your issue. Please let us know if you need any further assistance.
If the super user's response resolved your issue, please mark it as "Accept as solution" and click "Yes" if you found it helpful.

 

Regards,
Vinay Pabbu

Hi @jakemercer,

 

May I ask if you have gotten this issue resolved?

If it is solved, please mark the helpful reply or share your own solution and accept it as the solution. This will help other community members with similar problems find the answer faster.

 

Regards,
Vinay Pabbu

girishthimmaiah
Resolver I

The error you're seeing (StreamConstraintsException: String length exceeds maximum) happens because PySpark is trying to work with a string longer than Fabric's 20,000,000-character limit for a single value. This usually comes from big text columns, JSON, or poorly optimized queries.

Possible Solutions:

Truncate or Substring Large Text Fields

If you have large text fields in your data, limit their size before querying:

python
from pyspark.sql.functions import expr

# Keep only the first 5,000 characters of the long text column.
df = df.withColumn("truncatedcolumn", expr("substring(longtextcolumn, 1, 5000)"))

Then, drop the original long column before running queries:


python
# Drop the original, oversized column so it is never serialized.
df = df.drop("longtextcolumn")


Increase String/Result Size Limits in PySpark (Partial Success)

Try increasing the max result size or adjusting spark.sql.broadcastTimeout:

python
# Raise the driver result-size cap and the broadcast timeout.
spark.conf.set("spark.driver.maxResultSize", "4g")
spark.conf.set("spark.sql.broadcastTimeout", "600")

But note: Fabric might still enforce a hard limit.

Convert Data to a More Efficient Format

When working with JSON or large text files, consider:

Converting the data to Parquet/Delta, which stores strings in a native columnar format rather than keeping the raw text as-is.

Splitting long text into multiple smaller chunks (see the sketch below).
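For the chunking idea, a rough sketch (column names, table name, and chunk size are illustrative) that breaks one long column into fixed-size pieces before writing to Delta:

python
from pyspark.sql.functions import expr

# Illustrative names: split "longtextcolumn" into 1,000,000-character pieces so that
# no single string value approaches the 20,000,000-character limit.
chunk_size = 1_000_000
df_chunked = df.select(
    "id",
    expr(f"substring(longtextcolumn, 1, {chunk_size})").alias("chunk_1"),
    expr(f"substring(longtextcolumn, {chunk_size} + 1, {chunk_size})").alias("chunk_2"),
)
df_chunked.write.format("delta").mode("overwrite").saveAsTable("mytable_chunked")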

Process Data in Chunks Rather Than One Big Query

If your query retrieves a very large amount of data, split it up using filtering or pagination:

python
# Read only a slice of the table at a time instead of the full dataset.
df = spark.read.format("delta").load("yourpath").filter("id BETWEEN 1 AND 1000")

 

Use UDFs to Handle Large Text Processing

Use a UDF (user-defined function) to split or condense the text before it is serialized.
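For example, a simple UDF along these lines (the 5,000-character cap and column name are placeholders) can condense the text before it is written or collected:

python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Placeholder logic: collapse whitespace and keep only the first 5,000 characters.
@udf(returnType=StringType())
def condense_text(value):
    if value is None:
        return None
    return " ".join(value.split())[:5000]

df = df.withColumn("longtextcolumn", condense_text("longtextcolumn"))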

Answers to Your Questions:

1. Is it possible to configure Fabric so strings can be much longer?
Not directly. There is a 20,000,000-character limit at the system level, so it is better to shorten strings before querying.

2. Is this related to PySpark serialization limits?

Yes, most likely; it comes from JSON serialization limits when PySpark transfers data between nodes. Optimizing your query and reducing large strings should help.

3. Has anyone else faced this issue?

Yes! A lot of Fabric users have seen this problem and solved it by shortening text, processing data in chunks, or rewriting queries.

Hi @girishthimmaiah 

 

Thank you for your answer. Microsoft Support sent me a very similar answer. Unfortunately, neither trimming the text column nor chunking the data made the error disappear.

 

I got the notebook running by saving an intermediate result to a temporary table and loading it from there again. That's not ideal, but for the moment I can live with it.

 

Thanks for your help.

Hi @jakemercer,

 

Thank you for sharing your update and confirming that the issue is resolved. Please accept your own post as the solution; this will help other community members who might face a similar issue.

Thanks again for your contribution!

 

Regards,
Vinay Pabbu
