Hi,
I encountered a strange issue. I read daily parquet files and save them into delta files; this processing has been running for over a year without issue. Today, while saving the delta file, I received this error: "An error occurred while calling o5092.save. : com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 29)): has to be escaped using backslash to be included in string value". The data contains some strange characters, but this has always been the case and never caused any issues. I first thought the problem was with today's parquet file, but when I try reading older files (which used to work), I get the same error. I did not change anything in the code or the PySpark settings.
Does anybody have an idea what is causing this behaviour? (I'd rather not filter out the problematic columns in the bronze layer.) Thank you!
Hi @JVL2 ,
Thanks for reaching out to the Microsoft Fabric community forum.
@burakkaragoz, thank you for your prompt response.
Suggested Additions:
UDFs are convenient, but they can be slower than Spark’s native functions.
Add this note to help users scale better:
Performance Tip: If you're working with large datasets, consider replacing the UDF with Spark's built-in regexp_replace function. It's faster and leverages Catalyst optimizations.
Example (optional to include):
from pyspark.sql.functions import regexp_replace, col

for c, t in df.dtypes:
    if t == 'string':
        df = df.withColumn(c, regexp_replace(col(c), r'[\x00-\x1F\x7F]', ''))
Give users a way to identify where the problem exists before cleaning everything.
Add this snippet as an optional diagnostic:
from pyspark.sql.functions import col
# Identify rows with control characters
control_char_pattern = r'[\x00-\x1F\x7F]'
for col_name, dtype in df.dtypes:
    if dtype == 'string':
        count = df.filter(col(col_name).rlike(control_char_pattern)).count()
        if count > 0:
            print(f"Column '{col_name}' has {count} rows with control characters.")
Mention that even if Spark accepts bad characters now, downstream tools (like Power BI, APIs, consumers reading Delta/Parquet) might break.
It's a good practice to sanitize these characters at the bronze/silver layer to avoid silent corruption or downstream ingestion errors.
It helps to mention that users should log their Spark/Delta version, since future changes in JSON behavior could cause similar issues again.
Tip: Log your Spark and Delta versions (spark.version, delta.__version__) to help with future debugging if behaviors change again.
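A minimal sketch of that logging step, assuming a Fabric notebook where a SparkSession named spark is already available; whether the delta Python package actually exposes __version__ depends on the installed delta-spark build, so the import is guarded:
# Sketch: record runtime versions at the start of the job for later debugging.
print(f"Spark version: {spark.version}")
try:
    import delta
    print(f"Delta version: {delta.__version__}")  # exposed by recent delta-spark builds
except (ImportError, AttributeError):
    print("Delta version not exposed by the delta package on this runtime.")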
If this post helped resolve your issue, please consider giving it Kudos and marking it as the Accepted Solution. This not only acknowledges the support provided but also helps other community members find relevant solutions more easily.
We appreciate your engagement and thank you for being an active part of the community.
Best Regards,
Lakshmi Narayana.
Thank you for the feedback! I will indeed add some logging of the Delta and PySpark versions used. Avoiding UDFs for performance reasons is also a good tip; I did not know that. Thanks!
Hi @JVL ,
We ran into a similar issue recently. In our case, it was caused by control characters (like ASCII 29) sneaking into string fields. Even if they were there before, something might have changed in the underlying Spark/Delta version or in how JSON serialization is handled.
Here’s what helped us:
1. Clean control characters before writing
We added a small UDF to strip out non-printable characters:
import re
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Strip ASCII control characters (0x00-0x1F) and DEL (0x7F) from a string value
def clean_str(s):
    if s:
        return re.sub(r'[\x00-\x1F\x7F]', '', s)
    return s

clean_udf = udf(clean_str, StringType())

# Apply the UDF only to string columns
df_clean = df.select([clean_udf(c).alias(c) if t == 'string' else c for c, t in df.dtypes])
2. Check Spark/Delta version
If you recently updated your runtime (even silently), the JSON parser behavior might have changed. Worth checking.
3. Try writing with mode='append' or overwriteSchema
Sometimes schema evolution or the write mode triggers unexpected serialization paths; a quick sketch of both variations is shown below.
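For reference, a minimal sketch of those two write variations, using the cleaned df_clean DataFrame from step 1 and a hypothetical table path (note that overwriteSchema only takes effect together with overwrite mode):
target_path = "Tables/bronze_events"  # hypothetical Delta table location

# Variation 1: append without touching the existing schema
df_clean.write.format("delta").mode("append").save(target_path)

# Variation 2: overwrite and allow the table schema to be rewritten
df_clean.write.format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .save(target_path)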
Let me know if you want a quick script to scan for control chars in your dataframe.
If my response resolved your query, kindly mark it as the Accepted Solution to assist others. Additionally, I would be grateful for a 'Kudos' if you found my response helpful.