Can someone help me understand why LTRIM('0011160027', '0') returns an empty string when using PySpark? It does not behave this way when querying from a lakehouse sql endpoint.
Hi @smoqt ,
Thank you for reaching out with your query about the LTRIM behavior in Microsoft Fabric.
As per my understanding, the Lakehouse SQL endpoint uses T-SQL's LTRIM(string, characters), so LTRIM('0011160027', '0') trims the leading zeros as expected and returns '11160027'. On the Spark side, the two-argument form takes its arguments in the opposite order, ltrim(trimStr, str), with the characters to trim first. LTRIM('0011160027', '0') therefore treats '0011160027' as the set of characters to trim and '0' as the string to be trimmed; every character of '0' appears in that set, which is why you got an empty string. (PySpark's ltrim(col) function, by contrast, only removes leading whitespace.)
In a Fabric notebook, ltrim('0', '0011160027') gives the intended result, trimming the leading zeros to return '11160027'.
Against the SQL endpoint, LTRIM('0', '0011160027') returns '0', because T-SQL treats '0' as the string to trim, and there is no leading '0011160027' to remove.
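To see why the argument order matters: a character-set left-trim removes every leading character that appears in the trim set. Python's built-in str.lstrip has the same semantics, so it makes a quick no-Spark illustration of both outcomes:

```python
value = "0011160027"

# A character-set left-trim removes every leading character found in the
# trim set. Trim set '0' applied to the value: only leading zeros go.
kept = value.lstrip("0")
print(kept)        # → '11160027'

# Swap the roles: trim set '0011160027' applied to the string '0'.
# Every character of '0' appears in the trim set, so nothing survives.
gone = "0".lstrip(value)
print(repr(gone))  # → ''
```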
Differences between T-SQL and PySpark can be tricky. Use regexp_replace in PySpark for portability:
from pyspark.sql.functions import regexp_replace
df = df.withColumn('trimmed', regexp_replace('NumberStr', '^0+', ''))
If this post helps, please give it Kudos and consider accepting it as a solution so other members can find it more quickly.
Thank you
For anyone else who was confused by this, the simple answer is that (at the time) the Fabric runtimes used Spark 3.3 or 3.4, and the trim-characters argument was not added to PySpark's ltrim until Spark 4.0.
Keep that in mind if you have the spark/docs/latest URL bookmarked, like I do.
Very tricky if someone is copying SQL from an SSMS environment or the Lakehouse SQL endpoint into a notebook.
I tried ltrim( [trimstr ,] str) and that had the intended result. The Apache Spark docs don't specify this form, unless I am missing something.