Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Did you hear? There's a new SQL AI Developer certification (DP-800). Start preparing now and be one of the first to get certified. Register now

Reply
4iurchenko
Resolver I
Resolver I

How to Make Spark Case Insensitive in Fabric?

Dear Colleagues,

 

If somebody faced that, is there a way to make Fabric's Spark - case Insensitive, in Spark SQL part at least?

Challenge:

Migration from SQL Server to Fabric. A lot of manual work with this UPPER() or LOWER() in joins and filters and selects.

 

Checked solutions:

1) Manually adding UPPER() or LOWER() everywhere - painful, difficult, but yes, solves.

2) I tried to play with spark configuration. Some people say it works, but for me it doesn't. Am I missing something?

 

Any other ideas are very appreciated how that challenge may be overcome.

1 ACCEPTED SOLUTION
MJParikh
Super User
Super User

Hi  @4iurchenko

In my opinopn you are not missing anything. Spark behaves differently from SQL Server in this area.

In SQL Server, string comparison often follows a case-insensitive collation such as CI_AS. Because of that, joins and filters treat ABC and abc as the same value.

Spark SQL, including Spark in Microsoft Fabric, performs case-sensitive string comparisons by default and there is no global collation setting that makes comparisons case-insensitive across queries. The configuration people reference (spark.sql.caseSensitive) controls column name resolution, not string comparison in joins or filters. So changing that setting will not change how values like ABC and abc compare.

For migrations from SQL Server, teams usually handle this in one of these ways.

Normalize values during ingestion or transformation.
Convert keys to a consistent case once and reuse them.

Example:

SELECT
UPPER(CustomerEmail) AS CustomerEmail,
UPPER(AccountId) AS AccountId
FROM SourceTable

Store the normalized values and join on them later.SELECT * FROM Orders o JOIN Customers c ON o.CustomerEmail = c.CustomerEmailNormalize inside the join when rewriting queries.

SELECT *
FROM A
JOIN B
ON UPPER(A.Email) = UPPER(B.Email)

This works but becomes expensive for large tables because functions execute during the join.

Create normalized helper columns.
Many teams add columns such as Email_Normalized = UPPER(Email) and use those for joins and filters.

Use ILIKE for case-insensitive filters.

SELECT *
FROM Customers
WHERE Name ILIKE 'john%'

This helps with filtering but not joins.

In Fabric Lakehouse environments, a common approach is to normalize keys once in the transformation layer (often the Silver layer in a medallion design). That keeps downstream queries clean and avoids repeated UPPER() or LOWER() calls.


Thank you!
Proud to be a Super User!
📩 Need more help?
✔️ Don’t forget to Accept as Solution if this guidance worked for you.
💛 Your Like motivates me to keep helping

View solution in original post

1 REPLY 1
MJParikh
Super User
Super User

Hi  @4iurchenko

In my opinopn you are not missing anything. Spark behaves differently from SQL Server in this area.

In SQL Server, string comparison often follows a case-insensitive collation such as CI_AS. Because of that, joins and filters treat ABC and abc as the same value.

Spark SQL, including Spark in Microsoft Fabric, performs case-sensitive string comparisons by default and there is no global collation setting that makes comparisons case-insensitive across queries. The configuration people reference (spark.sql.caseSensitive) controls column name resolution, not string comparison in joins or filters. So changing that setting will not change how values like ABC and abc compare.

For migrations from SQL Server, teams usually handle this in one of these ways.

Normalize values during ingestion or transformation.
Convert keys to a consistent case once and reuse them.

Example:

SELECT
UPPER(CustomerEmail) AS CustomerEmail,
UPPER(AccountId) AS AccountId
FROM SourceTable

Store the normalized values and join on them later.SELECT * FROM Orders o JOIN Customers c ON o.CustomerEmail = c.CustomerEmailNormalize inside the join when rewriting queries.

SELECT *
FROM A
JOIN B
ON UPPER(A.Email) = UPPER(B.Email)

This works but becomes expensive for large tables because functions execute during the join.

Create normalized helper columns.
Many teams add columns such as Email_Normalized = UPPER(Email) and use those for joins and filters.

Use ILIKE for case-insensitive filters.

SELECT *
FROM Customers
WHERE Name ILIKE 'john%'

This helps with filtering but not joins.

In Fabric Lakehouse environments, a common approach is to normalize keys once in the transformation layer (often the Silver layer in a medallion design). That keeps downstream queries clean and avoids repeated UPPER() or LOWER() calls.


Thank you!
Proud to be a Super User!
📩 Need more help?
✔️ Don’t forget to Accept as Solution if this guidance worked for you.
💛 Your Like motivates me to keep helping

Helpful resources

Announcements
April Fabric Update Carousel

Fabric Monthly Update - April 2026

Check out the April 2026 Fabric update to learn about new features.

Fabric SQL PBI Data Days

Data Days 2026 coming soon!

Sign up to receive a private message when registration opens and key events begin.

New to Fabric survey Carousel

New to Fabric Survey

If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.