I'm having trouble understanding this: if a column contains any NULLs, do I have to use COALESCE to perform a string comparison on that column?
select count(*)
from my_table
-- 292575
select count(*)
from my_table
where internal_id = external_id;
-- 0
select count(*)
from my_table
where internal_id <> external_id;
-- 0
select count(*)
from my_table
where coalesce(internal_id, '') = coalesce(external_id, '');
-- 101330
select count(*)
from my_table
where coalesce(internal_id, '') <> coalesce(external_id, '');
-- 191245
Runtime: 1.3 (Spark 3.5, Delta 3.2)
Thanks!
Yes, this is expected behavior. In SQL, comparisons involving NULL, whether = or <>, evaluate to NULL, which a WHERE clause treats as FALSE; that is why your queries didn't behave as expected. Using COALESCE to replace NULL with '' ensures the comparisons work. Alternatively, you can use IS NULL / IS NOT NULL, or Spark's NULL-safe equality operator (<=>), for precise handling without modifying the data.
How can both of these statements return 0 rows?
select count(*) from my_table where internal_id = external_id; -- 0
select count(*) from my_table where internal_id <> external_id; -- 0
This behavior occurs because SQL comparisons involving NULL (= or <>) evaluate to NULL, which is treated as FALSE. If all rows in your table have NULL in either internal_id or external_id, neither condition (= or <>) will match, resulting in 0 rows for both queries.
You can confirm this by checking for NULL values using:
select count(*)
from my_table
where internal_id IS NULL
or external_id IS NULL
To include NULL values in your logic, you can either use COALESCE to replace NULL with a default value or use the NULL-safe equality operator (<=>) for comparisons.
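The three-valued logic described above isn't Spark-specific; it is standard SQL. A minimal sqlite3 sketch (table and rows are illustrative, not your data) shows that when every row has a NULL on at least one side, = and <> both match nothing, while the COALESCE form matches the all-NULL row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table my_table (internal_id text, external_id text)")
# Every row has a NULL on at least one side.
con.executemany("insert into my_table values (?, ?)",
                [(None, "a"), ("b", None), (None, None)])

# NULL = anything and NULL <> anything both evaluate to NULL,
# which the WHERE clause filters out.
eq = con.execute("select count(*) from my_table "
                 "where internal_id = external_id").fetchone()[0]
ne = con.execute("select count(*) from my_table "
                 "where internal_id <> external_id").fetchone()[0]
# COALESCE replaces NULL with '', so the (NULL, NULL) row compares equal.
co = con.execute("select count(*) from my_table "
                 "where coalesce(internal_id, '') = coalesce(external_id, '')"
                 ).fetchone()[0]
print(eq, ne, co)  # 0 0 1
```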
Hope this helps!
Regards,
Prasana
Hi @jeffshieldsdev,
Perhaps you can take a look at the following link about using a SQL statement to check for NULL values, in case it helps with your scenario:
t sql - How to check if a column is empty or null using SQL query select statement? - Stack Overflow
Regards,
Xiaoxin Sheng
I'm not struggling with NULL values; I wanted to confirm this strange behavior in Spark SQL in Fabric. One query tests equals, the other tests not-equals, and both return 0:
select count(*)
from my_table
where internal_id = external_id;
-- 0
select count(*)
from my_table
where internal_id <> external_id;
-- 0
Hi @jeffshieldsdev,
Have you tried manually picking records that should have the same value and running your SQL queries against just those rows?
Also, I think you should check those records first: apply UPPER/LOWER to the columns to avoid case-sensitivity issues, and TRIM blank spaces and special characters so the values map correctly.
Regards,
Xiaoxin Sheng
There is a NULL-safe equality operator in Spark SQL: <=>
It treats two NULLs as equal, so it handles NULL comparisons properly and can replace the COALESCE workaround.
https://spark.apache.org/docs/3.5.2/sql-ref-null-semantics.html
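To see NULL-safe equality in action without a Spark cluster: SQLite's IS operator plays the same role as Spark SQL's <=>, so a quick sqlite3 sketch (made-up rows, not your data) illustrates the difference from plain =:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (a text, b text)")
con.executemany("insert into t values (?, ?)",
                [("x", "x"), ("x", None), (None, None)])

# Plain equality: NULL = NULL evaluates to NULL, so only ('x', 'x') matches.
eq = con.execute("select count(*) from t where a = b").fetchone()[0]
# NULL-safe equality: SQLite's IS (like Spark's <=>) treats NULL and NULL
# as equal, so ('x', 'x') and (NULL, NULL) both match.
safe = con.execute("select count(*) from t where a is b").fetchone()[0]
print(eq, safe)  # 1 2
```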
Thanks for the reply. That produces the same result as COALESCE().
I guess I'm surprised Spark handles nulls this way.
I recreated my table without NULL values, and now = and <> work as expected:
drop table my_table;
create or replace table my_table
using delta
as
select coalesce(internal_id,'') as internal_id
,coalesce(external_id,'') as external_id
from source1
union all
select coalesce(internal_id,'') as internal_id
,coalesce(external_id,'') as external_id
from source2
union all
select coalesce(internal_id,'') as internal_id
,coalesce(external_id,'') as external_id
from source3;
select count(*)
from my_table
where internal_id = external_id;
-- 101330
select count(*)
from my_table
where internal_id <> external_id;
-- 191245
Is this how Spark/Delta tables work and I was unaware?
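Yes, this is standard SQL three-valued logic rather than anything specific to Spark or Delta. A small sqlite3 sketch (made-up data, one source table instead of three) mirrors the rebuild above and shows that once NULLs are coalesced away, = and <> partition every row, so the two counts sum to the total:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table source1 (internal_id text, external_id text)")
con.executemany("insert into source1 values (?, ?)",
                [("x", "x"), ("x", None), (None, "y"), (None, None)])

# Rebuild with NULLs coalesced to '', mirroring the CREATE TABLE ... AS above.
con.execute("""create table my_table as
               select coalesce(internal_id, '') as internal_id,
                      coalesce(external_id, '') as external_id
               from source1""")

total = con.execute("select count(*) from my_table").fetchone()[0]
eq = con.execute("select count(*) from my_table "
                 "where internal_id = external_id").fetchone()[0]
ne = con.execute("select count(*) from my_table "
                 "where internal_id <> external_id").fetchone()[0]
# With no NULLs left, every row satisfies exactly one of the two predicates.
print(total, eq, ne)  # 4 2 2
```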