I'm having trouble understanding this: if a column contains any NULLs, do I have to use COALESCE to perform a string comparison on that column?
select count(*)
from my_table
-- 292575
select count(*)
from my_table
where internal_id = external_id;
-- 0
select count(*)
from my_table
where internal_id <> external_id;
-- 0
select count(*)
from my_table
where coalesce(internal_id, '') = coalesce(external_id, '');
-- 101330
select count(*)
from my_table
where coalesce(internal_id, '') <> coalesce(external_id, '');
-- 191245
Runtime: 1.3 (Spark 3.5, Delta 3.2)
Thanks!
Yes, this is expected behavior. In SQL, comparisons involving NULL, whether = or <>, evaluate to NULL, which a WHERE clause treats as FALSE; that is why your queries didn't behave as expected. Using COALESCE to replace NULL with '' ensures the comparisons work. Alternatively, you can use IS NULL / IS NOT NULL, or Spark's NULL-safe equality operator (<=>), for precise handling without modifying the data.
How can both of these statements return 0 rows?
select count(*) from my_table where internal_id = external_id; -- 0
select count(*) from my_table where internal_id <> external_id; -- 0
This behavior occurs because SQL comparisons involving NULL (= or <>) evaluate to NULL, which is treated as FALSE. If all rows in your table have NULL in either internal_id or external_id, neither condition (= or <>) will match, resulting in 0 rows for both queries.
You can confirm this by checking for NULL values using:
select count(*)
from my_table
where internal_id IS NULL
or external_id IS NULL
To include NULL values in your logic, you can either use COALESCE to replace NULL with a default value or use the NULL-safe equality operator (<=>) for comparisons.
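The three-valued logic described above isn't Spark-specific; it is standard SQL. A minimal sqlite3 sketch (table and rows are illustrative, not your data) shows that when every row has a NULL on at least one side, = and <> both match nothing, while the COALESCE form matches the all-NULL row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table my_table (internal_id text, external_id text)")
# Every row has a NULL on at least one side.
con.executemany("insert into my_table values (?, ?)",
                [(None, "a"), ("b", None), (None, None)])

# NULL = anything and NULL <> anything both evaluate to NULL,
# which the WHERE clause filters out.
eq = con.execute("select count(*) from my_table "
                 "where internal_id = external_id").fetchone()[0]
ne = con.execute("select count(*) from my_table "
                 "where internal_id <> external_id").fetchone()[0]
# COALESCE replaces NULL with '', so the (NULL, NULL) row compares equal.
co = con.execute("select count(*) from my_table "
                 "where coalesce(internal_id, '') = coalesce(external_id, '')"
                 ).fetchone()[0]
print(eq, ne, co)  # 0 0 1
```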
Hope this helps!
Regards,
Prasana
Hi @jeffshieldsdev,
Perhaps you can take a look at the following link about using a SQL statement to check for NULL values, in case it helps with your scenario:
t sql - How to check if a column is empty or null using SQL query select statement? - Stack Overflow
Regards,
Xiaoxin Sheng
I'm not struggling with NULL values; I wanted to confirm this strange behavior in Spark SQL in Fabric. One query tests equals, the other tests not-equals, and both return 0:
select count(*)
from my_table
where internal_id = external_id;
-- 0
select count(*)
from my_table
where internal_id <> external_id;
-- 0
Hi @jeffshieldsdev,
Have you tried manually picking records that should have the same value and running your SQL queries against just those rows?
Also, I think you should check those records first: apply UPPER/LOWER to the columns to avoid case-sensitivity issues, and TRIM blank spaces and special characters so the values map correctly.
Regards,
Xiaoxin Sheng
There is a NULL-safe equality operator in Spark SQL: <=>
It treats two NULLs as equal, so it handles NULL comparisons properly and can replace the COALESCE workaround.
https://spark.apache.org/docs/3.5.2/sql-ref-null-semantics.html
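To see NULL-safe equality in action without a Spark cluster: SQLite's IS operator plays the same role as Spark SQL's <=>, so a quick sqlite3 sketch (made-up rows, not your data) illustrates the difference from plain =:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (a text, b text)")
con.executemany("insert into t values (?, ?)",
                [("x", "x"), ("x", None), (None, None)])

# Plain equality: NULL = NULL evaluates to NULL, so only ('x', 'x') matches.
eq = con.execute("select count(*) from t where a = b").fetchone()[0]
# NULL-safe equality: SQLite's IS (like Spark's <=>) treats NULL and NULL
# as equal, so ('x', 'x') and (NULL, NULL) both match.
safe = con.execute("select count(*) from t where a is b").fetchone()[0]
print(eq, safe)  # 1 2
```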
Thanks for the reply. That produces the same result as COALESCE().
I guess I'm surprised Spark handles nulls this way.
I recreated my table without NULL values, and now = and <> work as expected:
drop table my_table;
create or replace table my_table
using delta
as
select coalesce(internal_id,'') as internal_id
,coalesce(external_id,'') as external_id
from source1
union all
select coalesce(internal_id,'') as internal_id
,coalesce(external_id,'') as external_id
from source2
union all
select coalesce(internal_id,'') as internal_id
,coalesce(external_id,'') as external_id
from source3;
select count(*)
from my_table
where internal_id = external_id;
-- 101330
select count(*)
from my_table
where internal_id <> external_id;
-- 191245
Is this how Spark/Delta tables work and I was unaware?
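Yes, this is standard SQL three-valued logic rather than anything specific to Spark or Delta. A small sqlite3 sketch (made-up data, one source table instead of three) mirrors the rebuild above and shows that once NULLs are coalesced away, = and <> partition every row, so the two counts sum to the total:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table source1 (internal_id text, external_id text)")
con.executemany("insert into source1 values (?, ?)",
                [("x", "x"), ("x", None), (None, "y"), (None, None)])

# Rebuild with NULLs coalesced to '', mirroring the CREATE TABLE ... AS above.
con.execute("""create table my_table as
               select coalesce(internal_id, '') as internal_id,
                      coalesce(external_id, '') as external_id
               from source1""")

total = con.execute("select count(*) from my_table").fetchone()[0]
eq = con.execute("select count(*) from my_table "
                 "where internal_id = external_id").fetchone()[0]
ne = con.execute("select count(*) from my_table "
                 "where internal_id <> external_id").fetchone()[0]
# With no NULLs left, every row satisfies exactly one of the two predicates.
print(total, eq, ne)  # 4 2 2
```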