jeffshieldsdev
Solution Sage

COUNT is 0 when nullable column in where clause?

I'm having trouble understanding this--if a column contains any nulls, do I have to use coalesce to perform a string comparison on the column?

select count(*)
from my_table
-- 292575

select count(*)
from my_table
where internal_id = external_id;
-- 0

select count(*)
from my_table
where internal_id <> external_id;
-- 0

select count(*)
from my_table
where coalesce(internal_id, '') = coalesce(external_id, '');
-- 101330

select count(*)
from my_table
where coalesce(internal_id, '') <> coalesce(external_id, '');
-- 191245

Runtime: 1.3 (Spark 3.5, Delta 3.2)

Thanks!

2 ACCEPTED SOLUTIONS
spencer_sa
Super User

There is a NULL-safe equality operator in SparkSQL   <=>
This properly handles NULLs and equality and can replace the need for coalesces.
https://spark.apache.org/docs/3.5.2/sql-ref-null-semantics.html


jeffshieldsdev
Solution Sage

A solution is not present in this thread, despite someone accepting a reply as a solution.


10 REPLIES
prasbharat
Frequent Visitor

Yes, this is expected behavior. In SQL, NULL comparisons with = or <> evaluate to NULL (which a WHERE clause treats as FALSE), which is why your queries didn't behave as expected. Using COALESCE to replace NULL with '' ensures the comparisons work correctly. Alternatively, you can use IS NULL / IS NOT NULL, or Spark's NULL-safe equality operator (<=>), for precise handling without modifying the data.

How can both of these statements return 0 rows?

 

select count(*)
from my_table
where internal_id = external_id;
-- 0

select count(*)
from my_table
where internal_id <> external_id;
-- 0

This behavior occurs because SQL comparisons involving NULL (= or <>) evaluate to NULL, which is treated as FALSE. If all rows in your table have NULL in either internal_id or external_id, neither condition (= or <>) will match, resulting in 0 rows for both queries.
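This three-valued logic is easy to reproduce. Here's a runnable sketch using Python's stdlib sqlite3 module (the table and sample values are made up for illustration; Spark SQL shares these standard NULL comparison semantics):

```python
import sqlite3

# Illustrative data: two clean rows plus two rows containing a NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (internal_id TEXT, external_id TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("a", "a"), ("a", "b"), (None, "a"), ("a", None)],
)

total = conn.execute("SELECT count(*) FROM my_table").fetchone()[0]
eq = conn.execute(
    "SELECT count(*) FROM my_table WHERE internal_id = external_id"
).fetchone()[0]
ne = conn.execute(
    "SELECT count(*) FROM my_table WHERE internal_id <> external_id"
).fetchone()[0]

# NULL = x and NULL <> x both evaluate to NULL, which the WHERE clause
# filters out, so the two NULL-containing rows match neither predicate.
print(total, eq, ne)  # 4 1 1
```

Note that eq + ne (2) is less than the row count (4): the NULL-containing rows simply vanish from both result sets, which is exactly the 0-and-0 outcome above when every row has a NULL on one side.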

 

You can confirm this by checking for NULL values using:

select count(*)
from my_table
where internal_id IS NULL
or external_id IS NULL


To include NULL values in your logic, you can either use COALESCE to replace NULL with a default value or use the NULL-safe equality operator (<=>) for comparisons.
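The COALESCE approach can be sketched the same way (stdlib sqlite3, made-up sample data): once NULL is mapped to '', every row falls into exactly one of the two buckets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (internal_id TEXT, external_id TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("a", "a"), ("a", "b"), (None, None), ("a", None)],
)

total = conn.execute("SELECT count(*) FROM my_table").fetchone()[0]
eq = conn.execute(
    "SELECT count(*) FROM my_table "
    "WHERE coalesce(internal_id, '') = coalesce(external_id, '')"
).fetchone()[0]
ne = conn.execute(
    "SELECT count(*) FROM my_table "
    "WHERE coalesce(internal_id, '') <> coalesce(external_id, '')"
).fetchone()[0]

# With NULL replaced by '', the NULL/NULL row counts as equal and the
# 'a'/NULL row counts as not equal, so eq + ne now equals the row total.
print(total, eq, ne)  # 4 2 2
```

One caveat with this workaround: it conflates a genuine empty string with NULL, which is why the null-safe operator is usually the cleaner choice.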

 

Hope this helps!

 

Regards,

Prasana

jeffshieldsdev
Solution Sage

A solution is not present in this thread, despite someone accepting a reply as a solution.

Anonymous
Not applicable

Hi @jeffshieldsdev,

Perhaps you can take a look at the following link about using a SQL statement to check for null values, in case it helps with your scenario:

t sql - How to check if a column is empty or null using SQL query select statement? - Stack Overflow

Regards,

Xiaoxin Sheng

I'm not struggling with null values; I wanted to confirm this strange behavior of Spark SQL in Fabric: one query tests equals, the other tests not-equals, yet both return 0.

select count(*)
from my_table
where internal_id = external_id;
-- 0

select count(*)
from my_table
where internal_id <> external_id;
-- 0

 

Anonymous
Not applicable

Hi @jeffshieldsdev,

Have you tried manually identifying records that have the same value and running your SQL queries against just those records?
Also, I think you should check these records first: apply UPPER/LOWER functions to the columns to avoid case-sensitivity issues, and trim blank spaces and special characters so the values map correctly.

Regards,

Xiaoxin Sheng

spencer_sa
Super User

There is a NULL-safe equality operator in SparkSQL   <=>
This properly handles NULLs and equality and can replace the need for coalesces.
https://spark.apache.org/docs/3.5.2/sql-ref-null-semantics.html
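To sketch what <=> does: it returns TRUE for NULL <=> NULL and FALSE when only one side is NULL, instead of returning NULL. SQLite has no <=>, but its IS operator has the same null-safe semantics, so the behavior can be demonstrated with Python's stdlib sqlite3 (sample data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (internal_id TEXT, external_id TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("a", "a"), (None, None), ("a", None), ("a", "b")],
)

plain = conn.execute(
    "SELECT count(*) FROM my_table WHERE internal_id = external_id"
).fetchone()[0]
# SQLite's IS plays the role of Spark SQL's <=> here: NULL IS NULL is
# TRUE, and NULL IS 'a' is FALSE, never the unknown NULL result.
null_safe = conn.execute(
    "SELECT count(*) FROM my_table WHERE internal_id IS external_id"
).fetchone()[0]

print(plain, null_safe)  # 1 2 -- null-safe equality also matches NULL/NULL
```

In Spark SQL the equivalent query would simply be `select count(*) from my_table where internal_id <=> external_id`.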

Thanks for the reply. That produces the same counts as the COALESCE() approach.

I guess I'm surprised Spark handles nulls this way.

jeffshieldsdev
Solution Sage

I recreated my table without null values, and now = and <> work as expected

drop table my_table;

create or replace table my_table
using delta
as
select coalesce(internal_id,'') as internal_id
    ,coalesce(external_id,'') as external_id
from source1 
union all
select coalesce(internal_id,'') as internal_id
    ,coalesce(external_id,'') as external_id
from source2
union all
select coalesce(internal_id,'') as internal_id
    ,coalesce(external_id,'') as external_id
from source3;

select count(*)
from my_table
where internal_id = external_id;
-- 101330

select count(*)
from my_table
where internal_id <> external_id;
-- 191245

Is this just how Spark/Delta tables work, and I was unaware?
