Wednesday, January 30, 2019

Understanding and working with NULL in SQL Server

Graphic representation of the difference between 0 and NULL
Image taken from 9gag.com
According to database theory, a good RDBMS must implement a marker to indicate "Missing or inapplicable information".

SQL Server implements the lack of value with NULL, that is datatype independent and indicates missing value, so the logical validations are different, this is better known as Three-Valued Logic, where any predicate can evaluate to True, False or Unknown.

I see a common error, referring to null like "Null values" but the correct definition of null is "Lack of value" so you must refer to it as null, in singular form.

On this post, we will learn how to work with null in SQL Server.

Declaring and assigning NULL


For working with null, SQL Server engine uses the reserved word NULL to refer to it.
It is datatype independent so you just have to assign it to any variable or field, using the equal operator =, as you can see on this example:

DECLARE @NVi as int = NULL;
DECLARE @NVc as nvarchar(30) = NULL;
DECLARE @NVv as sql_variant = NULL;

SELECT @NVi, @NVc, @NVv;

If we run the statement, we will obtain these results, the same for each data type, as expected.



For inserting and updating fields with NULL we do it like on this example:


-- For inserting values
INSERT INTO test1..Table_1 (column_char,column2)
VALUES(NULL, NULL);

-- For updating values
UPDATE test1..Table_1
SET column2 = NULL;

Be careful when working with null, the equal operator = is only used for assignment.

Operations and comparison against NULL


As we stated earlier, any predicate or comparison can evaluate to TRUE, FALSE or UNKNOWN, so when a value is unknown we don't know if it is true or false, so comparing or working any value with unknown is also unknown.

For example, the following operations result is NULL in all cases:

--Arithmetic operations
SELECT NULL+5,NULL-3.47, NULL*3.1416, NULL/0; 

SELECT SQRT(NULL), POWER(NULL,2);

--String operations
SELECT 'HELLO ' +NULL + 'WORLD';

SELECT QUOTENAME(NULL);

SELECT LTRIM(NULL);

--Date operations

SELECT DATEADD(m,1,NULL);

SELECT DATEDIFF(m,GETDATE(),NULL);

When comparing to null, we also obtain null as well, as in those examples, as you can see, even comparing null to null is unknown, and when we execute below code, we obtain NO for all:


--comparing to 0
IF(0= NULL) OR (0 <> NULL)
 SELECT 'YES'
ELSE
 SELECT 'NO' 

--Comparing to empty string ''
IF(''= NULL) OR (''<> NULL)
 SELECT 'YES'
ELSE
 SELECT 'NO' 

--Even comparing to another null
IF(NULL= NULL) OR (NULL<> NULL)
 SELECT 'YES'
ELSE
 SELECT 'NO' 

So, if we want to compare value or column and check if is null or not what must we do?
SQL Server implements the IS NULL and IS NOT  NULL to compare against null, usage is as follows:


-- IS NULL usage

SELECT *
FROM test1..Table_1
WHERE column2 IS NULL;


-- IS NOT NULL usage

SELECT *
FROM test1..Table_1
WHERE column_char IS NOT NULL;


-- Using on IF construct

DECLARE @NVi as int = NULL;

IF(@NVi IS NULL)
 SELECT 'YES'
ELSE
 SELECT 'NO';


-- For Replacing NULL you can use
-- ISNULL Value since SQL 2008

SELECT ISNULL(@NVi,0);


With these tools we are ready to work with null in our databases, so now you should follow some considerations to not impact your database performance.

Special considerations for good performance


As the last point, I would like to give you some tips for dealing with NULL

Prefer IS NULL over ISNULL()


When possible, try to compare predicates using IS NULL, before casting NULL to default values using ISNULL(), because casted values are not SARGABLE.

Take as an example of these two queries, they are equivalent, but the first one has better performance over the second:


-- First query uses an index seek :)
SELECT FirstName
      ,MiddleName
      ,LastName     
FROM Person.Person
WHERE MiddleName = N'' 
 OR MiddleName IS NULL;

-- Second query uses an index scan :(
SELECT FirstName
      ,MiddleName
      ,LastName     
FROM Person.Person
WHERE ISNULL(MiddleName,N'')=N'';

These are the execution plans:

First query execution plan, an index seek is used :)

Second query execution plan, an index scan is used :(

We get a warning on the second execution plan
You can see the 2 differences on the 2 plans, so for this case, we prefer to stick to the first one, even if you must write more code.

Be careful with aggregations over nonexistent data


When you perform aggregations, be extra careful with no existent data, even when columns do not allow null, aggregate data that does not exist on the table returns null, contrary to what one could think can be the usual (a 0 value), as you can see on this example:


-- Even when TotalDue field does not allow NULL, 
-- the SUM of noexistent values is NUll, not 0 as one could think

SELECT SUM(TotalDue) as [Total Due]
FROM Sales.SalesOrderHeader
WHERE DueDate > GETDATE();

And the query results:


For those cases, you should use the ISNULL() function after SUM.

As I always recommend: test anything before going live, and use default values and not null columns when possible, to make your life easier.


No comments:

Post a Comment