Null (SQL)

History

Null propagation

Comparisons with NULL and the three-valued logic (3VL)

Challenges

Arithmetic operations

String concatenation

Effect of Unknown in WHERE clauses

Null-specific and 3VL-specific comparison predicates

In the SQL database query language, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfill the requirement that all true relational database management systems support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.
A null should not be confused with a value of 0. A null indicates a lack of a value, which is not the same as a zero value. For example, consider the question "How many books does Adam own?" The answer may be "zero" or "null". In a database table, the column reporting this answer would start with no value, and it would not be updated with the value zero until it is ascertained that Adam owns no books.
In SQL, null is a marker, not a value. This usage differs from most programming languages, where a null value of a reference means it is not pointing to any object.

E. F. Codd mentioned nulls as a method of representing missing data in the relational model in a 1975 paper in the FDT Bulletin of ACM-SIGMOD. Codd's paper that is most commonly cited in relation to the semantics of Null is his 1979 paper in the ACM Transactions on Database Systems, in which he also introduced his Relational Model/Tasmania, although many of the other proposals from the latter paper have remained obscure. Section 2.3 of his 1979 paper details the semantics of Null propagation in arithmetic operations as well as comparisons employing a ternary logic when comparing to nulls; it also details the treatment of Nulls on other set operations. In database theory circles, the original proposal of Codd is now referred to as "Codd tables". Codd later reinforced his requirement that all RDBMSs support Null to indicate missing data in a 1985 two-part article published in Computerworld magazine.
The 1986 SQL standard basically adopted Codd's proposal after an implementation prototype in IBM System R. Although Don Chamberlin recognized nulls as one of the most controversial features of SQL, he defended the design of Nulls in SQL invoking the pragmatic arguments that it was the least expensive form of system support for missing information, saving the programmer from many duplicative application-level checks while at the same time providing the database designer with the option not to use Nulls if they so desire; for example, to avoid well-known anomalies. Chamberlin also argued that besides providing some missing-value functionality, practical experience with Nulls also led to other language features that rely on Nulls, like certain grouping constructs and outer joins. Finally, he argued that in practice Nulls also end up being used as a quick way to patch an existing schema when it needs to evolve beyond its original intent, coding not for missing but rather for inapplicable information; for example, a database that quickly needs to support electric cars while having a miles-per-gallon column.
Codd indicated in his 1990 book The Relational Model for Database Management, Version 2 that the single Null mandated by the SQL standard was inadequate, and should be replaced by two separate Null-type markers to indicate why data is missing. In Codd's book, these two Null-type markers are referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively. Codd's recommendation would have required SQL's logic system be expanded to accommodate a four-valued logic system. Because of this additional complexity, the idea of multiple Nulls with different definitions has not gained widespread acceptance in the database practitioners' domain. It remains an active field of research though, with numerous papers still being published.

Null has been the focus of controversy and a source of debate because of its associated three-valued logic, special requirements for its use in SQL joins, and the special handling required by aggregate functions and SQL grouping operators. Computer science professor Ron van der Meyden summarized the various issues as: "The inconsistencies in the SQL standard mean that it is not possible to ascribe any intuitive logical semantics to the treatment of nulls in SQL." Although various proposals have been made for resolving these issues, the complexity of the alternatives has prevented their widespread adoption.

Because Null is not a data value, but a marker for an absent value, using mathematical operators on Null gives an unknown result, which is represented by Null. In the following example, multiplying 10 by Null results in Null:

10 * NULL -- Result is NULL

This can lead to unanticipated results. For instance, when an attempt is made to divide Null by zero, platforms may return Null instead of throwing an expected "data exception division by zero". Though this behavior is not defined by the ISO SQL standard, many DBMS vendors treat this operation similarly. For instance, the Oracle, PostgreSQL, MySQL Server, and Microsoft SQL Server platforms all return a Null result for the following:

NULL / 0
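
The propagation rule can be checked with a short script. The sketch below is only an illustration: it uses SQLite through Python's standard sqlite3 module as a convenient stand-in (SQLite is not one of the platforms named above), and the helper name scalar is hypothetical. SQL Null arrives in Python as None.

```python
import sqlite3

def scalar(expression):
    """Evaluate one SQL expression in a throwaway in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute("SELECT " + expression).fetchone()[0]
    finally:
        conn.close()

# Null propagates through arithmetic: the result is Null (None in Python).
product = scalar("10 * NULL")

# Dividing Null by zero yields Null here rather than raising an error.
quotient = scalar("NULL / 0")

print(product, quotient)
```

Other engines may behave differently for division by zero with non-Null operands, so this sketch should not be read as a statement about the standard.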

String concatenation operations, which are common in SQL, also result in Null when one of the operands is Null. The following example demonstrates the Null result returned by using Null with the SQL || string concatenation operator.

'Fish ' || NULL || 'Chips' -- Result is NULL

This is not true for all database implementations. In an Oracle RDBMS, for example, NULL and the empty string are considered the same thing and therefore 'Fish ' || NULL || 'Chips' results in 'Fish Chips'.
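
Where a Null operand should be treated as an empty string on systems that follow the standard behavior, one common approach is to wrap it in COALESCE. The following is a minimal sketch, assuming SQLite (via Python's sqlite3 module) as the test engine; COALESCE is not mentioned in the text above and is introduced here only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Standard behavior: a single Null operand makes the whole concatenation Null.
plain = conn.execute("SELECT 'Fish ' || NULL || 'Chips'").fetchone()[0]

# COALESCE returns its first non-Null argument, so Null becomes ''.
guarded = conn.execute(
    "SELECT 'Fish ' || COALESCE(NULL, '') || 'Chips'"
).fetchone()[0]

conn.close()
print(repr(plain), repr(guarded))
```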

Since Null is not a member of any data domain, it is not considered a "value", but rather a marker indicating the undefined value. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown. The logical result of the expression below, which compares the value 10 to Null, is Unknown:

SELECT 10 = NULL -- Results in Unknown

However, certain operations on Null can return values if the absent value is not relevant to the outcome of the operation. Consider the following example:

SELECT NULL OR TRUE -- Results in True

In this case, the fact that the value on the left of OR is unknowable is irrelevant, because the outcome of the OR operation would be True regardless of the value on the left.
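SQL implements three logical results, so SQL implementations must provide for a specialized three-valued logic. Under its rules, AND yields False if either operand is False, Unknown if neither operand is False but at least one is Unknown, and True otherwise; OR yields True if either operand is True, Unknown if neither operand is True but at least one is Unknown, and False otherwise; NOT maps True to False, False to True, and Unknown to Unknown. The truth tables SQL uses for AND, OR, and NOT correspond to a common fragment of the Kleene and Łukasiewicz three-valued logic.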
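
The three-valued connectives can also be sketched outside of SQL. The Python functions below are a hypothetical model, not part of any SQL implementation: None stands in for Unknown, and the names and3, or3, and not3 are invented for this example.

```python
# None models SQL's Unknown; True and False are the ordinary Booleans.

def and3(p, q):
    if p is False or q is False:
        return False      # a single False operand decides the result
    if p is None or q is None:
        return None       # otherwise any Unknown operand makes it Unknown
    return True

def or3(p, q):
    if p is True or q is True:
        return True       # a single True operand decides the result
    if p is None or q is None:
        return None
    return False

def not3(p):
    return None if p is None else (not p)

# Print the full truth tables for AND and OR.
VALUES = (True, False, None)
for p in VALUES:
    for q in VALUES:
        print(p, q, and3(p, q), or3(p, q))
```

In this model, or3(None, True) is True for the same reason that NULL OR TRUE is True: the other operand already determines the outcome.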

SQL three-valued logic is encountered in Data Manipulation Language (DML) in the comparison predicates of DML statements and queries. The WHERE clause causes the DML statement to act on only those rows for which the predicate evaluates to True. Rows for which the predicate evaluates to either False or Unknown are not acted on by INSERT, UPDATE, or DELETE DML statements, and are discarded by SELECT queries. Interpreting Unknown and False as the same logical result is a common error encountered while dealing with Nulls. The following simple example demonstrates this fallacy:

SELECT *
FROM t
WHERE i = NULL;

The example query above logically always returns zero rows because the comparison of the i column with Null always returns Unknown, even for those rows where i is Null. The Unknown result causes the SELECT statement to summarily discard every row.
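
The effect is easy to reproduce. The sketch below assumes SQLite, driven from Python's sqlite3 module, and a hypothetical three-row table t. It runs the query with i = NULL and, for contrast, with the IS NULL predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (i INTEGER)")
# Three sample rows, one of which holds Null (None in Python).
conn.executemany("INSERT INTO t (i) VALUES (?)", [(1,), (None,), (3,)])

# The comparison is Unknown for every row, so no row is selected.
eq_rows = conn.execute("SELECT * FROM t WHERE i = NULL").fetchall()

# IS NULL evaluates to True for the Null row and selects it.
is_rows = conn.execute("SELECT * FROM t WHERE i IS NULL").fetchall()

conn.close()
print(eq_rows, is_rows)
```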

Basic SQL comparison operators always return Unknown when comparing anything with Null, so the SQL standard provides two special Null-specific comparison predicates. The IS NULL and IS NOT NULL predicates (which use a postfix syntax) test whether data is, or is not, Null.
The SQL standard contains the optional feature F571 "Truth value tests" that introduces three additional logical unary operators (six, counting their negated forms), also using postfix notation. They have the following truth tables:

p        p IS TRUE  p IS NOT TRUE  p IS FALSE  p IS NOT FALSE  p IS UNKNOWN  p IS NOT UNKNOWN
True     True       False          False       True            False         True
False    False      True           True        False           False         True
Unknown  False      True           False       True            True          False

The F571 feature is orthogonal to the presence of the Boolean datatype in SQL and, despite syntactic similarities, F571 does not introduce Boolean or three-valued literals in the language. The F571 feature was actually present in SQL-92, well before the Boolean datatype was introduced to the standard in 1999. The F571 feature is implemented by few systems, however; PostgreSQL is one of those implementing it.
The addition of IS UNKNOWN to the other operators of SQL's three-valued logic makes the SQL three-valued logic functionally complete, meaning its logical operators can express any conceivable three-valued logical function.
On systems that do not support the F571 feature, it is possible to emulate p IS UNKNOWN by going over every argument that could make the expression p Unknown and testing those arguments with IS NULL or other NULL-specific functions, although this may be more cumbersome.
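
As an illustration of this emulation, the sketch below takes the predicate a = b, which can only be Unknown when a or b is Null, and tests both arguments with IS NULL. It assumes SQLite (via Python's sqlite3 module) as the engine; the table pairs and the column alias is_unknown are hypothetical. SQLite reports truth values as the integers 1 and 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO pairs (a, b) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, None), (None, None)],
)

# Emulate "(a = b) IS UNKNOWN" by checking each argument of the comparison.
rows = conn.execute(
    "SELECT a, b, (a IS NULL OR b IS NULL) AS is_unknown "
    "FROM pairs ORDER BY rowid"
).fetchall()

conn.close()
for row in rows:
    print(row)
```

For predicates with many arguments, each argument needs its own test, which is the cumbersome part noted above.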