Join (SQL)


A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables together, placing records with matching fields on the same row. There are several variants of JOIN: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER, CROSS, and others.

Example tables

To explain join types, the rest of this article uses the following tables:
Employee table
LastName     DepartmentID
Rafferty     31
Jones        33
Heisenberg   33
Robinson     34
Smith        34
Williams     NULL

Department table
DepartmentID   DepartmentName
31             Sales
33             Engineering
34             Clerical
35             Marketing

Department.DepartmentID is the primary key of the Department table, whereas Employee.DepartmentID is a foreign key.
Note that in Employee, "Williams" has not yet been assigned to a department. Also, no employees have been assigned to the "Marketing" department.
These are the SQL statements to create the above tables:

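-- Column types below are illustrative assumptions; only the column names,
-- keys, and row values are given by the tables above.
CREATE TABLE department (
    DepartmentID INT PRIMARY KEY NOT NULL,
    DepartmentName VARCHAR(20)
);

CREATE TABLE employee (
    LastName VARCHAR(20),
    DepartmentID INT REFERENCES department(DepartmentID)
);

INSERT INTO department
VALUES (31, 'Sales'),
       (33, 'Engineering'),
       (34, 'Clerical'),
       (35, 'Marketing');

INSERT INTO employee
VALUES ('Rafferty', 31),
       ('Jones', 33),
       ('Heisenberg', 33),
       ('Robinson', 34),
       ('Smith', 34),
       ('Williams', NULL);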

Cross join

CROSS JOIN returns the Cartesian product of rows from tables in the join. In other words, it will produce rows which combine each row from the first table with each row from the second table.
Employee.LastName  Employee.DepartmentID  Department.DepartmentName  Department.DepartmentID
Rafferty           31                     Sales                      31
Jones              33                     Sales                      31
Heisenberg         33                     Sales                      31
Smith              34                     Sales                      31
Robinson           34                     Sales                      31
Williams           NULL                   Sales                      31
Rafferty           31                     Engineering                33
Jones              33                     Engineering                33
Heisenberg         33                     Engineering                33
Smith              34                     Engineering                33
Robinson           34                     Engineering                33
Williams           NULL                   Engineering                33
Rafferty           31                     Clerical                   34
Jones              33                     Clerical                   34
Heisenberg         33                     Clerical                   34
Smith              34                     Clerical                   34
Robinson           34                     Clerical                   34
Williams           NULL                   Clerical                   34
Rafferty           31                     Marketing                  35
Jones              33                     Marketing                  35
Heisenberg         33                     Marketing                  35
Smith              34                     Marketing                  35
Robinson           34                     Marketing                  35
Williams           NULL                   Marketing                  35

Example of an explicit cross join:

SELECT *
FROM employee CROSS JOIN department;

Example of an implicit cross join:

SELECT *
FROM employee, department;

The cross join can be replaced with an inner join with an always-true condition:

SELECT *
FROM employee INNER JOIN department ON 1=1;

CROSS JOIN does not itself apply any predicate to filter rows from the joined table. The results of a CROSS JOIN can be filtered using a WHERE clause, which may then produce the equivalent of an inner join.
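As a minimal sketch of that equivalence, the query below filters the cross join of the example employee and department tables with a WHERE clause. Assuming the tables hold the rows shown earlier, it should return the same five matched rows as an inner join on DepartmentID.

```sql
-- Cross join filtered by a WHERE predicate: the 24 combined rows
-- (6 employees x 4 departments) are reduced to those whose
-- DepartmentID values match.
SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
FROM employee CROSS JOIN department
WHERE employee.DepartmentID = department.DepartmentID;
```

Whether a given database executes this form any differently from an explicit INNER JOIN depends on its query optimizer.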
In the SQL:2011 standard, cross joins are part of the optional F401, "Extended joined table", package.
A typical use of a cross join is to check the server's performance.

Inner join

An inner join requires each row in the two joined tables to have matching column values. It is a commonly used join operation in applications, but it should not be assumed to be the best choice in all situations. An inner join creates a new result table by combining column values of two tables (A and B) based upon the join predicate. The query compares each row of A with each row of B to find all pairs of rows that satisfy the join predicate. When the join predicate is satisfied by matching non-NULL values, the column values for each matched pair of rows of A and B are combined into a result row.
The result of the join can be defined as the outcome of first taking the Cartesian product of all rows in the tables and then returning all rows that satisfy the join predicate. Actual SQL implementations normally use other approaches, such as hash joins or sort-merge joins, since computing the Cartesian product is slower and would often require a prohibitively large amount of memory to store.
SQL specifies two different syntactical ways to express joins: the "explicit join notation" and the "implicit join notation". The "implicit join notation" is no longer considered a best practice, although database systems still support it.
The "explicit join notation" uses the JOIN keyword, optionally preceded by the INNER keyword, to specify the table to join, and the ON keyword to specify the predicates for the join, as in the following example:

SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
FROM employee
INNER JOIN department ON
employee.DepartmentID = department.DepartmentID;

The "implicit join notation" simply lists the tables for joining, in the FROM clause of the SELECT statement, using commas to separate them. Thus it specifies a cross join, and the WHERE clause may apply additional filter-predicates.
The following example is equivalent to the previous one, but this time using implicit join notation:

SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;

The queries given in the examples above join the employee and department tables using the DepartmentID column of both tables. Where the DepartmentID values of the two tables match, the query combines the LastName, DepartmentID and DepartmentName columns from the two tables into a result row. Where the DepartmentID values do not match, no result row is generated.
Thus the result of the execution of the query above will be:
Employee.LastName  Employee.DepartmentID  Department.DepartmentName
Robinson           34                     Clerical
Jones              33                     Engineering
Smith              34                     Clerical
Heisenberg         33                     Engineering
Rafferty           31                     Sales

The employee "Williams" and the department "Marketing" do not appear in the query execution results. Neither of these has any matching rows in the other respective table: "Williams" has no associated department, and no employee has the department ID 35. Depending on the desired results, this behavior may be a subtle bug, which can be avoided by replacing the inner join with an outer join.
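As a hedged illustration of that replacement, the sketch below swaps the inner join for a left outer join on the same example tables. Assuming the rows shown earlier, it should return all six employees, with "Williams" paired with a NULL DepartmentName.

```sql
-- Left outer join: every employee row is kept, even when no department matches.
-- Unmatched employees get NULL in the department columns.
SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
FROM employee
LEFT OUTER JOIN department
  ON employee.DepartmentID = department.DepartmentID;
```

This keeps "Williams" but still omits "Marketing". Where the database supports it, a FULL OUTER JOIN would retain unmatched rows from both tables.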

Inner join and NULL values

Programmers should take special care when joining tables on columns that can contain NULL values, since NULL never matches any other value (not even NULL itself) unless the join condition explicitly uses a combination predicate that first checks the join columns for NULL before applying the remaining predicate condition. An inner join can only be safely used in a database that enforces referential integrity or where the join columns are guaranteed not to be NULL. Many transaction processing relational databases rely on atomicity, consistency, isolation, durability (ACID) data update standards to ensure data integrity, making inner joins an appropriate choice. However, transaction databases usually also have desirable join columns that are allowed to be NULL.
Many reporting relational databases and data warehouses use high-volume extract, transform, load (ETL) batch updates, which make referential integrity difficult or impossible to enforce. The result is potentially NULL join columns that an SQL query author cannot modify and which cause inner joins to omit data with no indication of an error. The choice to use an inner join depends on the database design and data characteristics. A left outer join can usually be substituted for an inner join when the join columns in one table may contain NULL values.
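The behavior of NULL in a join predicate can be sketched with a self-join on the example employee table. The aliases e1 and e2 are introduced here only for illustration, and the second query shows one possible form of a combination predicate that treats two NULL values as matching.

```sql
-- Plain equality: the NULL DepartmentID of 'Williams' matches nothing,
-- not even its own row, so this query returns no rows.
SELECT e2.LastName
FROM employee e1
INNER JOIN employee e2
  ON e1.DepartmentID = e2.DepartmentID
WHERE e1.LastName = 'Williams';

-- Combination predicate: NULL is checked explicitly alongside equality,
-- so the 'Williams' row now matches itself and one row is returned.
SELECT e2.LastName
FROM employee e1
INNER JOIN employee e2
  ON e1.DepartmentID = e2.DepartmentID
  OR (e1.DepartmentID IS NULL AND e2.DepartmentID IS NULL)
WHERE e1.LastName = 'Williams';
```

Some systems offer a shorter null-safe comparison, such as IS NOT DISTINCT FROM in PostgreSQL, but the explicit form above is the more portable choice.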
Any data column that may be NULL should never be used as a link in an inner join, unless the intended result is to eliminate the rows with the NULL value. If NULL join columns are to be deliberately removed from the result set, an inner join can be faster than an outer join because the table join and filtering are done in a single step. Conversely, an inner join can result in disastrously slow performance or even a server crash when used in a large-volume query in combination with database functions in an SQL WHERE clause.
A function in an SQL WHERE clause can result in the database ignoring relatively compact table indexes. The database may read and inner join the selected columns from both tables before reducing the number of rows using the filter that depends on a calculated value, resulting in a relatively enormous amount of inefficient processing.
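The contrast can be sketched on the example tables as follows. The index name idx_employee_lastname is hypothetical, and whether a particular database actually skips the index for the first query depends on its optimizer and on any support for function-based indexes.

```sql
-- Hypothetical index, assumed here only for illustration.
CREATE INDEX idx_employee_lastname ON employee (LastName);

-- The filter depends on a calculated value, UPPER(LastName), so a plain
-- index on LastName may be ignored and more rows read before filtering.
SELECT employee.LastName, department.DepartmentName
FROM employee
INNER JOIN department
  ON employee.DepartmentID = department.DepartmentID
WHERE UPPER(employee.LastName) = 'SMITH';

-- Comparing the stored column directly leaves the optimizer free to use
-- the index to narrow the rows before or during the join.
SELECT employee.LastName, department.DepartmentName
FROM employee
INNER JOIN department
  ON employee.DepartmentID = department.DepartmentID
WHERE employee.LastName = 'Smith';
```

On a six-row table the difference is negligible; the concern described above applies to large-volume queries.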
When a result set is produced by joining several tables, including master tables used to look up full-text descriptions of numeric identifier codes, a NULL value in any one of the foreign keys can result in the entire row being eliminated from the result set, with no indication of error. A complex SQL query that includes one or more inner joins and several outer joins has the same risk for NULL values in the inner join link columns.
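One way to make such silent omissions visible, sketched here against the example tables, is to list the rows that have no match in the lookup table. With the rows shown earlier, this query should return only "Williams".

```sql
-- Employees that an inner join on DepartmentID would silently omit:
-- rows whose foreign key is NULL or has no matching department.
SELECT employee.LastName, employee.DepartmentID
FROM employee
LEFT OUTER JOIN department
  ON employee.DepartmentID = department.DepartmentID
WHERE department.DepartmentID IS NULL;
```

Running a check of this kind before relying on an inner join is a design choice rather than a requirement; it only reports the unmatched rows and does not alter the data.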
A commitment to SQL code containing inner joins assumes NULL join columns will not be introduced by future changes, including vendor updates, design changes and bulk processing outside of the application's data validation rules such as data conversions, migrations, bulk imports and merges.
One can further classify inner joins as equi-joins and theta joins.