Join (SQL)
A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables together and puts records with matching fields on the same row. There are several variants of JOIN: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER, CROSS, and others.
Example tables
To explain join types, the rest of this article uses the following tables:
Employee table
| LastName | DepartmentID |
| Rafferty | 31 |
| Jones | 33 |
| Heisenberg | 33 |
| Robinson | 34 |
| Smith | 34 |
| Williams | NULL |
Department table
| DepartmentID | DepartmentName |
| 31 | Sales |
| 33 | Engineering |
| 34 | Clerical |
| 35 | Marketing |
Department.DepartmentID is the primary key of the Department table, whereas Employee.DepartmentID is a foreign key.
Note that in Employee, "Williams" has not yet been assigned to a department. Also, no employees have been assigned to the "Marketing" department.
These are the SQL statements to create the above tables:
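-- The VARCHAR(20) type of the name columns is an assumption; the values come from the tables above.
CREATE TABLE department (
    DepartmentID INT PRIMARY KEY NOT NULL,
    DepartmentName VARCHAR(20)
);
CREATE TABLE employee (
    LastName VARCHAR(20),
    DepartmentID INT REFERENCES department(DepartmentID)
);
INSERT INTO department
VALUES (31, 'Sales'),
       (33, 'Engineering'),
       (34, 'Clerical'),
       (35, 'Marketing');
INSERT INTO employee
VALUES ('Rafferty', 31),
       ('Jones', 33),
       ('Heisenberg', 33),
       ('Robinson', 34),
       ('Smith', 34),
       ('Williams', NULL);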
Cross join
CROSS JOIN returns the Cartesian product of rows from tables in the join. In other words, it will produce rows which combine each row from the first table with each row from the second table.
| Employee.LastName | Employee.DepartmentID | Department.DepartmentName | Department.DepartmentID |
| Rafferty | 31 | Sales | 31 |
| Jones | 33 | Sales | 31 |
| Heisenberg | 33 | Sales | 31 |
| Smith | 34 | Sales | 31 |
| Robinson | 34 | Sales | 31 |
| Williams | NULL | Sales | 31 |
| Rafferty | 31 | Engineering | 33 |
| Jones | 33 | Engineering | 33 |
| Heisenberg | 33 | Engineering | 33 |
| Smith | 34 | Engineering | 33 |
| Robinson | 34 | Engineering | 33 |
| Williams | NULL | Engineering | 33 |
| Rafferty | 31 | Clerical | 34 |
| Jones | 33 | Clerical | 34 |
| Heisenberg | 33 | Clerical | 34 |
| Smith | 34 | Clerical | 34 |
| Robinson | 34 | Clerical | 34 |
| Williams | NULL | Clerical | 34 |
| Rafferty | 31 | Marketing | 35 |
| Jones | 33 | Marketing | 35 |
| Heisenberg | 33 | Marketing | 35 |
| Smith | 34 | Marketing | 35 |
| Robinson | 34 | Marketing | 35 |
| Williams | NULL | Marketing | 35 |
Example of an explicit cross join:
SELECT *
FROM employee CROSS JOIN department;
Example of an implicit cross join:
SELECT *
FROM employee, department;
The cross join can also be written as an inner join with an always-true condition:
SELECT *
FROM employee INNER JOIN department ON 1=1;
CROSS JOIN does not itself apply any predicate to filter rows from the joined table. The results of a CROSS JOIN can be filtered using a WHERE clause, which may then produce the equivalent of an inner join.
In the SQL:2011 standard, cross joins are part of the optional F401, "Extended joined table", package.
A typical use of a cross join is checking the server's performance.
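The three spellings of the cross join, and the effect of filtering it with WHERE, can be checked against the example data. The following is a minimal sketch, assuming SQLite driven through Python's standard sqlite3 module (neither is part of this article); the helpers make_db and rows are hypothetical names introduced only for illustration, and the VARCHAR(20) column type is an assumption.

```python
import sqlite3


def make_db():
    """Create the article's example tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (
            DepartmentID INT PRIMARY KEY NOT NULL,
            DepartmentName VARCHAR(20)
        );
        CREATE TABLE employee (
            LastName VARCHAR(20),
            DepartmentID INT REFERENCES department(DepartmentID)
        );
        INSERT INTO department VALUES
            (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
        INSERT INTO employee VALUES
            ('Rafferty', 31), ('Jones', 33), ('Heisenberg', 33),
            ('Robinson', 34), ('Smith', 34), ('Williams', NULL);
    """)
    return conn


conn = make_db()


def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Three ways of writing the same Cartesian product.
explicit = rows("SELECT * FROM employee CROSS JOIN department")
implicit = rows("SELECT * FROM employee, department")
always_true = rows("SELECT * FROM employee INNER JOIN department ON 1=1")

# Filtering the cross join with WHERE yields the rows of an inner join.
filtered = rows(
    "SELECT * FROM employee CROSS JOIN department "
    "WHERE employee.DepartmentID = department.DepartmentID"
)
inner = rows(
    "SELECT * FROM employee INNER JOIN department "
    "ON employee.DepartmentID = department.DepartmentID"
)

print(len(explicit))  # 6 employees x 4 departments = 24 rows
print(set(explicit) == set(implicit) == set(always_true))
print(set(filtered) == set(inner))
```

Row order is not guaranteed without ORDER BY, so the comparison is made on sets of rows rather than on lists.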
Inner join
An inner join requires each row in the two joined tables to have matching column values. It is a commonly used join operation in applications, but should not be assumed to be the best choice in all situations. An inner join creates a new result table by combining column values of two tables (A and B) based upon the join-predicate. The query compares each row of A with each row of B to find all pairs of rows that satisfy the join-predicate. When the join-predicate is satisfied by matching non-NULL values, column values for each matched pair of rows of A and B are combined into a result row.
The result of the join can be defined as the outcome of first taking the Cartesian product of all rows in the tables and then returning all rows that satisfy the join predicate. Actual SQL implementations normally use other approaches, such as hash joins or sort-merge joins, since computing the Cartesian product is slower and would often require a prohibitively large amount of memory to store.
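The difference between the definition and a typical implementation strategy can be sketched outside a database engine. The following is an illustrative Python sketch, not how any particular database implements joins: nested_loop_join follows the definition (filter the Cartesian product), while hash_join builds a lookup table on one input and probes it with the other. Both function names are hypothetical, and None stands in for SQL NULL.

```python
from collections import defaultdict

# The article's example data; None plays the role of SQL NULL.
employees = [
    ("Rafferty", 31), ("Jones", 33), ("Heisenberg", 33),
    ("Robinson", 34), ("Smith", 34), ("Williams", None),
]
departments = [
    (31, "Sales"), (33, "Engineering"), (34, "Clerical"), (35, "Marketing"),
]


def nested_loop_join(emps, depts):
    """Definition-style join: walk the Cartesian product, keep matching pairs."""
    return [
        (name, emp_dept, dept_name)
        for name, emp_dept in emps
        for dept_id, dept_name in depts
        # A NULL key never satisfies the equality predicate.
        if emp_dept is not None and emp_dept == dept_id
    ]


def hash_join(emps, depts):
    """Build a hash table on the department rows, then probe it once per employee."""
    by_id = defaultdict(list)
    for dept_id, dept_name in depts:
        by_id[dept_id].append(dept_name)

    result = []
    for name, emp_dept in emps:
        if emp_dept is None:
            continue  # NULL keys cannot match, so they are skipped
        for dept_name in by_id.get(emp_dept, []):
            result.append((name, emp_dept, dept_name))
    return result


loop_rows = nested_loop_join(employees, departments)
hash_rows = hash_join(employees, departments)
print(sorted(loop_rows) == sorted(hash_rows))
```

The nested-loop version performs one comparison per pair of rows, whereas the hash version touches each input row once plus one lookup per probe, which is why the product is usually not materialized in practice.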
SQL specifies two different syntactical ways to express joins: the "explicit join notation" and the "implicit join notation". The "implicit join notation" is no longer considered a best practice, although database systems still support it.
The "explicit join notation" uses the JOIN keyword, optionally preceded by the INNER keyword, to specify the table to join, and the ON keyword to specify the predicates for the join, as in the following example:
SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
FROM employee
INNER JOIN department ON
employee.DepartmentID = department.DepartmentID;
| Employee.LastName | Employee.DepartmentID | Department.DepartmentName |
| Robinson | 34 | Clerical |
| Jones | 33 | Engineering |
| Smith | 34 | Clerical |
| Heisenberg | 33 | Engineering |
| Rafferty | 31 | Sales |
The "implicit join notation" simply lists the tables for joining, in the FROM clause of the SELECT statement, using commas to separate them. Thus it specifies a cross join, and the WHERE clause may apply additional filter-predicates.
The following example is equivalent to the previous one, but this time using implicit join notation:
SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;
The queries given in the examples above will join the Employee and Department tables using the DepartmentID column of both tables. Where the DepartmentID values of these tables match, the query will combine the LastName, DepartmentID and DepartmentName columns from the two tables into a result row. Where the DepartmentID values do not match, no result row is generated.
Thus the result of the execution of the query above will be:
| Employee.LastName | Employee.DepartmentID | Department.DepartmentName |
| Robinson | 34 | Clerical |
| Jones | 33 | Engineering |
| Smith | 34 | Clerical |
| Heisenberg | 33 | Engineering |
| Rafferty | 31 | Sales |
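That the explicit and implicit notations return the same rows can be checked mechanically. The following is a minimal sketch, assuming SQLite driven through Python's standard sqlite3 module (neither is part of this article); make_db is a hypothetical helper and the VARCHAR(20) column type is an assumption.

```python
import sqlite3


def make_db():
    """Create the article's example tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (
            DepartmentID INT PRIMARY KEY NOT NULL,
            DepartmentName VARCHAR(20)
        );
        CREATE TABLE employee (
            LastName VARCHAR(20),
            DepartmentID INT REFERENCES department(DepartmentID)
        );
        INSERT INTO department VALUES
            (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
        INSERT INTO employee VALUES
            ('Rafferty', 31), ('Jones', 33), ('Heisenberg', 33),
            ('Robinson', 34), ('Smith', 34), ('Williams', NULL);
    """)
    return conn


conn = make_db()

explicit_sql = """
    SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
    FROM employee
    INNER JOIN department ON employee.DepartmentID = department.DepartmentID
"""
implicit_sql = """
    SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
    FROM employee, department
    WHERE employee.DepartmentID = department.DepartmentID
"""

explicit_rows = conn.execute(explicit_sql).fetchall()
implicit_rows = conn.execute(implicit_sql).fetchall()

# Row order is unspecified without ORDER BY, so compare as sets.
print(set(explicit_rows) == set(implicit_rows))
for row in sorted(explicit_rows):
    print(row)
```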
The employee "Williams" and the department "Marketing" do not appear in the query execution results. Neither of these has any matching rows in the other respective table: "Williams" has no associated department, and no employee has the department ID 35. Depending on the desired results, this behavior may be a subtle bug, which can be avoided by replacing the inner join with an outer join.
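The effect of replacing the inner join with an outer join can be illustrated on the same data. The following is a minimal sketch, assuming SQLite driven through Python's standard sqlite3 module (neither is part of this article); make_db is a hypothetical helper and the VARCHAR(20) column type is an assumption. Only LEFT OUTER JOIN is used, since older SQLite versions do not support the right and full variants.

```python
import sqlite3


def make_db():
    """Create the article's example tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (
            DepartmentID INT PRIMARY KEY NOT NULL,
            DepartmentName VARCHAR(20)
        );
        CREATE TABLE employee (
            LastName VARCHAR(20),
            DepartmentID INT REFERENCES department(DepartmentID)
        );
        INSERT INTO department VALUES
            (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
        INSERT INTO employee VALUES
            ('Rafferty', 31), ('Jones', 33), ('Heisenberg', 33),
            ('Robinson', 34), ('Smith', 34), ('Williams', NULL);
    """)
    return conn


conn = make_db()

# Inner join: unmatched rows on either side are silently dropped.
inner_rows = conn.execute("""
    SELECT employee.LastName, department.DepartmentName
    FROM employee
    INNER JOIN department ON employee.DepartmentID = department.DepartmentID
""").fetchall()

# Left outer join from employee: every employee is kept.
employee_side = conn.execute("""
    SELECT employee.LastName, department.DepartmentName
    FROM employee
    LEFT OUTER JOIN department ON employee.DepartmentID = department.DepartmentID
""").fetchall()

# Left outer join from department: every department is kept.
department_side = conn.execute("""
    SELECT department.DepartmentName, employee.LastName
    FROM department
    LEFT OUTER JOIN employee ON employee.DepartmentID = department.DepartmentID
""").fetchall()

print(("Williams", None) in employee_side)     # Williams kept, with a NULL department name
print(("Marketing", None) in department_side)  # Marketing kept, with a NULL employee name
```

Python's sqlite3 module returns SQL NULL as None, which is why the unmatched side appears as None in the result tuples.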
Inner join and NULL values
Programmers should take special care when joining tables on columns that can contain NULL values, since NULL will never match any other value (not even NULL itself), unless the join condition explicitly uses a combination predicate that first checks that the join columns are NOT NULL before applying the remaining predicate condition. The inner join can only be safely used in a database that enforces referential integrity or where the join columns are guaranteed not to be NULL. Many transaction processing relational databases rely on atomicity, consistency, isolation, durability (ACID) data update standards to ensure data integrity, making inner joins an appropriate choice. However, transaction databases usually also have desirable join columns that are allowed to be NULL. Many reporting relational databases and data warehouses use high-volume extract, transform, load (ETL) batch updates which make referential integrity difficult or impossible to enforce, resulting in potentially NULL join columns that an SQL query author cannot modify and which cause inner joins to omit data with no indication of an error. The choice to use an inner join depends on the database design and data characteristics. A left outer join can usually be substituted for an inner join when the join columns in one table may contain NULL values.
Any data column that may be NULL should never be used as a link in an inner join, unless the intended result is to eliminate the rows with the NULL value. If NULL join columns are to be deliberately removed from the result set, an inner join can be faster than an outer join because the table join and filtering are done in a single step. Conversely, an inner join can result in disastrously slow performance or even a server crash when used in a large-volume query in combination with database functions in an SQL WHERE clause. A function in an SQL WHERE clause can result in the database ignoring relatively compact table indexes. The database may read and inner join the selected columns from both tables before reducing the number of rows using the filter that depends on a calculated value, resulting in a relatively enormous amount of inefficient processing.
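How a function in a WHERE clause can keep an index from being used can be observed in a query plan. The following is a minimal sketch, assuming SQLite driven through Python's standard sqlite3 module (neither is part of this article). The index idx_employee_lastname and the helper plan are hypothetical, the exact plan text varies between SQLite versions, and other database systems may behave differently (some support indexes on expressions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (LastName VARCHAR(20), DepartmentID INT);
    INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33), ('Smith', 34);
    -- Hypothetical index, added only for this illustration.
    CREATE INDEX idx_employee_lastname ON employee (LastName);
""")


def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Comparing the bare column lets SQLite look the value up in the index.
plain_plan = plan("SELECT * FROM employee WHERE LastName = 'Smith'")

# Wrapping the column in a function hides it from the index lookup,
# so every row has to be visited and the function evaluated on each.
function_plan = plan("SELECT * FROM employee WHERE upper(LastName) = 'SMITH'")

print(plain_plan)
print(function_plan)
```

In SQLite's plan output, a line beginning with SEARCH indicates an index lookup, while SCAN indicates that all rows are visited.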
When a result set is produced by joining several tables, including master tables used to look up full-text descriptions of numeric identifier codes, a NULL value in any one of the foreign keys can result in the entire row being eliminated from the result set, with no indication of error. A complex SQL query that includes one or more inner joins and several outer joins has the same risk for NULL values in the inner join link columns.
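The silent loss of rows in a multi-table lookup can be reproduced with a small example. The following is a minimal sketch, assuming SQLite driven through Python's standard sqlite3 module (neither is part of this article). The staff and location tables, the LocationID column and the city names are hypothetical and are not part of this article's example data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        DepartmentID INT PRIMARY KEY NOT NULL,
        DepartmentName VARCHAR(20)
    );
    -- Hypothetical second lookup table.
    CREATE TABLE location (
        LocationID INT PRIMARY KEY NOT NULL,
        City VARCHAR(20)
    );
    CREATE TABLE staff (
        LastName VARCHAR(20),
        DepartmentID INT REFERENCES department(DepartmentID),
        LocationID INT REFERENCES location(LocationID)
    );
    INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering');
    INSERT INTO location VALUES (1, 'Leeds'), (2, 'York');
    INSERT INTO staff VALUES
        ('Rafferty', 31, 1),     -- both foreign keys set
        ('Jones', 33, NULL),     -- location unknown
        ('Williams', NULL, 2);   -- department unknown
""")

# Inner joins: one NULL foreign key is enough to drop the whole row.
inner_rows = conn.execute("""
    SELECT s.LastName, d.DepartmentName, l.City
    FROM staff AS s
    INNER JOIN department AS d ON s.DepartmentID = d.DepartmentID
    INNER JOIN location AS l ON s.LocationID = l.LocationID
""").fetchall()

# Left outer joins: every staff row survives; missing lookups come back as NULL.
outer_rows = conn.execute("""
    SELECT s.LastName, d.DepartmentName, l.City
    FROM staff AS s
    LEFT OUTER JOIN department AS d ON s.DepartmentID = d.DepartmentID
    LEFT OUTER JOIN location AS l ON s.LocationID = l.LocationID
""").fetchall()

print(inner_rows)
print(len(outer_rows))
```

Neither query raises an error or warning, so the shorter result of the inner-join version is visible only by comparing row counts.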
A commitment to SQL code containing inner joins assumes NULL join columns will not be introduced by future changes, including vendor updates, design changes and bulk processing outside of the application's data validation rules such as data conversions, migrations, bulk imports and merges.
One can further classify inner joins as equi-joins and theta joins.