First normal form
First normal form is the most basic level of database normalization defined by English computer scientist Edgar F. Codd, the inventor of the relational database. A relation can be said to be in first normal form if each field is atomic, containing a single value rather than a set of values or a nested table. In other words, a relation complies with first normal form if no attribute domain has relations as elements.
Most relational database management systems, including standard SQL, do not support creating or using table-valued columns, which means most relational databases will be in first normal form by necessity. Otherwise, normalization to 1NF involves eliminating nested relations by breaking them up into separate relations associated with each other using foreign keys. This process is a necessary step when moving data from a non-relational database, such as one using a hierarchical or document-oriented model, to a relational database.
A database must satisfy 1NF to satisfy further "normal forms", such as 2NF and 3NF, which enable the reduction of redundancy and anomalies. Other benefits of adopting 1NF include the introduction of increased data independence and flexibility and simplification of the relational algebra and query language necessary to describe operations on the database.
Codd considered 1NF mandatory for relational databases, while the other normal forms were merely guidelines for database design.
Background
First normal form was introduced in 1970 by Edgar F. Codd in his paper "A relational model of data for large shared data banks", although initially it was simply referred to as "normalization" or "normal form". It was renamed to "first normal form" when Codd introduced additional normal forms in his paper "Further Normalization of the Data Base Relational Model" in 1971.The relational model was proposed as an improvement over hierarchical databases which were prevalent at the time. A key difference lies in how relationships between records are represented. In a hierarchical database, one-to-many relationships are represented through containment: a single record may contain sets of records as attribute values. But Codd argued that hierarchy is not flexible and expressive enough for more complex data models. For example, many-to-many relationships cannot be represented through hierarchy. Thus he suggest eliminating nested records and instead represent relationship through foreign keys. This allows richer relationships to be expressed, since a record can now participate in multiple relationships.
A direct translation of a hierarchical database into relations would represent repeating groups as nested relations. Thus normalization is defined as eliminating nested relations and instead represent the one-to-many relationship through foreign keys.
Codd distinguishes between "atomic" and "compound" data. Atomic data includes basic types such as numbers and strings – broadly speaking, it "cannot be decomposed into smaller pieces by the DBMS ". Compound data is made up of structures such as relations which contain several pieces of atomic data and thus "can be decomposed by the DBMS".
In a relation, each attribute has a set of allowed values known as its domain. Each tuple in the relation contains one value per attribute, and each must be an element in that attribute's domain. Codd distinguishes attributes which have "simple domains" containing only atomic data from attributes with "nonsimple domains" containing at least some forms of compound data. Nonsimple domains introduce a degree of structural complexity which can be difficult to navigate, to query and to update – for instance, it will be time-consuming to operate across several nested relations, which can be found in some non-relational databases.
First normal form therefore requires all attribute domains to be simple domains, such that the data in each field is atomic and no relation has relation-valued attributes. Precisely, Codd states that, in the relational model, "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS." Normalization to 1NF is thus a process of eliminating nonsimple domains from all relations.
Examples
Design that violates 1NF
This table of customers' credit card transactions does not conform to first normal form, as each customer corresponds to a repeating group of transactions. Such a design can be represented in a hierarchical database, but not in an SQL database, since SQL does not support nested tables.| CustomerID | Name | Transactions | - | - | - | |||||||||||||||||||||||||||||||||
| 1 | Abraham | The evaluation of any query relating to customers' transactions would broadly involve two stages:
Design that complies with 1NFCodd described how a database like this could be made less structurally complex and more flexible by transforming it into a relational database in first normal form. To normalize the table so it complies with first normal form, attributes with nonsimple domains must be extracted to separate, stand-alone relations. Each extracted relation gains a foreign key referencing the primary key of the relation which initially contained it. This process can be applied recursively to nonsimple domains nested in multiple levels.In this example, CustomerID is the primary key of the containing relation and will therefore be appended as a foreign key to the new relation:
In this modified design, the primary key is in the first relation and in the second relation. Now that a single, "top-level" relation contains all transactions, it will be simpler to run queries on the database. To find the monetary sum of all October transactions, the DMBS would simply find all rows with a Date falling in October and sum the Amount fields. All values are now easily exposed to the DBMS, whereas previously some values were embedded in lower-level structures that had to be handled specially. Accordingly, the normalized design lends itself well to general-purpose query processing, whereas the unnormalized design does not. It is worth noting that the revised design also meets the additional requirements for second and third normal form. RationaleNormalization to 1NF is the major theoretical component of transferring a database to the relational model. Use of a relational database in 1NF brings certain advantages:
SUBSTRING to decompose values otherwise considered atomic.Hugh Darwen and Christopher J. Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood. In particular, the notion of an atomic value as a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:
|