Entity–relationship model


An entity–relationship model describes interrelated things of interest in a specific domain of knowledge. A basic ER model is composed of entity types and specifies relationships that can exist between entities.
In software engineering, an ER model is commonly formed to represent things a business needs to remember in order to perform business processes. Consequently, the ER model becomes an abstract data model that defines a data or information structure that can be implemented in a database, typically a relational database.
Entity–relationship modeling was developed for database design by Peter Chen and published in a 1976 paper, with variants of the idea existing previously. Today it is commonly used for teaching students the basics of database structure. Some ER models show supertype and subtype entities connected by generalization–specialization relationships, and an ER model can also be used to specify domain-specific ontologies.

Introduction

An ER model usually results from systematic analysis to define and describe the data created and needed by processes in a business area. Typically, it represents records of entities and events monitored and directed by business processes, rather than the processes themselves. It is usually drawn in a graphical form as boxes that are connected by lines which express the associations and dependencies between entities. It can also be expressed in a verbal form, for example: one building may be divided into zero or more apartments, but one apartment can only be located in one building.
Entities may be defined not only by relationships, but also by additional properties, which include identifiers called "primary keys". Diagrams created to represent attributes as well as entities and relationships may be called entity-attribute-relationship diagrams, rather than entity–relationship models.
An ER model is typically implemented as a database. In a simple relational database implementation, each row of a table represents one instance of an entity type, and each column (field) of a table represents an attribute type. In a relational database, a relationship between entities is implemented by storing the primary key of one entity as a pointer or "foreign key" in the table of another entity.
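As a minimal sketch of the relational implementation described above, the following uses Python's built-in sqlite3 module with the building and apartment example from earlier in this section. The table names, column names, and sample rows are assumptions chosen for illustration, not part of any standard schema.

```python
import sqlite3

# Hypothetical schema for illustration: all table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each table holds one entity type; each row will be one entity instance.
conn.execute("""
    CREATE TABLE building (
        building_id INTEGER PRIMARY KEY,
        address     TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE apartment (
        apartment_id INTEGER PRIMARY KEY,
        unit_number  TEXT NOT NULL,
        -- Foreign key: the primary key of the building this apartment is located in.
        building_id  INTEGER NOT NULL REFERENCES building(building_id)
    )
""")

conn.execute("INSERT INTO building VALUES (1, '12 Main Street')")
conn.executemany(
    "INSERT INTO apartment VALUES (?, ?, ?)",
    [(10, "1A", 1), (11, "1B", 1)],
)

# Follow the foreign key to list the apartments located in each building.
rows = conn.execute("""
    SELECT b.address, a.unit_number
    FROM building AS b
    JOIN apartment AS a ON a.building_id = b.building_id
    ORDER BY a.unit_number
""").fetchall()


def insert_orphan_apartment():
    """Try to insert an apartment that points at a non-existent building."""
    try:
        conn.execute("INSERT INTO apartment VALUES (12, '9Z', 99)")
        return True
    except sqlite3.IntegrityError:
        return False
```

Because building_id in apartment is declared NOT NULL and references building, every apartment is located in exactly one building, while a building may have zero or more apartments. Calling insert_orphan_apartment() returns False here, since the foreign key check rejects the row.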
There is a tradition for ER/data models to be built at two or three levels of abstraction. The conceptual-logical-physical hierarchy below is used in other kinds of specification, and is different from the three schema approach to software engineering.
;Conceptual data model
;Logical data model
;Physical data model
The first stage of information system design uses these models during the requirements analysis to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology for a certain area of interest. In the case of the design of an information system that is based on a database, the conceptual data model is, at a later stage, mapped to a logical data model, such as the relational model. This in turn is mapped to a physical model during physical design. Sometimes, both of these phases are referred to as "physical design."

Components

An entity may be defined as a thing that is capable of an independent existence that can be uniquely identified, and is capable of storing data. An entity is an abstraction from the complexities of a domain. When we speak of an entity, we normally speak of some aspect of the real world that can be distinguished from other aspects of the real world.
An entity is a thing that exists either physically or logically. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. Although the term entity is the one most commonly used, following Chen, entities and entity-types should be distinguished. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym.
Entities can be thought of as nouns. Examples include a computer, an employee, a song, or a mathematical theorem.
A relationship captures how entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples include an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, and a proves relationship between a mathematician and a conjecture.
The model's linguistic aspect described above is used in the declarative database query language ERROL, which mimics natural language constructs. ERROL's semantics and implementation are based on reshaped relational algebra, a relational algebra that is adapted to the entity–relationship model and captures its linguistic aspect.
Entities and relationships can both have attributes. For example, an employee entity might have a Social Security Number attribute, while a proves relationship may have a date attribute.
All entities except weak entities must have a minimal set of uniquely identifying attributes that may be used as a unique/primary key.
Entity–relationship diagrams do not show single entities or single instances of relations. Rather, they show entity sets and relationship sets. For example, a particular song is an entity, the collection of all songs in a database is an entity set, the eaten relationship between a child and his lunch is a single relationship, and the set of all such child-lunch relationships in a database is a relationship set.
In other words, a relationship set corresponds to a relation in mathematics, while a relationship corresponds to a member of the relation.
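The correspondence with mathematical relations can be sketched in a few lines of Python. The entity names below are hypothetical sample data, and modeling entities as plain strings is a simplifying assumption.

```python
from itertools import product

# Hypothetical entity sets: each element is a single entity.
artists = {"Ana", "Ben"}
songs = {"Song X", "Song Y", "Song Z"}

# A relationship set: each tuple is a single "performs" relationship.
performs = {("Ana", "Song X"), ("Ana", "Song Y"), ("Ben", "Song Z")}

# A relation in the mathematical sense is a subset of the Cartesian product.
all_pairs = set(product(artists, songs))
is_relation = performs <= all_pairs

# A single relationship is a member of the relation.
has_relationship = ("Ana", "Song X") in performs
```

Here is_relation and has_relationship are both True, while a pair such as ("Ben", "Song X") belongs to the Cartesian product but not to the relationship set.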
Certain cardinality constraints on relationship sets may be indicated as well.
English grammar structure | ER structure
Common noun | Entity type
Proper noun | Entity
Transitive verb | Relationship type
Intransitive verb | Attribute type
Adjective | Attribute for entity
Adverb | Attribute for relationship

Physical views show how data is actually stored.

Relationships, roles, and cardinalities

Chen's original paper gives an example of a relationship and its roles. He describes a relationship "marriage" and its two roles, "husband" and "wife".
A person plays the role of husband in a marriage and another person plays the role of wife in the marriage. These words are nouns.
Chen's terminology has also been applied to earlier ideas. The lines, arrows, and crow's feet of some diagrams owe more to the earlier Bachman diagrams than to Chen's relationship diagrams.
Another common extension to Chen's model is to "name" relationships and roles as verbs or phrases.

Role naming

It has also become prevalent to name roles with phrases such as is the owner of and is owned by. The correct nouns in this case are owner and possession. Thus, person plays the role of owner and car plays the role of possession, rather than person plays the role of is the owner of, and so on.
Using nouns has direct benefit when generating physical implementations from semantic models. When a person has two relationships with car it is possible to generate names such as owner_person and driver_person, which are immediately meaningful.
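The naming benefit can be illustrated with a small sketch. The function foreign_key_name and its role_entity naming scheme are assumptions made for this example, not a convention prescribed by any particular modeling tool.

```python
def foreign_key_name(role: str, entity_type: str) -> str:
    """Build a column name from a role name and the entity type playing that role."""
    return f"{role.lower().replace(' ', '_')}_{entity_type.lower()}"


# A car has two relationships with person, distinguished by role nouns.
roles = [("owner", "person"), ("driver", "person")]
columns = [foreign_key_name(role, entity) for role, entity in roles]

# The same scheme applied to a verb phrase gives a clumsier name.
verb_phrase_column = foreign_key_name("is the owner of", "person")
```

With noun roles, columns is ['owner_person', 'driver_person'], whereas the verb phrase yields 'is_the_owner_of_person'.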

Cardinalities

Modifications to the original specification can be beneficial. Chen described look-across cardinalities. As an aside, the Barker–Ellis notation, used in Oracle Designer, uses same-side for minimum cardinality and role, but look-across for maximum cardinality.
Work associated with Merise, by Elmasri & Navathe, and by others has shown a preference for same-side placement of roles and of both minimum and maximum cardinalities, and researchers have shown that this is more coherent when applied to n-ary relationships of order greater than 2.
Dullea et al. state: "A 'look across' notation such as used in the UML does not effectively represent the semantics of participation constraints imposed on relationships where the degree is higher than binary."
Feinerer says: "Problems arise if we operate under the look-across semantics as used for UML associations. Hartmann investigates this situation and shows how and why different transformations fail." and also "As we will see on the next few pages, the look-across interpretation introduces several difficulties that prevent the extension of simple mechanisms from binary to n-ary associations."
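One way to see the difference is to compute both kinds of count for a small ternary relationship. The sketch below is an illustration under stated assumptions, not the formalization used by the authors quoted above: the supplies relationship and its sample data are hypothetical, same-side counts are taken as the number of relationships each entity participates in, and look-across counts are taken as the number of entities linked to each combination of the other participants.

```python
from itertools import product

# Hypothetical entity sets and a ternary relationship set (supplier, part, project).
suppliers = {"S1", "S2", "S3"}
parts = {"P1", "P2"}
projects = {"J1", "J2"}
supplies = {("S1", "P1", "J1"), ("S1", "P2", "J1"), ("S2", "P1", "J2")}


def same_side_counts(entity_set, relationship_set, position):
    """For each entity, count the relationships it participates in."""
    counts = {entity: 0 for entity in entity_set}
    for rel in relationship_set:
        counts[rel[position]] += 1
    return counts


def look_across_counts(entity_sets, relationship_set, position):
    """For each combination of the other participants, count the linked entities."""
    others = [s for i, s in enumerate(entity_sets) if i != position]
    counts = {combo: 0 for combo in product(*others)}
    for rel in relationship_set:
        combo = tuple(value for i, value in enumerate(rel) if i != position)
        counts[combo] += 1
    return counts


same_side = same_side_counts(suppliers, supplies, 0)
look_across = look_across_counts([suppliers, parts, projects], supplies, 0)
```

In this data, same_side shows directly that S3 participates in no relationship, so a minimum of 1 for suppliers would be violated. The look_across counts are indexed by (part, project) pairs and contain no entry for S3 at all, so they cannot express whether every supplier participates.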
Figure (ERD-artist-performs-song.svg): Two related entities shown using crow's foot notation. In this example, an optional relationship is shown between Artist and Song. The symbol composed of branching lines, closest to the Song entity, represents "zero, one, or many", whereas a song has "one and only one" artist, as indicated by the symbol composed of parallel lines. The former is therefore read as: an artist performs "zero, one, or many" songs.
Chen's notation for entity–relationship modeling uses rectangles to represent entity sets, and diamonds to represent relationships appropriate for first-class objects: they can have attributes and relationships of their own. If an entity set participates in a relationship set, they are connected with a line.
Attributes are drawn as ovals and connected with a line to exactly one entity or relationship set.
Cardinality constraints are expressed as follows:
  • a double line indicates a participation constraint, totality, or surjectivity: all entities in the entity set must participate in at least one relationship in the relationship set;
  • an arrow from an entity set to a relationship set indicates a key constraint, i.e. injectivity: each entity of the entity set can participate in at most one relationship in the relationship set;
  • a thick line indicates both, i.e. bijectivity: each entity in the entity set is involved in exactly one relationship;
  • an underlined name of an attribute indicates that it is a key: two different entities or relationships with this attribute always have different values for this attribute.
Attributes are often omitted as they can clutter up a diagram. Other diagram techniques often list entity attributes within the rectangles drawn for entity sets.
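The participation and key constraints listed above can be checked against a concrete relationship set. This is a minimal sketch assuming that a relationship set is modeled as a set of tuples and that every tuple only mentions members of the given entity sets. The function names and sample data are hypothetical.

```python
def participation_counts(entity_set, relationship_set, position):
    """Count how many relationships each entity takes part in at `position`."""
    counts = {entity: 0 for entity in entity_set}
    for rel in relationship_set:
        counts[rel[position]] += 1
    return counts


def is_total(entity_set, relationship_set, position):
    """Participation constraint: every entity appears in at least one relationship."""
    counts = participation_counts(entity_set, relationship_set, position)
    return all(n >= 1 for n in counts.values())


def satisfies_key_constraint(entity_set, relationship_set, position):
    """Key constraint: every entity appears in at most one relationship."""
    counts = participation_counts(entity_set, relationship_set, position)
    return all(n <= 1 for n in counts.values())


# Hypothetical sample data: (artist, song) pairs in a "performs" relationship set.
artists = {"Ana", "Ben", "Cy"}
songs = {"Song X", "Song Y", "Song Z"}
performs = {("Ana", "Song X"), ("Ana", "Song Y"), ("Ben", "Song Z")}

# Songs satisfy both constraints, so each song is in exactly one relationship.
songs_total = is_total(songs, performs, 1)
songs_key = satisfies_key_constraint(songs, performs, 1)

# Artists satisfy neither: Cy performs nothing and Ana performs two songs.
artists_total = is_total(artists, performs, 0)
artists_key = satisfies_key_constraint(artists, performs, 0)
```

In Chen's notation, the Song side of this sample relationship would therefore be drawn with a thick line, while the Artist side would be a plain line.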