GEDCOM


FamilySearch GEDCOM, or simply GEDCOM, is an open file format and the de facto standard specification for storing genealogical data. It was developed by the Church of Jesus Christ of Latter-day Saints, the operators of FamilySearch, to aid in the research and sharing of genealogical information. A common usage is as a standard format for the backup and transfer of family tree data between different genealogy software and websites, most of which support importing from and exporting to GEDCOM format.
GEDCOM is defined as a plain text file, using UTF-8 encoding as of version 7.0. This file contains genealogical information about individuals such as names, events, and relationships; metadata links these records together.
GEDCOM 7.0, released in 2021, is the most recent version of the GEDCOM specification as of 2024. However, its predecessor, GEDCOM 5.5.1, remains the industry's standard format for the exchange of genealogical data. First released as a draft standard in 1999, GEDCOM 5.5.1 received only minor updates in the 20 years leading up to the release of 5.5.1 final in 2019. To address its shortcomings, some genealogy programs introduced proprietary extensions to GEDCOM, such as GEDCOM 5.5 EL, which are not always recognized by other programs. Efforts have been made to have 7.0 more widely adopted since its release. FamilySearch stated that it intended to be GEDCOM 7.0 compatible in the third quarter of 2022, and Ancestry.com was planning for 7.0 compatibility but had not specified an implementation date.

Data model

GEDCOM uses a lineage-linked data model based on the conceptual model of the nuclear family. The family record type is therefore the only source of links between the individuals in the file, assigning parents and children by referring to individuals' unique ID numbers. These historical origins are described in the 7.0 specification document: "The FAM record was originally structured to represent families where a male HUSB and female WIFE produce CHIL."
Although the links in a GEDCOM family record still use the original naming indicating a husband and a wife, the specification now states that "sex, gender, titles, and roles of partners should not be inferred based on the partner that the HUSB or WIFE structure points to" and that these individuals within a family structure are collectively referred to as 'partners', 'parents' or 'spouses'. A FAM record can also be used for "cohabitation, fostering, adoption, and so on, regardless of the gender of the partners."
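The lineage-linked model described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the class names `Individual` and `Family`, the `partners` field, and the helper `parents_of` are hypothetical and do not come from the specification. The point it shows is that a child reaches its parents only by going through a family record.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    xref: str                                       # unique ID, e.g. "@I1@"
    name: str
    fams: list[str] = field(default_factory=list)   # families where this person is a partner
    famc: list[str] = field(default_factory=list)   # families where this person is a child


@dataclass
class Family:
    xref: str                                           # unique ID, e.g. "@F1@"
    partners: list[str] = field(default_factory=list)   # HUSB/WIFE pointers, kept role-neutral
    children: list[str] = field(default_factory=list)   # CHIL pointers


def parents_of(person, families):
    """Return the IDs of a person's parents by following FAMC links to FAM records."""
    found = []
    for fam_id in person.famc:
        found.extend(families[fam_id].partners)
    return found


individuals = {
    "@I1@": Individual("@I1@", "John /Smith/", fams=["@F1@"]),
    "@I2@": Individual("@I2@", "Elizabeth /Stansfield/", fams=["@F1@"]),
    "@I3@": Individual("@I3@", "James /Smith/", famc=["@F1@"]),
}
families = {"@F1@": Family("@F1@", partners=["@I1@", "@I2@"], children=["@I3@"])}

print(parents_of(individuals["@I3@"], families))
```

Storing both partners in one role-neutral list is a design choice of this sketch; it reflects the specification's advice that roles should not be inferred from the HUSB and WIFE names.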

File structure

A GEDCOM file consists of a header section, records, and a trailer section. The records represent people, families, sources of information, and other miscellaneous items, including notes. Every line of a GEDCOM file begins with a level number: each top-level record starts with a line at level 0, and the lines nested beneath it carry positive integer levels.
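As a rough illustration of this line structure, the sketch below splits a single line into its level, optional record ID, tag, and optional value. The regular expression and the function name `parse_line` are assumptions made for this example; a real parser would also need to handle details such as leading whitespace and continuation lines.

```python
import re

# level, optional @ID@, tag, optional value
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a valid GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value


print(parse_line("0 @I1@ INDI"))          # top-level record with an ID
print(parse_line("1 NAME John /Smith/"))  # nested line with a value
print(parse_line("0 TRLR"))               # trailer: no ID, no value
```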
Although it is possible to write a GEDCOM file by hand, the format was designed to be used with software and thus is not especially human-friendly. A GEDCOM validator that checks the structure of a GEDCOM file is included as part of the PhpGedView project, though it is not meant to be a standalone validator. For standalone validation, "The Windows GEDCOM Validator" can be used, or the older, unmaintained Gedcheck from the LDS Church.
In 2001, the GEDCOM TestBook Project used the Gedcheck program to evaluate how well four popular genealogy programs conformed to the GEDCOM 5.5 standard. The findings showed that a number of problems existed and that "The most commonly found fault leading to data loss was the failure to read the NOTE tag at all the possible levels at which it may appear." In 2005, the Genealogical Software Report Card evaluation also included testing against the GEDCOM 5.5 standard using the Gedcheck program.
To assist with adoption of GEDCOM 7.0, validation tools now exist for that standard as well.

Example

The following is a sample GEDCOM file.
sample.ged

0 HEAD
1 SOUR PAF
2 NAME Personal Ancestral File
2 VERS 5.0
1 DATE 30 NOV 2000
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 CHAR ANSEL
1 SUBM @U1@
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
1 FAMS @F1@
0 @I2@ INDI
1 NAME Elizabeth /Stansfield/
1 SEX F
1 FAMS @F1@
0 @I3@ INDI
1 NAME James /Smith/
1 SEX M
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
0 @U1@ SUBM
1 NAME Submitter
0 TRLR

The header includes the source program and version, the GEDCOM version, the character encoding, and a link to information about the submitter of the file.
The individual records define John Smith, Elizabeth Stansfield, and James Smith.
The family record links the husband, wife, and child by their ID numbers.
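The way the family record ties the individuals together can be sketched in a few lines of code. The example below is a simplified illustration, not a complete parser: it uses a shortened copy of the sample, reads only level-0 and level-1 lines, and the helper names `read_records` and `describe_family` are hypothetical.

```python
SAMPLE = """\
0 HEAD
1 GEDC
2 VERS 5.5
0 @I1@ INDI
1 NAME John /Smith/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Elizabeth /Stansfield/
1 FAMS @F1@
0 @I3@ INDI
1 NAME James /Smith/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR
"""


def read_records(text):
    """Group level-1 lines under their level-0 record, keyed by record ID."""
    records = {}
    current = None
    for line in text.splitlines():
        parts = line.split(" ", 2)
        level = int(parts[0])
        if level == 0:
            # Only records with an @ID@ can be pointed to; HEAD and TRLR have none.
            if len(parts) == 3 and parts[1].startswith("@"):
                current = {"type": parts[2], "lines": []}
                records[parts[1]] = current
            else:
                current = None
        elif level == 1 and current is not None:
            value = parts[2] if len(parts) > 2 else ""
            current["lines"].append((parts[1], value))
    return records


def describe_family(records, fam_id):
    """Resolve the pointers in a FAM record to the names they refer to."""
    def name_of(xref):
        return dict(records[xref]["lines"]).get("NAME", "?")

    return [(tag, name_of(value))
            for tag, value in records[fam_id]["lines"]
            if tag in ("HUSB", "WIFE", "CHIL")]


records = read_records(SAMPLE)
for role, name in describe_family(records, "@F1@"):
    print(role, name)
```

Keying the records by their ID makes each pointer such as `@I1@` a simple dictionary lookup, which is one straightforward way to follow the links.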

Versions

The current version of the specification in wide use is GEDCOM 5.5.1 final, which was released on 15 November 2019. Its predecessor, the GEDCOM 5.5.1 draft, was issued in 1999, introducing nine new tags and adding UTF-8 as an approved character encoding. The draft was not formally approved, but its provisions were adopted at least in part by a number of genealogy programs, including FamilySearch.org.
Lineage-linked GEDCOM is the deliberate de facto common denominator. Despite version 5.5 of the GEDCOM standard first being published in 1996, many genealogical software suppliers have never fully supported the feature of multilingual Unicode text introduced with that version of the specification. Uniform use of Unicode would allow for the usage of international character sets. An example is the storage of East Asian names in their original Chinese, Japanese and Korean characters, without which they could be ambiguous and of little use for genealogical or historical research. PAF 5.2 is an example of software that uses UTF-8 as its internal character set, and can output a UTF-8 GEDCOM.
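Because the character set is declared in the header's CHAR line, a reader can look it up before deciding how to decode the rest of the file. The sketch below is a minimal illustration with several assumptions: the function `declared_charset` and the `CODECS` table are hypothetical, the list of character set names is the one commonly associated with GEDCOM 5.5.1, and treating UNICODE as UTF-16 is an interpretation rather than something stated in this article.

```python
# Character set names from GEDCOM headers, mapped to Python codec names.
# ANSEL has no codec in the Python standard library, so it maps to None here.
# Mapping UNICODE to UTF-16 is an assumption of this sketch.
CODECS = {
    "ASCII": "ascii",
    "UTF-8": "utf-8",
    "UNICODE": "utf-16",
    "ANSEL": None,
}


def declared_charset(header_lines):
    """Return the value of the header's level-1 CHAR line, or None if absent."""
    for line in header_lines:
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "1" and parts[1] == "CHAR":
            return parts[2].upper()
    return None


header = ["0 HEAD", "1 GEDC", "2 VERS 5.5", "1 CHAR ANSEL"]
charset = declared_charset(header)
codec = CODECS.get(charset)
if codec is None:
    print(f"{charset}: needs a third-party or hand-written decoder")
else:
    print(f"{charset}: decode with Python codec {codec!r}")
```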
GEDCOM 7.0 requires UTF-8 encoding throughout and resolves other long-standing issues with GEDCOM 5.5.1. It also adds multimedia support in the form of an associated .zip file, called a GEDZip. Efforts are underway to see 7.0 embraced as the new exchange standard. GEDCOM 7.0 allows a file to identify explicitly which standards other than GEDCOM apply to it; GEDCOM has always been extensible, but prior to 7.0 there was no standard way to identify such extensions. GEDCOM 7.0 also allows an event to be explicitly marked as nonexistent, which makes it possible, for example, to document that a particular individual never married. GEDCOM 7.0 was the first version to use semantic versioning, and is the most recent minor version of the specification.
The next planned minor release is v7.1, which is under development.
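The explicit marking of nonexistent events can be pictured with a short sketch. It assumes the GEDCOM 7.0 `NO` structure, whose value names the event that did not occur; the sample record and the helper `nonexistent_events` are hypothetical and only illustrate the idea.

```python
def nonexistent_events(record_lines):
    """Return event tags that a record explicitly marks as not having happened."""
    events = []
    for line in record_lines:
        parts = line.split(" ", 2)
        # A level-1 NO line carries the tag of the event that did not occur.
        if len(parts) == 3 and parts[0] == "1" and parts[1] == "NO":
            events.append(parts[2])
    return events


# Hypothetical GEDCOM 7.0 individual recorded as never having married.
record = [
    "0 @I1@ INDI",
    "1 NAME Jane /Doe/",
    "1 NO MARR",
]
print(nonexistent_events(record))
```

The distinction matters because the mere absence of a MARR event says nothing either way, while a `NO MARR` line records a positive research finding.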

Limitations

Support for multi-person events and sources

A GEDCOM file can contain information on events such as births, deaths, census records, ship's records, marriages, etc.; a rule of thumb is that an event is something that took place at a specific time, at a specific place. GEDCOM files can also contain attributes such as physical description, occupation, and total number of children; unlike events, attributes generally cannot be associated with a specific time or place.
The GEDCOM specification requires that each event or attribute is associated with exactly one individual or family. This causes redundancy for events such as census records where the actual census entry often contains information on multiple individuals. In the GEDCOM file, for census records a separate census "CENS" event must be added for each individual referenced. Some genealogy programs, such as Gramps and The Master Genealogist, have elaborate database structures for sources that are used, among other things, to represent multi-person events. When databases are exported from one of these programs to GEDCOM, these database structures cannot be represented in GEDCOM due to this limitation, with the result that the event or source information including all of the relevant citation reference information must be duplicated each place that it is used. This duplication makes it difficult for the user to maintain the information related to sources.
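The duplication can be made concrete with a small sketch. It assumes a hypothetical household of three individuals who appear in one census entry; the date, place, source ID, and the function `census_events` are invented for illustration. Each person receives a separate CENS event that repeats the same details.

```python
def census_events(household, date, place, source_id):
    """Emit one CENS event per person, repeating the shared details each time."""
    lines = []
    for xref in household:
        lines += [
            f"0 {xref} INDI",
            "1 CENS",
            f"2 DATE {date}",
            f"2 PLAC {place}",
            f"2 SOUR {source_id}",
        ]
    return lines


# Hypothetical values: one census entry that mentions three people.
household = ["@I1@", "@I2@", "@I3@"]
out = census_events(household, "1881", "Leeds, Yorkshire, England", "@S1@")
print("\n".join(out))
print(out.count("1 CENS"), "copies of the same census event")
```

A correction to the date or place therefore has to be made once per person, which is the maintenance burden described above.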
In the GEDCOM specification, events that are associated with a family, such as a marriage, are stored only once, as part of the family record, and both spouses are then linked to that single family record.

Ambiguity in the specification

The GEDCOM specification was made purposefully flexible to support many ways of encoding data, particularly in the area of sources. This flexibility has led to a great deal of ambiguity, and has produced the side effect that some genealogy programs which import GEDCOM do not import all of the data from a file.

Ordering of events that do not have dates

The GEDCOM specification does not offer explicit support for keeping a known order of events. In particular, the order of relationships for a person and the order of the children within a relationship can be lost. In many cases the sequence of events can be derived from the associated dates, but dates are not always known, particularly when dealing with data from centuries ago. For example, a person may have had two relationships, both with unknown dates, where it is nevertheless known from descriptions which of the two came second. The order in which these FAMS pointers are recorded in the GEDCOM INDI record will depend on the exporting program. In Aldfaer, for instance, the sequence depends on the ordering of the data by the user. The proposed XML GEDCOM standard does not address this issue either.
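The problem can be illustrated with a short sketch. It assumes a hypothetical importing program that sorts a person's families by marriage year; the data and the function `order_families` are invented for this example. Undated families cannot be ordered by date, so they simply keep whatever order the exporting program wrote.

```python
def order_families(fams):
    """Sort families by marriage year; undated ones keep their file order at the end."""
    # sorted() is stable, so families with equal keys stay in input order.
    return sorted(fams, key=lambda fam: (fam["year"] is None, fam["year"] or 0))


# Hypothetical FAMS entries in the order an exporting program wrote them.
fams = [
    {"id": "@F2@", "year": None},   # known from descriptions to be the second relationship
    {"id": "@F1@", "year": None},   # known to be the first, but written second
    {"id": "@F3@", "year": 1702},
]
print([fam["id"] for fam in order_families(fams)])
```

Here the researcher's knowledge that `@F1@` preceded `@F2@` is not recoverable from the data, because nothing in the file records it.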

Lesser-known features

GEDCOM has many features that are not commonly used. Some software packages do not support all the features that the GEDCOM standard allows.