General feature format
In bioinformatics, the general feature format is a file format used for describing genes and other features of DNA, RNA and protein sequences.
GFF Versions
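The following versions of GFF exist:
- General Feature Format Version 2 (GFF2), generally deprecated
- Gene Transfer Format (GTF), a derivative used by Ensembl
- Generic Feature Format Version 3 (GFF3)
- Genome Variation Format (GVF), with additional pragmas and attributes for sequence_alteration features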
GTF is identical to GFF version 2.
GFF general structure
All GFF formats are tab-delimited with 9 fields per line. They all share the same structure for the first 7 fields, while differing in the content and format of the ninth field. Some field names have been changed in GFF3 to avoid confusion. For example, the "seqid" field was formerly referred to as "sequence", which may be confused with a nucleotide or amino acid chain. The general structure is as follows:
| Position index | Position name | Description |
| 1 | seqid | The name of the sequence where the feature is located. |
| 2 | source | The algorithm or procedure that generated the feature. This is typically the name of a software or database. |
| 3 | type | The feature type name, like "gene" or "exon". In a well-structured GFF file, all child features always follow their parents in a single block. In GFF3, all features and their relationships should be compatible with the Sequence Ontology. |
| 4 | start | Genomic start of the feature, with a 1-base offset. This is in contrast with other 0-offset half-open sequence formats, like BED. |
| 5 | end | Genomic end of the feature, with a 1-base offset. This is the same end coordinate as it is in 0-offset half-open sequence formats, like BED. |
| 6 | score | Numeric value that generally indicates the confidence of the source in the annotated feature. A value of "." is used to define a null value. |
| 7 | strand | Single character that indicates the strand of the feature. This can be "+" (positive strand), "-" (negative strand), "." for features that are not stranded, or "?" for features with relevant but unknown strands. |
| 8 | phase | Phase of CDS features; it can be one of 0, 1, 2 or ".". See the section below for a detailed explanation. |
| 9 | attributes | A list of tag-value pairs separated by a semicolon with additional information about the feature. |
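The nine-field layout in the table above can be illustrated with a short parser. This is only a minimal sketch, not a reference implementation: the `Feature` type and the functions `parse_gff3_line` and `to_bed_interval` are hypothetical names, the sample line is invented for illustration, and the attribute handling assumes GFF3-style `tag=value` pairs (GFF2 and GTF format the ninth field differently).

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int               # 1-based, inclusive
    end: int                 # 1-based, inclusive
    score: Optional[float]   # None when the field is "."
    strand: str
    phase: Optional[int]     # None when the field is "."
    attributes: dict


def parse_gff3_line(line: str) -> Feature:
    """Split one tab-delimited feature line into its nine fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields

    # Assumption: GFF3-style "tag=value" pairs separated by semicolons.
    attributes = {}
    for pair in attrs.split(";"):
        if pair.strip():
            tag, _, value = pair.strip().partition("=")
            attributes[tag] = value

    return Feature(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
        attributes,
    )


def to_bed_interval(feature: Feature) -> tuple:
    """Convert 1-based inclusive coordinates to a 0-based half-open interval."""
    # Only the start shifts; the end coordinate is numerically unchanged.
    return (feature.start - 1, feature.end)


# Invented sample line, for illustration only.
example = "chr1\texample_source\texon\t1300\t1500\t.\t+\t.\tID=exon00001;Parent=mRNA00001"
feature = parse_gff3_line(example)
print(feature.type, feature.start, feature.end)
print(to_bed_interval(feature))
```

The conversion helper reflects the coordinate note in the table: moving to a 0-offset half-open format such as BED changes the start by one while the end stays the same.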
The 8th field: phase of CDS features
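As a rough illustration of how phase values chain across the consecutive CDS segments of one transcript, the sketch below takes phase to be the number of bases to skip at the start of a segment before the next complete codon begins. It assumes the segment lengths are given in 5' to 3' order along the coding strand; `cds_phases` is a hypothetical helper and the lengths are invented.

```python
def cds_phases(segment_lengths, first_phase=0):
    """Return the phase of each CDS segment, given lengths in 5'->3' order."""
    phases = []
    phase = first_phase
    for length in segment_lengths:
        phases.append(phase)
        # Bases left over after the last complete codon of this segment
        # determine how many bases the next segment must contribute first.
        phase = (3 - (length - phase) % 3) % 3
    return phases


# A 10-base segment leaves one base of an unfinished codon, so the next
# segment starts with phase 2; that segment then ends on a codon boundary.
print(cds_phases([10, 8, 7]))  # [0, 2, 0]
```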
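Simply put, CDS means "Coding DNA Sequence". The exact meaning of the term is defined by Sequence Ontology. According to the GFF3 specification, the phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of the feature to reach the first base of the next codon.
Meta Directives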
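Lines that begin with `##` carry meta information rather than features. A minimal sketch of collecting them is shown here; `read_directives` is a hypothetical helper, the sample lines are invented, and the sketch assumes each directive has the form `##name value`. Ordinary `#` comments and the bare `###` line are skipped.

```python
def read_directives(lines):
    """Collect '##' directive lines as (name, value) pairs."""
    directives = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip feature lines, plain "#" comments, and the bare "###" marker.
        if line.startswith("##") and not line.startswith("###"):
            name, _, value = line[2:].partition(" ")
            directives.append((name, value.strip()))
    return directives


# Invented sample content, for illustration only.
sample = [
    "##gff-version 3",
    "##sequence-region chr1 1 248956422",
    "# an ordinary comment",
    "chr1\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001",
]
print(read_directives(sample))
```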
In GFF files, additional meta information can be included on lines that begin with the ## prefix. This meta information can detail the GFF version, sequence region, or species.
GFF software
Servers
Servers that generate this format:
Clients
Clients that use this format:
| Name | Description | Links |
| GBrowse | GMOD genome viewer | |
| IGB | Integrated Genome Browser | Integrated Genome Browser |
| Jalview | A multiple sequence alignment editor & viewer | Jalview |
| STRAP | Underlines sequence features in multiple alignments | |
| JBrowse | JBrowse is a fast, embeddable genome browser built completely with JavaScript and HTML5 | |
| ZENBU | A collaborative, omics data integration and interactive visualization system | |
Validation
The modENCODE project hosts an online GFF3 validation tool with generous limits of 286.10 MB and 15 million lines.
The Genome Tools software collection contains a gff3validator tool that can be used offline to validate and possibly tidy GFF3 files. An online validation service is also available.