Mega2, the Manipulation Environment for Genetic Analysis


Mega2 is data manipulation software for applied statistical genetics. Mega is an acronym for Manipulation Environment for Genetic Analysis.
The software allows the applied statistical geneticist to convert data from several input formats to a large number of output formats suitable for analysis by commonly used software packages. In a typical human genetics study, the analyst often needs several different software programs to analyze the data, and these programs usually require that the data be formatted to their precise input specifications. Converting the data into these many formats can be tedious, time-consuming, and error-prone. By providing validated conversion pipelines, Mega2 can accelerate analyses while reducing errors.
Mega2 produces a common intermediate data representation using SQLite3, which enables the data to be accessed by other programs and languages. In particular, the Mega2R R package converts the SQLite3 data into R data frames. Several R functions are provided that illustrate how data can be extracted from the data frames for common R analyses. The key capability is the efficient extraction of genotypes for chosen subsets of markers, which facilitates gene-based association testing by automating the loop over genes in the genome. Other functions convert the data to VCF and other formats. More information is available in the Mega2R package documentation.
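Because the intermediate representation is an ordinary SQLite3 database, any language with an SQLite driver can read it. The following is a minimal Python sketch of that idea, not Mega2's or Mega2R's own code. The table and column names (`demo_marker`, `demo_genotype`, and so on) are hypothetical and are created in memory purely for illustration; the schema of a real Mega2 database is not described here and should be inspected first, for example with `list_tables`.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def genotypes_in_region(conn, chrom, start, end):
    """Fetch (person, marker, genotype) rows for markers inside a region.

    Relies on the HYPOTHETICAL demo tables built below, not on Mega2's schema.
    """
    query = """
        SELECT g.person_id, m.name, g.genotype
        FROM demo_genotype AS g
        JOIN demo_marker AS m ON m.id = g.marker_id
        WHERE m.chrom = ? AND m.position BETWEEN ? AND ?
        ORDER BY m.position, g.person_id
    """
    return conn.execute(query, (chrom, start, end)).fetchall()


def build_demo_db():
    """Create a small in-memory database with made-up markers and genotypes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE demo_marker (
            id INTEGER PRIMARY KEY, name TEXT, chrom TEXT, position INTEGER
        );
        CREATE TABLE demo_genotype (
            person_id TEXT, marker_id INTEGER, genotype TEXT
        );
        INSERT INTO demo_marker VALUES
            (1, 'rs1', '1', 1000), (2, 'rs2', '1', 5000), (3, 'rs3', '2', 700);
        INSERT INTO demo_genotype VALUES
            ('p1', 1, 'A/G'), ('p2', 1, 'A/A'), ('p1', 2, 'C/T'), ('p1', 3, 'G/G');
        """
    )
    return conn


conn = build_demo_db()
tables = list_tables(conn)
region_rows = genotypes_in_region(conn, "1", 500, 2000)
print(tables)
print(region_rows)
```

Selecting markers by region in SQL, rather than loading every genotype first, mirrors the subset-extraction idea described above: a gene-based analysis would call a function like `genotypes_in_region` once per gene.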
Mega2 has been used to facilitate genetic analyses of a wide variety of human traits, including hereditary dystonia, Ehlers-Danlos syndrome, multiple sclerosis, and gliomas. A list of articles citing Mega2 is available on PubMed Central.
Mega2, which focuses on data reformatting, should not be confused with MEGA (Molecular Evolutionary Genetics Analysis), a program that focuses on molecular evolution and phylogenetics.

Input file formats

Mega2 accepts input data in a variety of widely used file formats. These contain, at a minimum, data about the phenotypes, the marker genotypes, any family structures, and map positions of the markers.
Input format: Description
LINKAGE: pre-Makeped or post-Makeped formats
Mega2: simplified/augmented LINKAGE format
PLINK: ped format or binary bed format
VCF or BCF: Variant Call Format or Binary Variant Call Format
IMPUTE2: IMPUTE2 GEN and BGEN formats
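To make concrete what "phenotypes, marker genotypes, family structures, and map positions" look like in one of these formats, here is a minimal Python sketch that parses a single line of a PLINK text `.ped` file and a single line of a `.map` file. It is an illustration only and not how Mega2 reads its input. It assumes the standard six leading `.ped` columns (family ID, individual ID, father ID, mother ID, sex, phenotype) followed by two alleles per marker, and the four-column `.map` layout. The names `PedRecord`, `parse_ped_line`, and `parse_map_line` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PedRecord:
    family_id: str
    individual_id: str
    father_id: str      # "0" conventionally means the parent is not in the file
    mother_id: str
    sex: str            # coded as text, e.g. "1" = male, "2" = female
    phenotype: str
    genotypes: list     # one (allele1, allele2) tuple per marker


def parse_ped_line(line):
    """Split one whitespace-delimited .ped line into pedigree fields and genotypes."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 leading columns plus an even number of alleles")
    alleles = fields[6:]
    # Pair consecutive alleles: columns 7-8 are marker 1, columns 9-10 marker 2, ...
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return PedRecord(*fields[:6], genotypes)


def parse_map_line(line):
    """Split one .map line: chromosome, marker ID, genetic distance, base-pair position."""
    chrom, marker, genetic_distance, bp = line.split()
    return {
        "chrom": chrom,
        "marker": marker,
        "genetic_distance": float(genetic_distance),
        "bp": int(bp),
    }


record = parse_ped_line("FAM1 1 0 0 1 2 A G C C")
marker = parse_map_line("1 rs1 0 1000")
print(record)
print(marker)
```

The marker order in the `.map` file determines which allele pair in each `.ped` line belongs to which marker, which is why the two files must be converted together.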

Output file formats

Mega2 supports conversion to the following output formats.
Output format
ASPEX format
Allegro format
Beagle format
CRANEFOOT format
Eigenstrat format
FBAT format
GeneHunter format
GeneHunter-Plus format
IQLS/Idcoefs format
Linkage format
Loki format
MaCH/minimac3 format
MLBQTL format
Mega2 annotated format
Mendel format
Merlin format
Merlin/SimWalk2-NPL format
PANGAEA MORGAN format
PAP format
PLINK format
PREST format
PSEQ format
Pre-makeped LINKAGE format
ROADTRIPS format
SAGE format
SHAPEIT format
SIMULATE format
SLINK format
SOLAR format
SPLINK format
SUP format
SimWalk2 format
Structure format
VCF format (Variant Call Format)
Vintage Mendel format
Vitesse format

Documentation

The Mega2 documentation is available in HTML and PDF formats.