Mega2, the Manipulation Environment for Genetic Analysis
Mega2 is a data manipulation program for applied statistical genetics. Mega is an acronym for Manipulation Environment for Genetic Analysis.
The software allows the applied statistical geneticist to convert data from several input formats to a large number of output formats suitable for analysis by commonly used software packages. In a typical human genetics study, the analyst often needs to use a variety of software programs to analyze the data, and these programs usually require that the data be formatted to their precise input specifications. Converting the data into these multiple formats can be tedious, time-consuming, and error-prone. By providing validated conversion pipelines, Mega2 can accelerate the analyses while reducing errors.
Mega2 produces a common intermediate data representation using SQLite3, which enables the data to be accessed by other programs and languages. In particular, the Mega2R R package converts the SQLite3 data into R data frames. Several R functions are provided that illustrate how data can be extracted from the data frames for common R analyses. The key capability is the efficient extraction of genotypes for chosen subsets of markers, which facilitates gene-based association testing by automating the loop over genes in the genome. Another function converts the data to VCF format, and a further one converts it to another analysis format. For more information, see the Mega2R package documentation.
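Because the intermediate representation is an ordinary SQLite3 file, any language with an SQLite driver can inspect it. The following is a minimal Python sketch, not part of Mega2 or Mega2R: the helper names `list_tables` and `preview_table` are hypothetical, and no assumption is made about the table names or schema that Mega2 actually writes. It only queries SQLite's built-in `sqlite_master` catalog to discover whatever tables are present.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite database, sorted by name."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def preview_table(db_path, table, limit=5):
    """Return up to `limit` rows from `table` as a list of tuples.

    The table name is checked against the catalog first, because SQL
    identifiers cannot be passed as bound parameters.
    """
    if table not in list_tables(db_path):
        raise ValueError(f"no such table: {table!r}")
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(
            f'SELECT * FROM "{table}" LIMIT ?', (limit,)
        ).fetchall()
```

Calling `list_tables` on a database file produced by Mega2 would show which tables exist; `preview_table` can then be used to look at the first few rows of any of them before deciding how to query the data further.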
Mega2 has been used to facilitate genetic analyses of a wide variety of human traits, including hereditary dystonia, Ehlers-Danlos syndrome, multiple sclerosis, and gliomas. A list of articles citing Mega2 is available through PubMed Central.
Mega2, which focuses on data reformatting, should not be confused with MEGA (Molecular Evolutionary Genetics Analysis), a program that focuses on molecular evolution and phylogenetics.
Input file formats
Mega2 accepts input data in a variety of widely used file formats. These contain, at a minimum, data about the phenotypes, the marker genotypes, any family structures, and map positions of the markers.

| Input format | Description |
| --- | --- |
| LINKAGE | pre-Makeped or post-Makeped formats |
| Mega2 | simplified/augmented LINKAGE format |
| PLINK | ped format or binary bed format |
| VCF or BCF | Variant Call Format or Binary Variant Call Format |
| IMPUTE2 | IMPUTE2 GEN and BGEN formats |
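To make the "family structure, phenotype, and genotype" content of such files concrete, here is a small Python sketch that parses one line of a PLINK text `.ped` file. It assumes the standard layout of six leading columns (family ID, individual ID, father ID, mother ID, sex, phenotype) followed by two allele columns per marker. The names `PedRecord` and `parse_ped_line` are hypothetical, and this is not how Mega2 itself reads its input.

```python
from typing import List, NamedTuple, Tuple


class PedRecord(NamedTuple):
    """One individual from a PLINK .ped file (all fields kept as strings)."""
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: str
    phenotype: str
    genotypes: List[Tuple[str, str]]  # one (allele1, allele2) pair per marker


def parse_ped_line(line: str) -> PedRecord:
    """Split a whitespace-delimited .ped line into pedigree fields and genotypes."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("a .ped line needs at least six leading columns")
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("allele columns must come in pairs, two per marker")
    # Pair up consecutive allele columns: (a1, a2), (a3, a4), ...
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return PedRecord(*fields[:6], genotypes)
```

Marker names and map positions are not stored in the `.ped` file itself; in PLINK's text format they live in the accompanying `.map` file, which is why a converter needs both to recover the full data set.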
Output file formats
Mega2 supports conversion to the following output formats.

| Output format |
| --- |
| ASPEX format |
| Allegro format |
| Beagle format |
| CRANEFOOT format |
| Eigenstrat format |
| FBAT format |
| GeneHunter format |
| GeneHunter-Plus format |
| IQLS/Idcoefs format |
| Linkage format |
| Loki format |
| MaCH/minimac3 format |
| MLBQTL format |
| Mega2 annotated format |
| Mendel format |
| Merlin format |
| Merlin/SimWalk2-NPL format |
| PANGAEA MORGAN format |
| PAP format |
| PLINK format |
| PREST format |
| PSEQ format |
| Pre-makeped LINKAGE format |
| ROADTRIPS format |
| SAGE format |
| SHAPEIT format |
| SIMULATE format |
| SLINK format |
| SOLAR format |
| SPLINK format |
| SUP format |
| SimWalk2 format |
| Structure format |
| VCF (Variant Call Format) format |
| Vintage Mendel format |
| Vitesse format |