List of biological databases
Biological databases are stores of biological information. The journal Nucleic Acids Research regularly publishes special issues on biological databases and maintains a list of such databases. The 2018 issue lists about 180 such databases, together with updates to previously described databases. Cross-database search services can be used to browse and search several biological databases. Furthermore, a resource developed by the National Institute of Allergy and Infectious Diseases enables searching across databases.
Meta databases
Meta databases are databases of databases that collect data about data to generate new data. They are capable of merging information from different sources and making it available in a new and more convenient form, or with an emphasis on a particular disease or organism. Originally, metadata was a common term referring simply to data about data, such as tags, keywords, and markup headers.
- a community-driven registry of bioinformatics software and data resources
- ConsensusPathDB: a molecular functional interaction database, integrating information from 12 others
- Entrez search system at the National Center for Biotechnology Information
- Expert Protein Analysis System
- Neuroscience Information Framework: integrates hundreds of neuroscience-relevant resources; many are listed below
- Resources at the University of Pittsburgh
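The merging step described for meta databases can be illustrated with a minimal sketch. It assumes each source exposes records keyed by a shared identifier; the function name `merge_sources`, the source names, and the toy records are all hypothetical and do not come from any database listed here.

```python
from collections import defaultdict


def merge_sources(sources):
    """Merge per-source records into one entry per identifier.

    `sources` maps a source name to {identifier: {field: value}}.
    Every merged value is stored together with the source that supplied it.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for identifier, fields in records.items():
            for field, value in fields.items():
                # Keep provenance next to the value so conflicts stay visible.
                merged[identifier].setdefault(field, []).append((source_name, value))
    return dict(merged)


# Invented toy records (assumption: both sources use the same identifiers).
sources = {
    "pathway_db": {"GENE1": {"pathway": "glycolysis"}},
    "interaction_db": {
        "GENE1": {"partner": "GENE2"},
        "GENE2": {"partner": "GENE1"},
    },
}
merged = merge_sources(sources)
```

Keeping the source name next to each value is one possible design choice; it lets a reader of the merged entry see where each piece of information originated.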
Model organism databases
- PomBase: the knowledgebase for the fission yeast Schizosaccharomyces pombe
- SubtiWiki: integrated database for the model bacterium Bacillus subtilis
- TAIR: the knowledgebase for the plant Arabidopsis thaliana
Nucleic acid databases
DNA databases
The primary databases make up the International Nucleotide Sequence Database. They include DDBJ, GenBank and the European Nucleotide Archive, which are repositories for nucleotide sequence data from all organisms. All three accept nucleotide sequence submissions and then exchange new and updated data daily to stay synchronised. They are primary databases because they house original sequence data. They collaborate with the Sequence Read Archive, which archives raw reads from high-throughput sequencing instruments.
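The effect of the daily exchange can be pictured with a toy model. This is only a sketch under stated assumptions: the real exchange mechanism is not described here, and the record layout (accession mapped to a version number and a sequence), the accessions, and the function name `synchronise` are hypothetical.

```python
def synchronise(repositories):
    """Give every repository the newest version of every record.

    `repositories` maps a repository name to {accession: (version, sequence)}.
    """
    newest = {}
    for records in repositories.values():
        for accession, (version, sequence) in records.items():
            # Keep a record if it is new or has a higher version number.
            if accession not in newest or version > newest[accession][0]:
                newest[accession] = (version, sequence)
    # Each repository receives its own copy of the combined record set.
    return {name: dict(newest) for name in repositories}


# Invented accessions and sequences, for illustration only.
repositories = {
    "DDBJ": {"X0001": (1, "ACGT")},
    "GenBank": {"X0001": (2, "ACGTT"), "X0002": (1, "GGCC")},
    "ENA": {},
}
synced = synchronise(repositories)
```

After the call, all three toy repositories hold the same records, which is the outcome the exchange is meant to achieve.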
Secondary databases are:
- HapMap
- OMIM: inherited diseases
- RefSeq
- 1000 Genomes Project: launched in January 2008. The genomes of more than a thousand anonymous participants from a number of different ethnic groups were analyzed and made publicly available.
- a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. It provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation.
- Nucleosome positioning region database
Gene expression databases
Genome databases
These databases collect genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. These databases may hold the genomes of many species or the genome of a single model organism.
Phenotype databases
- PHI-base: pathogen-host interaction database. It links gene information to phenotypic information from microbial pathogens on their hosts. Information is manually curated from peer-reviewed literature.
- Rat Genome Database (RGD): genomic and phenotype data for Rattus norvegicus
- PomBase: manually curated phenotypic data for the yeast Schizosaccharomyces pombe
RNA databases
- miRBase: the microRNA database
- PolymiRTS: a database of DNA variations in putative microRNA target sites
- PolyQ: database of polyglutamine repeats in disease and non-disease associated proteins
- Rfam: a database of RNA families
- IRESbase: a comprehensive database of experimentally validated internal ribosome entry sites
Amino acid and protein databases
Several publicly available data repositories and resources have been developed to support and manage protein-related information, biological knowledge discovery and data-driven hypothesis generation. The databases in the table below are selected from those listed in the Nucleic Acids Research database issues and database collection, and from the databases cross-referenced in UniProtKB. Most of these databases are cross-referenced with UniProt / UniProtKB so that identifiers can be mapped to each other.
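How cross-references to a shared hub allow identifiers to be mapped between two databases can be sketched as follows. The sketch assumes a table that lists, for each hub accession, the identifiers used by other databases; the accessions, database names, and the function `build_mapping` are hypothetical and do not reflect any real service or API.

```python
def build_mapping(xrefs, source_db, target_db):
    """Map identifiers in `source_db` to identifiers in `target_db`.

    `xrefs` maps a hub accession to {database name: identifier}.
    Entries lacking either database are skipped.
    """
    mapping = {}
    for ids in xrefs.values():
        if source_db in ids and target_db in ids:
            mapping[ids[source_db]] = ids[target_db]
    return mapping


# Invented cross-reference table, for illustration only.
xrefs = {
    "ACC0001": {"db_a": "A-17", "db_b": "B-903"},
    "ACC0002": {"db_a": "A-42"},  # no db_b cross-reference
}
a_to_b = build_mapping(xrefs, "db_a", "db_b")
```

Only the first entry carries identifiers for both databases, so it is the only one that appears in the resulting mapping.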
Proteins in humans
There are about 20,000 protein-coding genes in the standard human genome. Including splice variants, there could be as many as 500,000 unique human proteins.
Different types of protein databases
| DB name | DB website | Provider | Data sources | Revenue/Sponsors sources | Integrates | Desc. | Size | DB type | Actively maintained |
| InterPro | | ELIXIR infrastructure | European Bioinformatics Institute | EMBL, Wellcome Trust, BBSRC | CATH-Gene3D, CDD, HAMAP, MobiDB, PANTHER, Pfam, SMART, SUPERFAMILY, SFLD, TIGRFAMs | classifies proteins into families and predicts the presence of domains and sites | | Protein sequence databases | Yes |
| neXtProt | | CALIPHO | Swiss Institute of Bioinformatics | | UniProt, Cellosaurus, gnomAD, IntAct, SRAA Atlas, UniProt-GOA, Bgee, COSMIC, MassIVE, PeptideAtlas | a human protein-centric knowledge resource | | Protein sequence databases | Yes |
| Wiki-Pi | | Madhavi K. Ganapathiraju | | | | | At present, 48,419 unique interactions among 10,492 proteins; it is unclear whether these proteins are unique | Protein interaction database | ?? |
| Human Protein Reference Database | | Institute of Bioinformatics, Bangalore, India | | | | | One source claims 15,000 proteins; it is unclear how many of these are unique | | |
| Pfam | | Sanger Institute | | | | protein families database of alignments and HMMs | | Protein sequence databases | |
| Human Proteinpedia | | Institute of Bioinformatics, Bangalore, and Johns Hopkins University | | | | | Based on HPRD, a repository hosting over 30,000 human proteins; it is unclear how many of these are unique | | |
| Human Protein Atlas | | | | The Swedish Government | | | Roughly 10 million IHC images for just under 25,000 antibodies; it is unclear how many of these are unique | | |
| PRINTS | | Manchester University | | | | a compendium of protein fingerprints | | Protein sequence databases | |
| PROSITE | | | | | | database of protein domains, families and functional sites | | Protein sequence databases | |
| Protein Information Resource | | Georgetown University Medical Center | | | | | | Protein sequence databases | |
| SUPERFAMILY | | | | | | library of HMMs representing superfamilies and database of annotations for all completely sequenced organisms | | Protein sequence databases | |
| Swiss-Prot | | Swiss Institute of Bioinformatics | | | | protein knowledgebase | | Protein sequence databases | |
| Protein Data Bank | | Protein Data Bank in Europe, Protein Data Bank in Japan, and RCSB | | | | | | Protein structure databases | |
| Structural Classification of Proteins | | | | | | | | Protein structure databases | |
| CATH database | | | | | | | | Protein structure databases | |
| ModBase | | Sali Lab, UCSF | | | | database of comparative protein structure models | | Protein model databases | |
| SIMAP | | | | | | database of protein similarities computed using FASTA | | Protein model databases | |
| SWISS-MODEL | | | | | | server and repository for protein structure models | | Protein model databases | |
| AAindex | | | | | | database of amino acid indices, amino acid mutation matrices, and pair-wise contact potentials | | Protein model databases | |
| BioGRID | | Samuel Lunenfeld Research Institute | | | | general repository for interaction datasets | | Protein-protein and other molecular interactions | |
| RNA-binding protein database | | | | | | | | Protein-protein and other molecular interactions | |
| Database of Interacting Proteins | | Univ. of California | | | | | | Protein-protein and other molecular interactions | |
| IntAct | | EMBL-EBI | | | | open-source database for molecular interactions | | Protein-protein and other molecular interactions | |
| STRING | | | | | | an open source molecular interaction database to study interactions between proteins | | Protein-protein and other molecular interactions | |
| Human Protein Atlas | | Human Protein Atlas | | | | aims at mapping all the human proteins in cells, tissues and organs | | Protein expression databases | |
| ProteinModelPortal | | ?? | ?? | | | | | 3D structure protein databases | |
| SWISS-MODEL Repository | | University of Basel | | The Swiss government | | | | 3D structure protein databases | |
| DisProt | | ELIXIR infrastructure | Indiana University School of Medicine, Temple University, University of Padua | funding from the European Union's Horizon 2020 | Swiss-Prot/UniProt, CATH, Pfam, Europe PMC, BITEM, ECO, Gene Ontology | database of experimental evidence of disorder in proteins | | 3D structure protein databases, Protein sequence databases | |
| MobiDB | | John Moult, Christine Orengo, Predrag Radivojac | University of Padua | Italian Government | | database of intrinsic protein disorder annotation | | 3D structure protein databases, Protein sequence databases | |
| ModBase | | Ursula Pieper, Ben Webb, Narayanan Eswar, Andrej Sali, Roberto Sanchez | UCSF, Sali Lab | | | | | 3D structure protein databases | |
| PDBsum | | European Bioinformatics Institute 2013 | | Wellcome Trust | | | | 3D structure protein databases | |
| CCDS | | NCBI | ?? | | | | | Sequence databases | |
| UniProtKB | | ?? | ?? | | | | | Sequence databases | |
| Swiss-Prot/UniProt | | SIB Swiss Institute of Bioinformatics | European Bioinformatics Institute | | | | Swiss-Prot has collected over 81,000 variants in roughly 13,000 human protein sequence records from peer-reviewed literature. It is unclear how many unique protein types are present in the database. |