Computational biology


Computational biology refers to the use of techniques in computer science, data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and data science, the field also has foundations in applied mathematics, molecular biology, cell biology, chemistry, and genetics.

History

Bioinformatics, the analysis of informatics processes in biological systems, began in the early 1970s. At this time, research in artificial intelligence was using network models of the human brain in order to generate new algorithms. This use of biological data pushed biological researchers to use computers to evaluate and compare large data sets in their own field.
By 1982, researchers shared information via punch cards. The amount of data grew exponentially by the end of the 1980s, requiring new computational methods for quickly interpreting relevant information.
Perhaps the best-known example of computational biology, the Human Genome Project, officially began in 1990. By 2003, the project had mapped around 85% of the human genome, satisfying its initial goals. Work continued, however, and by 2021 the level of "a complete genome" was reached, with only 0.3% of bases remaining covered by potential issues. The missing Y chromosome was added in January 2022.
Since the late 1990s, computational biology has become an important part of biology, leading to numerous subfields. Today, the International Society for Computational Biology recognizes 21 different 'Communities of Special Interest', each representing a slice of the larger field. In addition to helping sequence the human genome, computational biology has helped create accurate models of the human brain, map the 3D structure of genomes, and model biological systems. Much of the original progress in computational biology emerged from the United States and Western Europe, due to their large computational infrastructures. Recent decades have seen growing contributions from less wealthy nations, however. For example, Colombia has had an international computational biology effort since 1998, focusing on genomics and disease in nationally important crops like coffee and potatoes. Poland, similarly, has recently been a leader in biomolecular simulations and macromolecular sequence analysis.

Applications

Anatomy

Computational anatomy is the study of anatomical shape and form at the visible or gross anatomical scale of morphology. It involves the development of computational, mathematical, and data-analytical methods for modeling and simulating biological structures. It focuses on the anatomical structures being imaged, rather than the medical imaging devices. Due to the availability of dense 3D measurements via technologies such as magnetic resonance imaging, computational anatomy has emerged as a subfield of medical imaging and bioengineering for extracting anatomical coordinate systems at the morphome scale in 3D.
The original formulation of computational anatomy is as a generative model of shape and form from exemplars acted upon via transformations. The diffeomorphism group is used to study different coordinate systems via coordinate transformations generated by the Lagrangian and Eulerian velocities of flow from one anatomical configuration into another. It is related to shape statistics and morphometrics, with the distinction that diffeomorphisms are used to map coordinate systems, a study known as diffeomorphometry.

Data and modeling

Mathematical biology is the use of mathematical models of living organisms to examine the systems that govern structure, development, and behavior in biological systems. This entails a more theoretical approach to problems, rather than its more empirically minded counterpart of experimental biology. Mathematical biology draws on discrete mathematics, topology, Bayesian statistics, linear algebra and Boolean algebra.
These mathematical approaches have enabled the creation of databases and other methods for storing, retrieving, and analyzing biological data, a field known as bioinformatics. Usually, this process involves genetics and analyzing genes.
Gathering and analyzing large datasets has made room for growing research fields such as data mining and computational biomodeling, which refers to building computer models and visual simulations of biological systems. This allows researchers to predict how such systems will react to different environments, which is useful for determining if a system can "maintain their state and functions against external and internal perturbations". While current techniques focus on small biological systems, researchers are working on approaches that will allow larger networks to be analyzed and modeled. Many researchers believe this will be essential in developing modern medical approaches to creating new drugs and gene therapy. A useful modeling approach is to use Petri nets via tools such as esyN.
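To make the Petri net idea concrete, the following is a minimal sketch in Python. It does not use esyN or reproduce its data format. The enzyme reaction, the place names, and the `enabled`, `fire`, and `run` helpers are all hypothetical and chosen only for illustration. Places hold tokens (here, molecule counts), and a transition can fire only when each of its input places holds enough tokens.

```python
from typing import Dict, List, Tuple

Marking = Dict[str, int]
# A transition is (inputs, outputs): tokens consumed and produced per place.
Transition = Tuple[Dict[str, int], Dict[str, int]]


def enabled(marking: Marking, transition: Transition) -> bool:
    """A transition is enabled when every input place holds enough tokens."""
    inputs, _ = transition
    return all(marking.get(place, 0) >= n for place, n in inputs.items())


def fire(marking: Marking, transition: Transition) -> Marking:
    """Return the new marking after firing one enabled transition."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    inputs, outputs = transition
    new_marking = dict(marking)
    for place, n in inputs.items():
        new_marking[place] -= n
    for place, n in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


def run(marking: Marking, transitions: List[Transition], max_steps: int = 100) -> Marking:
    """Fire the first enabled transition repeatedly until none is enabled."""
    for _ in range(max_steps):
        ready = [t for t in transitions if enabled(marking, t)]
        if not ready:
            break
        marking = fire(marking, ready[0])
    return marking


# Hypothetical toy system: enzyme + substrate -> complex -> enzyme + product
bind = ({"enzyme": 1, "substrate": 1}, {"complex": 1})
catalyze = ({"complex": 1}, {"enzyme": 1, "product": 1})

initial = {"enzyme": 1, "substrate": 3, "complex": 0, "product": 0}
final = run(initial, [bind, catalyze])
print(final)
```

The firing rule here is deterministic (the first enabled transition always fires). Stochastic or timed variants would pick among enabled transitions differently.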
Along similar lines, until recent decades theoretical ecology has largely dealt with analytic models that were detached from the statistical models used by empirical ecologists. However, computational methods have aided in developing ecological theory via simulation of ecological systems, in addition to increasing application of methods from computational statistics in ecological analyses.

Systems biology

Systems biology consists of computing the interactions between various biological systems ranging from the cellular level to entire populations with the goal of discovering emergent properties. This process usually involves networking cell signaling and metabolic pathways. Systems biology often uses computational techniques from biological modeling and graph theory to study these complex interactions at cellular levels.
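As a small illustration of the graph-theoretic view, a signaling pathway can be sketched as a directed graph and queried for everything downstream of a given component. The network below and the `downstream` helper are hypothetical. The node names are placeholders rather than real genes or proteins.

```python
from collections import deque
from typing import Dict, List, Set


def downstream(network: Dict[str, List[str]], source: str) -> Set[str]:
    """Return every node reachable from `source`, using breadth-first search."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in network.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(source)  # report only the affected nodes, not the source itself
    return seen


# Hypothetical signaling cascade: each key activates the nodes in its list.
signaling = {
    "receptor": ["kinase_A"],
    "kinase_A": ["kinase_B", "phosphatase"],
    "kinase_B": ["transcription_factor"],
    "phosphatase": [],
    "transcription_factor": ["target_gene"],
}

affected = downstream(signaling, "kinase_B")
print(sorted(affected))
```

A query like this indicates which parts of a pathway could be affected if one component is perturbed. Real analyses would typically add edge weights, signs (activation or inhibition), and kinetics.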

Evolutionary biology

Computational biology has assisted evolutionary biology by:

Genomics

Computational genomics is the study of the genomes of cells and organisms. The Human Genome Project is one example of computational genomics. This project set out to sequence the entire human genome into a set of data. Once fully implemented, this could allow doctors to analyze the genome of an individual patient. This opens the possibility of personalized medicine, prescribing treatments based on an individual's pre-existing genetic patterns. Researchers are also looking to sequence the genomes of animals, plants, bacteria, and all other types of life.
One of the main ways that genomes are compared is by sequence homology. Homology is the study of biological structures and nucleotide sequences in different organisms that come from a common ancestor. Research suggests that between 80 and 90% of genes in newly sequenced prokaryotic genomes can be identified this way.
Sequence alignment is another process for comparing and detecting similarities between biological sequences or genes. Sequence alignment is useful in a number of bioinformatics applications, such as computing the longest common subsequence of two genes or comparing variants of certain diseases.
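The longest common subsequence mentioned above can be computed with the standard dynamic-programming table. The sketch below is a generic textbook version, not a production aligner. The function name and the example strings are illustrative assumptions, and real alignment tools also score mismatches and gaps.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the subsequence.
    i, j = len(a), len(b)
    result = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))


# Illustrative strings only; these are not real gene sequences.
print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

The table takes time and memory proportional to the product of the two sequence lengths, which is one reason genome-scale comparisons rely on heuristics and indexing rather than on this exact method.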
A largely unexplored area in computational genomics is the analysis of intergenic regions, which comprise roughly 97% of the human genome. Researchers are working to understand the functions of non-coding regions of the human genome through the development of computational and statistical methods and via large consortia projects such as ENCODE and the Roadmap Epigenomics Project.
Understanding how individual genes contribute to the biology of an organism at the molecular, cellular, and organism levels is known as gene ontology. The Gene Ontology Consortium's mission is to develop an up-to-date, comprehensive, computational model of biological systems, from the molecular level to larger pathways, cellular, and organism-level systems. The Gene Ontology resource provides a computational representation of current scientific knowledge about the functions of genes from many different organisms, from humans to bacteria.
3D genomics is a subfield of computational biology that focuses on the organization and interaction of genes within a eukaryotic cell. One method used to gather 3D genomic data is Genome Architecture Mapping (GAM). GAM measures 3D distances of chromatin and DNA in the genome by combining cryosectioning, the process of cutting a strip from the nucleus to examine the DNA, with laser microdissection. A nuclear profile is simply this strip or slice that is taken from the nucleus. Each nuclear profile contains genomic windows, which are certain sequences of nucleotides, the base unit of DNA. GAM captures a genome network of complex, multi-enhancer chromatin contacts throughout a cell.
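One simple way to reason about GAM data is to record, for each nuclear profile, which genomic windows it contains, and then count how often two windows appear in the same profile. The sketch below assumes a small made-up table of 0/1 detections. The `cosegregation` helper and the data are hypothetical and are not taken from any GAM software. The underlying intuition is that windows lying close together in 3D tend to be captured in the same thin slice more often.

```python
from typing import List


def cosegregation(table: List[List[int]], a: int, b: int) -> float:
    """Fraction of nuclear profiles in which windows a and b are both detected.

    `table` has one row per nuclear profile and one column per genomic window;
    an entry is 1 if the window was detected in that profile, else 0.
    """
    if not table:
        raise ValueError("segregation table is empty")
    both = sum(1 for profile in table if profile[a] and profile[b])
    return both / len(table)


# Hypothetical data: 4 nuclear profiles (rows) x 3 genomic windows (columns).
segregation_table = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]

for a, b in [(0, 1), (0, 2), (1, 2)]:
    print(a, b, cosegregation(segregation_table, a, b))
```

Raw co-segregation depends on how often each window is detected at all, so practical analyses usually normalize it. That step is omitted here.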

Biomarker discovery

Computational biology also plays a pivotal role in identifying biomarkers for diseases such as cardiovascular conditions. By integrating various 'omics' data, such as genomics, proteomics, and metabolomics, researchers can uncover potential biomarkers that aid in disease diagnosis, prognosis, and treatment strategies. For instance, metabolomic analyses have identified specific metabolites capable of distinguishing between coronary artery disease and myocardial infarction, thereby enhancing diagnostic precision.

Neuroscience

Computational neuroscience is the study of brain function in terms of the information processing properties of the nervous system. A subset of neuroscience, it looks to model the brain to examine specific aspects of the neurological system. Models of the brain include:
  • Realistic Brain Models: These models look to represent every aspect of the brain, including as much detail at the cellular level as possible. Realistic models provide the most information about the brain, but also have the largest margin for error. More variables in a brain model create the possibility for more error to occur. These models do not account for parts of the cellular structure that scientists do not know about. Realistic brain models are the most computationally heavy and the most expensive to implement.
  • Simplifying Brain Models: These models look to limit the scope of a model in order to assess a specific physical property of the neurological system. This allows for the intensive computational problems to be solved, and reduces the amount of potential error from a realistic brain model.
It is the work of computational neuroscientists to improve the algorithms and data structures currently used to increase the speed of such calculations.
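As an illustration of the simplifying approach, the sketch below simulates a single leaky integrate-and-fire neuron, a widely used reduced neuron model. This model is not named in the text above. It is brought in here only as an example, and every parameter value is an assumption chosen for readability rather than fitted to data.

```python
from typing import List


def simulate_lif(
    current: float,            # constant input current (nA), assumed
    duration_ms: float = 100.0,
    dt: float = 0.1,           # Euler time step (ms)
    tau: float = 10.0,         # membrane time constant (ms)
    v_rest: float = -65.0,     # resting potential (mV)
    v_threshold: float = -50.0,
    v_reset: float = -65.0,
    resistance: float = 10.0,  # membrane resistance (megaohm)
) -> List[float]:
    """Return spike times (ms) of a leaky integrate-and-fire neuron.

    Integrates dV/dt = (-(V - v_rest) + resistance * current) / tau
    with the forward Euler method, resetting V after each spike.
    """
    v = v_rest
    spikes = []
    steps = int(round(duration_ms / dt))
    for step in range(steps):
        dv = (-(v - v_rest) + resistance * current) / tau
        v += dv * dt
        if v >= v_threshold:
            spikes.append((step + 1) * dt)
            v = v_reset
    return spikes


weak = simulate_lif(1.0)    # steady state stays below threshold
strong = simulate_lif(2.0)  # steady state exceeds threshold, so the neuron fires
print(len(weak), len(strong))
```

The model ignores ion channels, dendrites, and synapses entirely, which is what makes it cheap enough to simulate in large networks.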
Computational neuropsychiatry is an emerging field that uses mathematical and computer-assisted modeling of brain mechanisms involved in mental disorders. Several initiatives have demonstrated that computational modeling makes an important contribution to understanding the neuronal circuits that could generate mental functions and dysfunctions.