Multiomics


Multiomics, multi-omics, integrative omics, "panomics" or "pan-omics" is a biological analysis approach in which the data consists of multiple "omes", such as the genome, epigenome, transcriptome, proteome, metabolome, exposome, and microbiome; in other words, it is the use of multiple omics technologies to study life in a concerted way. By combining these "omes", scientists can analyze complex biological big data to find novel associations between biological entities, pinpoint relevant biomarkers and build elaborate markers of disease and physiology. In doing so, multiomics integrates diverse omics data to find a coherent relationship or association between genotype, phenotype, and envirotype. The OmicTools service lists more than 99 pieces of software related to multiomic data analysis, as well as more than 99 databases on the topic.
Systems biology approaches are often based upon the use of multiomic analysis data. The American Society of Clinical Oncology defines panomics as referring to "the interaction of all biological functions within a cell and with other body functions, combining data collected by targeted tests... and global assays with other patient-specific information."

History

The publication of the structure of DNA by Francis Crick and James Watson on April 25, 1953, marked a turning point in the study of genomics. It can be seen as the historical starting point for the development of cross-disciplinary and cross-cutting studies in the future field of biotechnology. These advances led to the creation of multiomics as a discipline at the end of the 20th century.
The reading and study of the transcriptome, beginning in the 1980s, gradually made it possible to identify and quantify the products of gene expression in a cell or tissue in a given environment; this marked the beginning of a “meta” view of the effects of gene expression in the human organism.
A clear increase in the number of publications involving multiomics, whether in their methodology or their subject matter, appeared in the late 2000s. The number rose from zero in 2000 to more than 1400 per year in 2021, growing exponentially.

Combined multiomic data collection

Combined multiomic data collection approaches have evolved to address the limitations of traditional multiomics research, which typically requires separate sample processing for each molecular class followed by computational integration, introducing variability and increasing costs. Early advances in this field include sequential extraction approaches such as TRIzol-based sequential isolation methods, which demonstrated that a reagent traditionally used for RNA isolation could extract DNA, RNA, proteins, metabolites, and lipids from a single sample. Similar approaches, such as the Metabolite, Protein, and Lipid extraction and the "Three-in-One" method, adapted biphasic fractionation to extract proteins, metabolites, and lipids for LC-MS/MS analysis. More recent technological developments include the Multi-Omic Single-Shot Technology, which integrates proteome and lipidome analysis in a single LC-MS run using one reverse-phase column and a binary mobile phase system, and the Bead-enabled Accelerated Monophasic Multi-omics method, which combines n-butanol-based monophasic extraction with magnetic beads and accelerated protein digestion for the separate analysis of metabolites, lipids, and proteins. One of the most comprehensive technologies in this space is Dalton Bioanalytics Inc.'s Omni-MS®, a multiomic assay that uses a proprietary method to simultaneously profile proteins, lipids, electrolytes, metabolites, and other small molecules in a single preparation and a single LC-MS analysis. This platform has been applied to biomarker discovery, identifying potential biomarkers across multiple molecular classes in various conditions and diseases, including COVID severity during pregnancy, 22q11.2 deletion syndrome, and hereditary angioedema.
These integrated approaches significantly reduce sample requirements, processing time, and technical variation while improving correlation analysis across different molecular classes, making them increasingly valuable for precision medicine and systems biology research.

Single-cell multiomics

A branch of the field of multiomics is the analysis of multilevel single-cell data, called single-cell multiomics. This approach offers unprecedented resolution for examining multilevel transitions in health and disease at the single-cell level. An advantage over bulk analysis is that it mitigates confounding factors derived from cell-to-cell variation, allowing heterogeneous tissue architectures to be uncovered.
Methods for parallel single-cell genomic and transcriptomic analysis can be based on simultaneous amplification or on physical separation of RNA and genomic DNA. They allow insights that cannot be gathered from transcriptomic analysis alone, because RNA data do not contain non-coding genomic regions or information regarding copy-number variation, for example. An extension of this methodology is the integration of single-cell transcriptomes with single-cell methylomes, combining single-cell bisulfite sequencing with single-cell RNA-Seq. Other techniques to query the epigenome, such as single-cell ATAC-Seq and single-cell Hi-C, also exist.
A different, but related, challenge is the integration of proteomic and transcriptomic data. One approach to performing such measurements is to physically split single-cell lysates in two, processing one half for RNA and the other half for proteins. The protein content of lysates can be measured by proximity extension assays, for example, which use DNA-barcoded antibodies. A different approach uses a combination of heavy-metal RNA probes and protein antibodies to adapt mass cytometry for multiomic analysis.
Related to single-cell multiomics is the field of spatial omics, which assays tissues through omics readouts that preserve the relative spatial orientation of the cells in the tissue. The number of published spatial omics methods still lags behind the number of published single-cell multiomics methods, but the gap is narrowing.

Multiomics and machine learning

Concurrent with the rapid evolution of high-throughput biology, machine learning applications in biomedical data analysis have seen exponential growth. The synergy between multi-omics data integration and machine learning has been pivotal in the discovery of novel biomarkers.
Established statistical frameworks have laid the groundwork for this analysis. For instance, the mixOmics project utilizes sparse Partial Least Squares regression to select features and identify putative biomarkers. Similarly, Regularized Generalized Canonical Correlation Analysis offers a unified and flexible framework for integrating heterogeneous data, available via the RGCCA R package.
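The feature-selection idea behind sparse Partial Least Squares can be illustrated with a minimal sketch. It assumes a single response variable and computes only the first component, in which case the sparse weight vector is the soft-thresholded covariance between each feature and the response. This is not the mixOmics implementation; the function names, the penalty `lam`, and the toy data are all hypothetical.

```python
import math


def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparse_pls_weights(X, y, lam):
    """First sparse PLS weight vector for a single response (illustrative only).

    X   : list of samples, each a list of feature values
    y   : list of responses, one per sample
    lam : hypothetical penalty; larger values zero out more features
    """
    y_centered = center(y)
    covariances = []
    for j in range(len(X[0])):
        column = center([row[j] for row in X])
        covariances.append(sum(a * b for a, b in zip(column, y_centered)))
    shrunk = [soft_threshold(c, lam) for c in covariances]
    norm = math.sqrt(sum(w * w for w in shrunk))
    if norm == 0:
        return shrunk  # penalty removed every feature
    return [w / norm for w in shrunk]


# Toy data (made up): four samples, three features from different omics layers.
feature_names = ["gene_a", "protein_b", "metabolite_c"]
X = [
    [1.0, 2.0, 4.0],
    [2.0, 1.0, 3.0],
    [3.0, 2.0, 2.0],
    [4.0, 1.0, 1.0],
]
y = [1.0, 2.0, 3.0, 4.0]

weights = sparse_pls_weights(X, y, lam=2.0)
# Features with a non-zero weight are the putative biomarkers.
selected = [name for name, w in zip(feature_names, weights) if w != 0.0]
print(selected)
```

In this toy example the weakly associated feature receives a weight of exactly zero, which is how the sparsity penalty performs feature selection. Real analyses would use several components, cross-validated penalties, and an established package.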
Building on these foundations, recent advances have introduced more sophisticated latent variable models. Multi Omics Factor Analysis has emerged as a powerful tool for disentangling sources of variation across diverse data modalities. This approach is further extended by MEFISTO, which incorporates functional structures to account for temporal or spatial covariates in the data. Furthermore, the integration of deep learning has led to methods such as Deep Latent Variable Path Modelling, which captures complex, non-linear dependencies to refine the identification of biomarkers and can integrate multiomics with unstructured data such as complex images.

Multiomics in health and disease

Multiomics currently holds promise for filling gaps in the understanding of human health and disease, and many researchers are working on ways to generate and analyze disease-related data. The applications range from understanding host-pathogen interactions, infectious diseases, and cancer to better understanding chronic and complex non-communicable diseases and improving personalized medicine.

Integrated Human Microbiome Project

The second phase of the $170 million Human Microbiome Project focused on integrating patient data with different omic datasets, considering host genetics, clinical information and microbiome composition. Phase 1 focused on the characterization of microbial communities at different body sites. Phase 2 focused on integrating multiomic data from host and microbiome to study human diseases. Specifically, the project used multiomics to improve the understanding of the interplay of gut and nasal microbiomes with type 2 diabetes, of gut microbiomes with inflammatory bowel disease, and of vaginal microbiomes with pre-term birth.

Systems immunology

The complexity of interactions in the human immune system has prompted the generation of a wealth of immunology-related multi-scale omic data. Multi-omic data analysis has been employed to gather novel insights about the immune response to infectious diseases, such as pediatric chikungunya, as well as to noncommunicable autoimmune diseases. Integrative omics has also been used extensively to understand the effectiveness and side effects of vaccines, a field called systems vaccinology. For example, multiomics was essential to uncovering the association of changes in plasma metabolites and the immune system transcriptome with the response to vaccination against herpes zoster.

List of software used for multi-omic analysis

The Bioconductor project curates a variety of R packages aimed at integrating omic data:
  • A package for multiple co-inertia analysis of multi-omic datasets
  • A package offering a Bioconductor interface for overlapping samples
  • A package focused on using multi-omic data for evaluating alternative splicing
  • A package for visualization of multiomic cancer data
  • A suite of multivariate methods for data integration
  • A package for encapsulating multiple data sets
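The idea of restricting an analysis to samples that overlap across omics layers can be sketched in a few lines. This is a plain Python illustration, not the API of any Bioconductor package; the layer names, sample IDs, measurements, and function names are all hypothetical.

```python
def overlapping_samples(layers):
    """Return the sorted sample IDs that are present in every omics layer."""
    id_sets = [set(table) for table in layers.values()]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


def align_layers(layers):
    """Keep only the shared samples in each layer."""
    shared = overlapping_samples(layers)
    return {
        name: {sample: table[sample] for sample in shared}
        for name, table in layers.items()
    }


# Made-up measurements keyed by sample ID; each layer covers different samples.
layers = {
    "transcriptome": {"S1": [5.1, 2.0], "S2": [4.8, 2.2], "S3": [6.0, 1.9]},
    "proteome": {"S2": [0.7], "S3": [0.9], "S4": [1.1]},
    "metabolome": {"S1": [12.0], "S2": [11.5], "S3": [13.2]},
}

shared = overlapping_samples(layers)
aligned = align_layers(layers)
print(shared)
```

Only samples measured in all three layers survive the alignment, so downstream integration methods see matched rows across data sets.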
Another package implements a versatile framework for data integration and is freely available.
The OmicTools database further highlights R packages and other tools for multi-omic data analysis:
  • A web resource for visualization of multi-omics datasets
  • SIGMA, a Java program focused on integrated analysis of cancer datasets
  • iOmicsPASS, a tool in C++ for multiomic-based phenotype prediction
  • An R graphical interface for visualization of multiomic data
  • A framework in Python for reproducibly automating multiomic data analysis