Digital pathology

Digital pathology is a sub-field of pathology that focuses on managing and analyzing information generated from digitized specimen slides. It utilizes computer-based technology and virtual microscopy to view, manage, share, and analyze digital slides on computer monitors. This field has applications in diagnostic medicine and aims to achieve more efficient and cost-effective diagnoses, prognoses, and disease predictions through advancements in machine learning and artificial intelligence in healthcare.

History

The roots of digital pathology trace back to the 1960s with early telepathology experiments. The concept of virtual microscopy emerged in the 1990s across various areas of life science research. Around the turn of the century, the scientific community increasingly settled on the term "digital pathology" to denote digitization efforts in pathology. In 2000, however, technical requirements still limited the broad adoption of digital pathology concepts. This changed as powerful and affordable scanner technology and mass or cloud storage appeared on the market. Radiology underwent its digital transformation almost 15 years ago, not because radiology is more advanced, but because of fundamental differences between digital images in radiology and in digital pathology. In radiology the image source is the patient, and today the image is in most cases captured directly in digital format. In pathology the image is scanned from preserved and processed specimens and, for retrospective studies, even from slides stored in a biobank. Besides this difference in pre-analytics and metadata content, the storage required in digital pathology is two to three orders of magnitude higher than in radiology. However, the advantages anticipated through digital pathology are similar to those in radiology:

Capability to transmit digital slides over distances quickly, which enables telepathology scenarios.
Capability to access past specimens from the same patients and similar cases for comparison and review, with much less effort than retrieving slides from the archive shelves.
Capability to compare different areas of multiple slides simultaneously with the help of a virtual microscope.
Capability to annotate areas directly in the slide and share this for teaching and research.

Digital pathology is today widely used for educational purposes in telepathology and teleconsultation as well as in research projects. It makes sharing and annotating slides much easier, and the ability to download annotated lecture sets creates new opportunities for e-learning and knowledge sharing in pathology. Digital pathology in diagnostics is an emerging field.

Environment

Scan

Digital slides are created from glass slides using specialized scanning machines. A high-quality scan requires a slide that is free of dust, scratches, and other obstructions. There are two common methods for digital slide scanning: tile-based scanning and line-based scanning. Both technologies use an integrated camera and a motorized stage to move the slide around while parts of the tissue are imaged. Tile scanners capture square field-of-view images covering the entire tissue area on the slide, while line scanners capture images of the tissue in long, uninterrupted stripes rather than tiles. In both cases, software associated with the scanner stitches the tiles or lines together into a single, seamless image.
Z-stacking is the scanning of a slide at multiple focal planes along the vertical z-axis.
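
The stitching step for tile-based scanning can be illustrated with a minimal sketch. It assumes equally sized, non-overlapping grayscale tiles delivered in row-major order; real scanner software also has to register overlapping tiles and correct illumination, which is not modeled here. The function name and the sample tiles are hypothetical.

```python
def stitch_tiles(tiles, grid_rows, grid_cols):
    """Assemble equally sized tiles (row-major order) into one mosaic.

    Each tile is a list of pixel rows. Overlap and registration between
    neighbouring tiles are deliberately not modeled in this sketch.
    """
    if len(tiles) != grid_rows * grid_cols:
        raise ValueError("tile count does not match the grid")
    tile_height = len(tiles[0])
    mosaic = []
    for grid_row in range(grid_rows):
        # All tiles that sit side by side in this row of the grid.
        row_tiles = tiles[grid_row * grid_cols:(grid_row + 1) * grid_cols]
        for y in range(tile_height):
            mosaic_row = []
            for tile in row_tiles:
                mosaic_row.extend(tile[y])
            mosaic.append(mosaic_row)
    return mosaic


# Four hypothetical 2x2 grayscale tiles arranged in a 2x2 grid.
tiles = [
    [[1, 1], [1, 1]], [[2, 2], [2, 2]],
    [[3, 3], [3, 3]], [[4, 4], [4, 4]],
]
mosaic = stitch_tiles(tiles, grid_rows=2, grid_cols=2)
```

The same function covers the line-based case if each stripe is treated as one tall tile in a grid with a single row.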

View

Digital slides are accessible for viewing via a computer monitor and viewing software, either locally or remotely via the Internet. An example of an open-source, web-based viewer for this purpose, implemented in pure JavaScript for desktop and mobile, is OpenSeadragon. QuPath is another open-source application, often used for digital pathology because it offers a powerful set of tools for working with whole-slide images. MIKAIA lite is a free viewer that is frequently used to batch-convert or crop slides, convert annotation formats, create annotations, or export tiles with masks for creating AI training datasets. OpenSlide, on the other hand, is a C library that provides a simple interface to read and view whole-slide images.

Manage

Digital slides are maintained in an information management system that allows for archival and intelligent retrieval.

Network

Digital slides are often stored and delivered over the Internet or private networks, for viewing and consultation.

Analyze

Image analysis tools are used to derive objective quantification measures from digital slides. Image segmentation and classification algorithms, often implemented using deep neural networks, are used to identify medically significant regions and objects on digital slides. GPU-accelerated software has been developed for pathology image analysis that cross-compares the spatial boundaries of very large numbers of segmented micro-anatomic objects. Its core algorithm, PixelBox, has been adopted in Fixstars' Geometric Performance Primitives library, a production geometry engine for advanced graphical information systems, electronic design automation, computer vision and motion planning solutions that is available as a part of NVIDIA Developer.
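
One common way to cross-compare the boundaries of two segmentations of the same object is to measure how much their areas overlap. The following is a minimal sketch of that idea, assuming each object is represented as a set of pixel coordinates and using the Jaccard index as the overlap measure. It is not the PixelBox algorithm, and the helper names and sample rectangles are hypothetical.

```python
def jaccard_index(mask_a, mask_b):
    """Return intersection size divided by union size for two pixel sets."""
    union = mask_a | mask_b
    if not union:
        return 0.0
    return len(mask_a & mask_b) / len(union)


def rectangle_pixels(top, left, height, width):
    """Build the pixel set of a filled, axis-aligned rectangle."""
    return {
        (row, col)
        for row in range(top, top + height)
        for col in range(left, left + width)
    }


# Two hypothetical segmentations of the same nucleus, offset by one pixel.
segmentation_a = rectangle_pixels(0, 0, 4, 4)  # 16 pixels
segmentation_b = rectangle_pixels(1, 1, 4, 4)  # 16 pixels

# 9 shared pixels out of 23 in the union.
overlap = jaccard_index(segmentation_a, segmentation_b)
```

A value of 1.0 means the two segmentations agree exactly, and 0.0 means they do not overlap at all.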

Simplified example of training a neural network in cytologic object detection: The network is trained on multiple images that are known to depict benign cells and cancer cells, which are correlated with "nodes" that represent visual aspects, in this case nuclear size and chromatin pattern. The benign cells match with small nuclei and finely granular chromatin, whereas most cancer cells match with large nuclei and coarsely granular chromatin. However, the instance of a cancer cell with fine chromatin creates a weakly weighted association between them.

Subsequent run of the network on an input image: The network correctly detects the benign cell. However, the weakly weighted association between fine chromatin and cancer cells also confers a weak signal to the latter from one of two intermediate nodes. In addition, a blood vessel that was not included in the training partially conforms to the patterns of large nuclei and coarse chromatin, and therefore results in weak signals for the cancer cell output. These weak signals may result in a false positive result for a cancer cell.
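
The two scenarios above can be made concrete with a toy calculation. This sketch assumes a single layer of hand-set weights between feature nodes and output classes; all weights, activations, and the detection threshold are hypothetical values chosen to mirror the description, not the result of any actual training.

```python
# Hypothetical connection weights from feature nodes to output classes.
WEIGHTS = {
    "small_nucleus":    {"benign": 1.0, "cancer": 0.0},
    "large_nucleus":    {"benign": 0.0, "cancer": 1.0},
    "fine_chromatin":   {"benign": 1.0, "cancer": 0.2},  # weak association
    "coarse_chromatin": {"benign": 0.0, "cancer": 1.0},
}

THRESHOLD = 0.5  # hypothetical detection threshold


def class_scores(activations):
    """Sum activation times weight for every feature node, per class."""
    scores = {"benign": 0.0, "cancer": 0.0}
    for feature, activation in activations.items():
        for label, weight in WEIGHTS[feature].items():
            scores[label] += activation * weight
    return scores


# A benign cell fully activates the small-nucleus and fine-chromatin nodes.
benign_scores = class_scores({"small_nucleus": 1.0, "fine_chromatin": 1.0})

# A blood vessel only partially resembles large nuclei and coarse chromatin.
vessel_scores = class_scores({"large_nucleus": 0.3, "coarse_chromatin": 0.3})

# The weak cancer signals from both objects add up across the image.
cancer_signal = benign_scores["cancer"] + vessel_scores["cancer"]
false_positive = cancer_signal > THRESHOLD
```

In this sketch the benign output clearly dominates for the benign cell, yet the accumulated weak cancer signals exceed the assumed threshold, which is the false positive mechanism described above.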

Integrate

The digital pathology workflow is integrated into the institution's overall operational environment. Slide digitization is expected to reduce the number of routine, manually reviewed slides, maximizing workload efficiency.

Sharing

Digital pathology also allows internet information sharing for education, diagnostics, publication and research. This may take the form of publicly available datasets or open source access to machine learning algorithms.

Digital Slide Files

Digital pathology relies fundamentally on digital slide files, also known as whole-slide images, that encapsulate high-resolution representations of entire microscope slides. These files enable remote diagnosis, computational analysis, education, and archiving at a scale and flexibility impossible with traditional glass slides. The technical design of such file formats has implications for interoperability, performance, long-term data stewardship, and downstream analytical workflows. Broadly, digital slide file formats fall into two major categories: proprietary formats developed by hardware vendors for their scanners, and interoperable formats engineered to facilitate cross-platform compatibility and open data exchange.

Proprietary Formats

Proprietary digital slide formats are developed by hardware vendors to optimize performance and functionality for their specific scanning systems. These formats typically extend standard image containers with custom metadata structures, compression schemes, and organizational paradigms tailored to each manufacturer's technological approach. Vendor formats are designed around the feature sets of their native ecosystems, but they present challenges for long-term data preservation, cross-platform compatibility, and vendor-neutral analysis workflows. These challenges include vendor lock-in, where institutions become dependent on specific hardware and software ecosystems, difficulties in migrating data between different platforms, and increased costs for maintaining multiple proprietary toolchains.

SVS (Aperio)

The SVS format, developed by Aperio, is one of the most widely used digital slide formats in clinical and research pathology. SVS files are based on the TIFF image standard, extended to support multi-resolution image pyramids. The format supports multiple image resolutions within a single file, with each level stored as a tiled image. The base image is always the full-resolution capture. Subsidiary images represent downsampled overviews, a thumbnail, and optionally a macro image or a scanned label of the glass slide.
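
The relationship between the base image and its downsampled levels can be sketched with simple arithmetic. The slide dimensions and the downsample factors below are assumptions for illustration only; an actual SVS file records the dimensions of each level itself, and the function names are hypothetical.

```python
def pyramid_levels(base_width, base_height, downsamples):
    """Return (downsample, width, height) per level using integer division."""
    return [(d, base_width // d, base_height // d) for d in downsamples]


def uncompressed_bytes(width, height, channels=3):
    """Size of an image with 8 bits per channel and no compression."""
    return width * height * channels


# Hypothetical slide of 80,000 x 60,000 pixels with assumed factors 1, 4, 16.
levels = pyramid_levels(80_000, 60_000, downsamples=(1, 4, 16))

# Uncompressed RGB size of the base level in gigabytes (10**9 bytes).
base_size_gb = uncompressed_bytes(80_000, 60_000) / 1e9
```

Under these assumptions the base level alone would occupy about 14.4 GB uncompressed, which is why whole-slide formats store compressed tiles and let viewers read lower-resolution levels when zoomed out.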

NDPI (Hamamatsu)

NDPI is Hamamatsu’s proprietary TIFF-based whole-slide imaging format, combining standard multi-directory TIFF pyramids with custom extensions for random-access viewing and metadata handling. The format embeds JPEG-compressed strips within TIFF image file directories (IFDs), uses private tag ranges for offset catalogs and restart markers, and places the macro overview in the final directory, all without separate index files.

Multi-resolution TIFF pyramid: Separate IFDs represent each zoom level; the lowest-resolution overview resides in the last directory.
JPEG-compressed strips with restart markers: Image data is stored as JPEG-compressed strips. Restart markers enable robust, random-access decoding of individual strips.
Private TIFF tags: Hamamatsu reserves custom tags to record strip offsets, high-order offset bits, restart-marker catalogs, and slide-specific metadata such as scan parameters.
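
Because each zoom level lives in its own IFD, a reader discovers the levels by following the chain of IFD offsets from the TIFF header. The sketch below walks that chain for a classic TIFF with 32-bit offsets, using a small synthetic file built in memory. It does not handle NDPI's private tags for high-order offset bits, so it is only an illustration of the directory structure, not an NDPI reader; the function names and sample widths are hypothetical.

```python
import struct


def list_ifd_offsets(data):
    """Return the byte offset of every IFD in a classic (32-bit) TIFF."""
    byte_order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if byte_order is None:
        raise ValueError("missing TIFF byte-order mark")
    magic, offset = struct.unpack(byte_order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF")
    offsets = []
    while offset != 0:
        if offset in offsets:
            raise ValueError("IFD chain loops back on itself")
        offsets.append(offset)
        # An IFD is a 2-byte entry count, 12-byte entries, then the
        # 4-byte offset of the next IFD (0 marks the end of the chain).
        (entry_count,) = struct.unpack_from(byte_order + "H", data, offset)
        next_pointer = offset + 2 + 12 * entry_count
        (offset,) = struct.unpack_from(byte_order + "I", data, next_pointer)
    return offsets


def make_ifd(width, next_offset):
    """Build a one-entry IFD holding only ImageWidth (tag 256, type LONG)."""
    entry = struct.pack("<HHII", 256, 4, 1, width)
    return struct.pack("<H", 1) + entry + struct.pack("<I", next_offset)


# Synthetic little-endian TIFF: 8-byte header followed by three 18-byte IFDs.
header = b"II" + struct.pack("<HI", 42, 8)
sample = header + make_ifd(4000, 26) + make_ifd(1000, 44) + make_ifd(250, 0)
ifd_offsets = list_ifd_offsets(sample)
```

For the synthetic file, the walk visits three directories, one per hypothetical pyramid level.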