Image segmentation
In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image. Each of the pixels in a region is similar with respect to some characteristic or computed property, such as color, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic. When applied to a stack of images, as is typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of geometry reconstruction algorithms like marching cubes.
Applications
Some of the practical applications of image segmentation are:
- Content-based image retrieval
- Machine vision
- Medical imaging, and imaging studies in biomedical research, including volume-rendered images from computed tomography and magnetic resonance imaging, as well as volume electron microscopy techniques such as FIB-SEM.
- * Locate tumors and other pathologies
- * Measure tissue volumes
- * Diagnosis, study of anatomical structure
- * Surgery planning
- * Virtual surgery simulation
- * Intra-surgery navigation
- * Radiotherapy
- * Digital pathology and histopathology. Nuclei instance segmentation in variously stained whole-slide images refers to the automatic delineation of the borders of individual cell nuclei. This task is a specialized subproblem of instance segmentation, with high practical importance in biomedical research and diagnostics. The modern approach to the problem is the use of deep learning architectures designed to handle the main challenges of separating overlapping or fused nuclei. The scarcity of human-annotated datasets poses an additional challenge.
- Object detection
- * Pedestrian detection
- * Face detection
- * Brake light detection
- * Locate objects in satellite images
- Recognition tasks
- * Face recognition
- * Fingerprint recognition
- * Iris recognition
- * Prohibited item detection at airport security checkpoints
- Traffic control systems
- Video surveillance
- Video object co-segmentation and action localization
Classes of segmentation techniques
There are two classes of segmentation techniques:
- Classical computer vision approaches
- AI-based techniques
Groups of image segmentation
- Semantic segmentation is an approach that assigns, to every pixel, the class it belongs to. For example, in an image containing many people, all pixels belonging to a person receive the same class id, while the remaining pixels are classified as background.
- Instance segmentation is an approach that identifies, for every pixel, the specific object instance it belongs to. It detects each distinct object of interest in the image; for example, each person in an image is segmented as an individual object.
- Panoptic segmentation combines semantic and instance segmentation. Like semantic segmentation, it assigns a class to every pixel; like instance segmentation, it distinguishes different instances of the same class. (A toy illustration of these three label maps is given below.)
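As a toy illustration, the three outputs can be represented as per-pixel label maps. The array contents and the panoptic encoding below are illustrative only and not taken from any particular dataset:
```python
import numpy as np

# Toy 4x4 scene: two separate "person" blobs on background
# (class 0 = background, class 1 = person).

# Semantic segmentation: every pixel gets a class id; both people share class 1.
semantic = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

# Instance segmentation: each foreground object gets its own id (1 and 2),
# so the two people are distinguished; background stays 0.
instance = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
])

# Panoptic segmentation carries both: here encoded as class_id * 1000 + instance_id
# (a simple scheme chosen for illustration; datasets define their own encodings).
panoptic = semantic * 1000 + instance
```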
Thresholding
Thresholding methods convert a grayscale image into a binary image by comparing each pixel's intensity with a threshold value. The key to this method is selecting the threshold value. Several popular methods are used in industry including the maximum entropy method, balanced histogram thresholding, Otsu's method, and k-means clustering.
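As an example, the following is a minimal sketch of Otsu's threshold selection in Python. The function name and the synthetic test image are illustrative; libraries such as scikit-image provide ready-made implementations.
```python
import numpy as np

def otsu_threshold(gray):
    """Return the intensity threshold that maximizes between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()        # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0     # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2       # between-class variance
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

# Synthetic bimodal test image: dark and bright regions with noise.
rng = np.random.default_rng(0)
gray = np.clip(np.concatenate([rng.normal(60, 10, 500),
                               rng.normal(180, 10, 500)]), 0, 255)
gray = gray.astype(np.uint8).reshape(25, 40)
mask = gray > otsu_threshold(gray)   # binary segmentation
```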
Recently, methods have been developed for thresholding computed tomography images. The key idea is that, unlike Otsu's method, the thresholds are derived from the radiographs instead of the reconstructed image.
New methods suggest the use of multi-dimensional, fuzzy rule-based, non-linear thresholds. In these approaches, the decision regarding each pixel's membership in a segment is based on multi-dimensional rules derived from fuzzy logic and evolutionary algorithms, considering factors such as image lighting, environment, and application.
Clustering methods
The K-means algorithm is an iterative technique that is used to partition an image into K clusters. The basic algorithm is:
1. Pick K cluster centers, either randomly or based on some heuristic method, for example K-means++.
2. Assign each pixel in the image to the cluster that minimizes the distance between the pixel and the cluster center.
3. Re-compute the cluster centers by averaging all of the pixels in the cluster.
4. Repeat steps 2 and 3 until convergence is attained. (A minimal implementation of these steps is sketched below.)
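The following is a minimal NumPy sketch of these steps applied to pixel colors. The function name and defaults are illustrative, and the pairwise-distance computation keeps the code simple at the cost of memory for large images:
```python
import numpy as np

def kmeans_segment(image, k, n_iter=20, seed=0):
    """Partition an (H, W, 3) image into k clusters; return an (H, W) label map."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)
    rng = np.random.default_rng(seed)

    # Step 1: pick k cluster centers (here: k randomly chosen pixels).
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]

    for _ in range(n_iter):
        # Step 2: assign each pixel to the nearest cluster center.
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Step 3: re-compute each center as the mean of its assigned pixels.
        new_centers = np.array([
            pixels[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])

        # Step 4: stop once the centers no longer move (convergence).
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    return labels.reshape(h, w)

# Example use: labels = kmeans_segment(image, k=4), where image is an (H, W, 3)
# array loaded with, e.g., matplotlib.pyplot.imread.
```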
The Mean Shift algorithm is a technique that is used to partition an image into a number of clusters that is not known a priori. This has the advantage of not having to start with an initial guess of that parameter, which makes it a better general solution for more diverse cases.
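A sketch of mean shift clustering of pixel colors, assuming scikit-learn is available; the bandwidth quantile is an illustrative choice, and spatial coordinates can be appended to the color features to encourage spatially coherent segments:
```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

def mean_shift_segment(image, quantile=0.1):
    """Cluster pixel colors with mean shift; the number of clusters emerges from the data."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)

    # The kernel bandwidth is the only free parameter; estimate it from a sample.
    bandwidth = estimate_bandwidth(pixels, quantile=quantile, n_samples=500)

    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    ms.fit(pixels)
    return ms.labels_.reshape(h, w), len(ms.cluster_centers_)
```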
Motion and interactive segmentation
Motion-based segmentation is a technique that relies on motion in the image to perform segmentation. The idea is simple: look at the differences between a pair of images. Assuming the object of interest is moving, the difference will be exactly that object.
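A minimal frame-differencing sketch; the function name and threshold value are illustrative, and practical systems add noise filtering and morphological clean-up:
```python
import numpy as np

def motion_mask(frame_a, frame_b, threshold=25):
    """Segment moving regions by differencing two grayscale frames.

    Pixels whose absolute intensity change exceeds `threshold` are marked as motion.
    The threshold of 25 is an illustrative choice, not a standard constant.
    """
    diff = np.abs(frame_a.astype(int) - frame_b.astype(int))
    return diff > threshold
```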
Improving on this idea, Kenney et al. proposed interactive segmentation. They use a robot to poke objects in order to generate the motion signal necessary for motion-based segmentation.
Interactive segmentation follows the interactive perception framework proposed by Dov Katz and Oliver Brock.
Another technique that is based on motion is rigid motion segmentation.
Compression-based methods
Compression-based methods postulate that the optimal segmentation is the one that minimizes, over all possible segmentations, the coding length of the data. The connection between these two concepts is that segmentation tries to find patterns in an image and any regularity in the image can be used to compress it. The method describes each segment by its texture and boundary shape. Each of these components is modeled by a probability distribution function and its coding length is computed as follows:
- The boundary encoding leverages the fact that regions in natural images tend to have a smooth contour. This prior is used by Huffman coding to encode the difference chain code of the contours in an image. Thus, the smoother a boundary is, the shorter the coding length it attains.
- Texture is encoded by lossy compression in a way similar to the minimum description length principle, but here the length of the data given the model is approximated by the number of samples times the entropy of the model. The texture in each region is modeled by a multivariate normal distribution whose entropy has a closed-form expression. An interesting property of this model is that the estimated entropy bounds the true entropy of the data from above. This is because, among all distributions with a given mean and covariance, the normal distribution has the largest entropy. Thus, the true coding length cannot be more than what the algorithm tries to minimize. (A simplified sketch of this texture term is given below.)
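The following is a simplified sketch of the texture term. It uses the closed-form differential entropy of a multivariate normal (in nats) and adds a small regularizer to the covariance for numerical stability; it illustrates the idea above rather than reproducing the exact rate computation of any published compression-based segmenter:
```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a multivariate normal with covariance `cov`."""
    d = cov.shape[0]
    return 0.5 * np.log(((2 * np.pi * np.e) ** d) * np.linalg.det(cov))

def texture_coding_length(region_pixels):
    """Approximate texture coding length of one segment: samples x model entropy.

    `region_pixels` is an (N, d) array of feature vectors (e.g. RGB values)
    drawn from a single region.
    """
    n, d = region_pixels.shape
    cov = np.cov(region_pixels, rowvar=False) + 1e-6 * np.eye(d)  # regularized covariance
    return n * gaussian_entropy(cov)
```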
Histogram-based methods
Histogram-based methods are very efficient compared to other image segmentation methods because they typically require only one pass through the pixels. In this technique, a histogram is computed from all of the pixels in the image, and the peaks and valleys in the histogram are used to locate the clusters in the image. Color or intensity can be used as the measure.
A refinement of this technique is to recursively apply the histogram-seeking method to clusters in the image in order to divide them into smaller clusters. This operation is repeated with smaller and smaller clusters until no more clusters are formed.
One disadvantage of the histogram-seeking method is that it may be difficult to identify significant peaks and valleys in the image.
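The following is a minimal sketch of a single histogram split, assuming SciPy is available. The function name, smoothing window, and prominence criterion are illustrative ways to cope with noisy peaks and valleys; the recursive variant described above would re-apply this to each resulting cluster:
```python
import numpy as np
from scipy.signal import find_peaks
from scipy.ndimage import uniform_filter1d

def histogram_valley_threshold(gray):
    """Pick a threshold at a valley of the smoothed intensity histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    smooth = uniform_filter1d(hist.astype(float), size=9)   # suppress spurious minima

    # Valleys of the histogram are peaks of its negation.
    valleys, props = find_peaks(-smooth, prominence=1.0)
    if len(valleys) == 0:
        return None
    # Use the most prominent valley as the split between two clusters.
    return valleys[np.argmax(props["prominences"])]

# Example use: t = histogram_valley_threshold(gray); mask = gray > t
```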
Histogram-based approaches can also be quickly adapted to apply to multiple frames, while maintaining their single-pass efficiency. The histogram can be computed in multiple ways when multiple frames are considered. The same approach that is taken with one frame can be applied to multiple frames, and after the results are merged, peaks and valleys that were previously difficult to identify are more likely to be distinguishable. The histogram can also be applied on a per-pixel basis, where the resulting information is used to determine the most frequent color for each pixel location. This approach segments based on active objects and a static environment, resulting in a different type of segmentation useful in video tracking.