Genome architecture mapping

In molecular biology, genome architecture mapping is a cryosectioning method to map colocalized DNA regions in a ligation independent manner. It overcomes some limitations of Chromosome conformation capture, as these methods have a reliance on digestion and ligation to capture interacting DNA segments. GAM is the first genome-wide method for capturing three-dimensional proximities between any number of genomic loci without ligation.
The sections that are found using the cryosectioning method mentioned above are referred to as nuclear profiles. The information that they provide relates to their coverage across a genome. A large set of values can be produced that represents the strength of nuclear profiles' presence within a genome. Based on how large or small the coverage across a genome is, judgements can be made involving chromatin interactions, nuclear profile location within the nucleus being cryosectioned, and chromatin compaction levels.
To be able to visualize this information, certain methods can be implemented using the raw data given by a table that shows whether or not nuclear profiles are detected in a genomic window, the genomic windows being represented within a certain chromosome. With a 1 representing a detection within a window and a 0 representing no detection, subsets of data can be obtained and interpreted by creating graphs, charts, heatmaps, and other visualization methods that allow these subsets to be seen in ways other than binary detection methods. By using a more graphic approach to interpreting the data obtained with cryosectioning, it is possible to see interactions that would have otherwise not been seen before.
Some examples of how these visuals can be interpreted include bar graphs that show the radial position and chromatin compaction levels of nuclear profiles, they can be split into categories to give a generalization of how often nuclear profiles are detected within a genomic window. A radar chart is a circular graph that represents the percentages of occurrence within a number of variables. In the sense of genomic information, radar charts can be used to show how genomic windows are represented within "features" of the genome that are part of certain regions that make it up. These charts can be made to compare groups of nuclear profiles with each other and their differences in how they occur within these features is shown graphically. Heatmaps are another form of visual representation where individual values in a table are shown by cells that take on different colors based on their value. This allows for trends to be seen within a table by the display of groups of similar colors or the lack of.
The heatmap to the right represents the relationship between nuclear profiles based on a calculated Jaccard Index where the values ranging from 0-1 are the degree of similarity between two nuclear profiles. Showing this similarity can help to display where certain groups of nuclear profiles are more common within a genome. In this heatmap the diagonal white line of cells is expected because these cells indicate where nuclear profiles intersect themselves and are therefore the most similar as possible to each other, which gives them a value of 1. In addition to the white diagonal line of cells, a cluster of other lightly colored cells can be observed in the bottom right of the heatmap. This grouping of nuclear profiles display high similarity using the Jaccard Index. This means that the nuclear profiles are present in a greater number of genomic windows than others.
The bar graph to the right represents the percentage of nuclear profiles that belong to a category of radial position. The cluster of nuclear profiles was calculated based on their similarity to each other using a k-means clustering method. To begin the process, three nuclear profiles were chosen at random as the 'centers' of the cluster. After the centers were chosen at random, every other nuclear profile is assigned to a cluster based on its distance from each center using a calculated distance value. New centers were then chosen to better represent the cluster. This process was repeated until the centers at the start matched the centers at the end. When the cluster centers have not changed, it could be interpreted that this means proper clusters have been chosen. Within each of these clusters the nuclear profiles are then given a value from 1 to 5 based on their radial position and this data is fed into a bar graph to give a visualization.
This radar chart to the right shows 3 clusters of nuclear profiles' percentage of occurrence within certain features of the mouse genome. Each cluster of nuclear profiles was calculated using the k-means clustering technique described above, relating to the bar graph showing radial positions of nuclear profiles. Comparisons can be made between the clusters and how they show up more or less in certain features in contrast to each other. To calculate a cluster's presence within a certain feature, it is determined if a nuclear profile is present within a window that is detected within a feature. The percentage of how often nuclear profiles within a cluster occur within the same windows that are detected within a feature are then displayed by the radar chart.

Cryosection and laser microdissection

Cryosections are produced according to the Tokuyasu method, involving stringent fixation to preserve nuclear and cellular architecture, cryoprotection with a sucrose-PBS solution, before freezing in liquid nitrogen. In Genome Architecture Mapping, sectioning is a necessary step for exploring the 3D topology of the genome, before Laser Microdissection. Then laser microdissection can isolate each nuclear profile, before DNA extraction and sequencing.

Data analysis - bioinformatic tools

GAMtools

GAMtools is a collection of software utilities for Genome Architecture Mapping data developed by Robert Beagrie. Bowtie2 is required before running GAMtools. The input required for this program is in Fastq format. This software has a variety of features and the exact commands to use will depend on what you want to do with it, however most features require generating segregation table, so for most users the first steps to take will be to download or create input data, and perform the sequence mapping. This will generate a segregation table, which can then be used to perform various other operations which are outlined below. For further information, view the GAMtools documentation.

Mapping the sequencing data

The GAMtools command can be used to perform the mapping. It maps the raw sequence data from the nuclear profiles. GAMtools also provides the option to perform quality control checks on the NPs. This option can be enabled by adding the flag -c/--do-qc to the previous command. When the quality control check is enabled, GAMtools will try to exclude poor quality nuclear profiles.
The GAMtools command gamtools process_nps is used to map raw sequence data from nuclear profiles and generate a segregation table. Quality control can be enabled with the -c or --do-qc flag to exclude low-quality NPs.
The GAMtools command for this step is:


gamtools process_nps --do-qc -g

Windows calling and segregation table

After mapping, GAMtools counts the number of reads from each nuclear profile that overlap with genomic windows, using a default window size of 50 kb. This step is performed by the same process_nps command and results in the generation of a segregation table, which indicates the presence or absence of each window across all profiles.

Producing proximity matrices

The GAMtools command for this process is . The input file is the segregation table that was calculated from the windows calling step. GAMtools calculates these matrices using the normalized linkage disequilibrium, which means that it looks at how many times each pair of windows are detected by the same NP, and then normalizes the results based on how many times each window was detected across all NPs. The figure below shows an example of a proximity matrix heatmap produced using GAMtools.
The GAMtools command for this step is:


gamtools matrix -s  -r

Calculating chromatin compaction

The GAMtools command can be used to calculate an estimation of chromatin compaction. Compaction is a value assigned to a gene that represents how large the gene is. The level of compaction is inversely proportional to the locus volume. Genomic loci with a low volume are said to have a high level of compaction, and loci with a high volume have a low level of compaction. As shown in the figure, loci with a low compaction level are expected to be intersected more often by the cryosection slices. GAMtools uses this information to assign a compaction value to each locus based on its detection frequency across many nuclear profiles. The compaction rate of these loci is not static, and will continually change throughout the life of the cell. Genomic loci are thought to be de-compacted when that gene is active. This allows a researcher to make assumptions about which genes are currently active in a cell, using the results of the GAMtools data. A locus with low compaction is also thought to be related to transcriptional activity. The time-complexity of the compaction command is O, where is the number of genomic windows, and is the number of nuclear profiles.
The GAMtools command for this step is:


gamtools compaction -s  -o

Calculating radial position

GAMtools can be used to calculate the radial position of NPs. The radial position of an NP is a measure of how near or far that NP is from the equator or center of the nucleus. NPs that are close to the center of the nucleus are considered equatorial whereas NPs that are closer to the edge of the nucleus are considered apical. The GAMtools command to calculate radial positioning is . This requires that you have previously generated a segregation table. The radial position is estimated from the average size of NPs that contain a given chromatin region. Chromatin that are closer to the periphery will typically be intersected by smaller, more apical NPs, whereas central chromatin will be intersected by larger, equatorial NPs.
In order to estimate the size of each NP, GAMtools looks at the number of windows each NP saw, as NPs that saw more windows can be assumed to be larger in volume. This is very similar to the method used to estimate chromatin compaction. The figure to the right illustrates how GAMtools looks at each NP's detection rate to estimate the volume, in order to determine the compaction or the radial position. If we look at the first NP, we see that it intersects all three windows, so we can estimate that it is one of the largest NPs. The second NP intersects two out of the three windows, so we can estimate that it is smaller than the first NP. The third NP only intersects one out of the three windows, so we can estimate that it is the smallest NP. Now that we have an estimation of the size of each NP, we can estimate the radial position. If we assume that the larger NPs are more equatorial, then we find that the first NP is the most equatorial, the second NP is the second most equatorial, and the third NP is the most apical.
The GAMtools command for this step is:


gamtools radial_pos -s  -o

Here is some pseudocode that illustrates how one might calculate the radial position of a list of NPs:


// Suppose we have a 2D matrix called data where the rows correspond to the NPs and the columns correspond to the windows, so if data is 1, then that means NP 1 contains window 2
// Use this variable to keep track of the largest number of windows detected by a single NP
LET MAXWINDOW = 0
// Use this array to keep track of the number of windows detected by each NP, so we can later determine the radial position
LET RADIAL_POS = 
// Loop through all NPs
FOR NP FROM 1 TO NUM_NPS:
 LET WINCOUNT = 0
 // Count the number of windows each NP saw
 FOR WIN FROM 1 to NUM_WINDOWS:
 IF 
 WINCOUNT = WINCOUNT + 1
 // See if the current NP has seen the most windows
 IF WINCOUNT > MAXWINDOW:
 MAXWINDOW = WINCOUNT
 // Add the count for the current NP to the array
 RADIAL_POS.APPEND
// Divide the number of windows each NP saw by the largest number of windows any NP saw to get an estimate of the radial position
FOR NP FROM 1 TO NUM_NPS:
 RADIAL_POS = RADIAL_POS / MAXWINDOW

This pseudocode will create a list of radial positions that range from 0 - 1 that provide an estimation of the radial position, where 1 is the most equatorial and 0 is the most apical. The time complexity of this pseudocode is O, where n is the number of NPs and m is the number of windows. The first for loop goes through n iterations, and it has an inner for loop which goes through m iterations, which means the time complexity of that for loop is O. The second for loop has n iterations, so it has time complexity O. Therefore, the overall time complexity of this code is O, which can be reduced to O.