Optical mapping

Optical mapping [1] is a technique for constructing ordered, genome-wide, high-resolution restriction maps, called "optical maps", from single, stained molecules of DNA. When the locations of restriction enzyme sites along the unknown DNA of an organism are mapped, the spectrum of resulting DNA fragments collectively serves as a unique "fingerprint" or "barcode" for that sequence. Originally developed by Dr. David C. Schwartz and his lab at NYU in the 1990s, [2] this method has since been integral to the assembly process of many large-scale sequencing projects for both microbial and eukaryotic genomes. Later technologies use DNA melting, [3] DNA competitive binding [4] or enzymatic labelling [5] [6] to create the optical maps.

Technology

The optical mapping workflow

The modern optical mapping platform works as follows: [7]

  1. Genomic DNA is obtained from lysed cells, and randomly sheared to produce a "library" of large genomic molecules for optical mapping.
  2. A single molecule of DNA is stretched (or elongated) and held in place by charge interactions on a slide under a fluorescence microscope.
  3. The DNA molecule is digested by added restriction enzymes, which cleave at specific digestion sites. The resulting molecule fragments remain attached to the surface. The fragment ends at the cleavage sites are drawn back (due to elasticity of linearized DNA), leaving gaps which are identifiable under the microscope.
  4. DNA fragments stained with intercalating dye are visualized by fluorescence microscopy and are sized by measuring the integrated fluorescence intensity. This produces an optical map of single molecules.
  5. Individual optical maps are combined to produce a consensus, genomic optical map.
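
The sizing in step 4 can be illustrated with a minimal sketch. It assumes that integrated fluorescence intensity scales linearly with DNA length and that a reference molecule of known size has been imaged under the same conditions; the function name, the intensity values and the 48.5 kb standard are hypothetical choices for illustration, not part of the published platform.

```python
def size_fragments(intensities, standard_intensity, standard_kb):
    """Convert integrated fluorescence intensities to fragment sizes in kb.

    Assumption: intensity is proportional to fragment length, calibrated
    against a hypothetical reference molecule of known size.
    """
    if standard_intensity <= 0:
        raise ValueError("standard_intensity must be positive")
    kb_per_unit = standard_kb / standard_intensity
    # One size per fragment, in the order the fragments appear on the molecule.
    return [round(i * kb_per_unit, 1) for i in intensities]


# Three ordered fragments of one digested molecule (arbitrary intensity units).
optical_map = size_fragments(
    [1200.0, 300.0, 4500.0], standard_intensity=4850.0, standard_kb=48.5
)
print(optical_map)
```

Because the fragments stay attached to the surface in their original order, the resulting list of sizes is already an ordered restriction map of that single molecule.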

History of optical mapping platform

Early system

DNA molecules were fixed in molten agarose held between a cover slip and a microscope slide. Restriction enzyme was pre-mixed with the molten agarose before DNA placement, and cleavage was triggered by the addition of magnesium.

Using charged surfaces

Rather than being immobilized within a gel matrix, DNA molecules were held in place by electrostatic interactions on a positively charged surface. Resolution improved such that fragments from ~30 kb to as small as 800 bp could be sized.

Automated system

This involved the development and integration of an automated spotting system to spot multiple single molecules on a slide (like a microarray) for parallel enzymatic processing, automated fluorescence microscopy for image acquisition, machine vision for image processing, algorithms for optical map construction, and cluster computing for processing large amounts of data.

High-throughput system using microfluidics

Because microarrays spotted with single molecules did not work well for large genomic DNA molecules, microfluidic devices with a series of parallel microchannels, fabricated by soft lithography, were developed.

Next-generation system using nanocoding technology

An improvement on optical mapping, called "Nanocoding", [8] has the potential to boost throughput by trapping elongated DNA molecules in nanoconfinements.

Comparisons

Other mapping techniques

The advantage of optical mapping (OM) over traditional mapping techniques is that it preserves the order of the DNA fragments, whereas in traditional restriction mapping the order needs to be reconstructed. In addition, since maps are constructed directly from genomic DNA molecules, cloning or PCR artifacts are avoided. However, each OM process is still affected by false positive and false negative sites, because not all restriction sites are cleaved in each molecule and some sites may be incorrectly cut. In practice, multiple optical maps are created from molecules of the same genomic region, and an algorithm is used to determine the best consensus map. [9]
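
The idea of a consensus map can be sketched with a simple majority vote. This is not the assembly algorithm of [9]; it is a minimal illustration that assumes the single-molecule maps are already aligned to a common coordinate system and represented as lists of cut positions in kb. The function name, the tolerance and the support threshold are hypothetical.

```python
def consensus_sites(molecule_maps, tolerance_kb=1.0, min_support=0.5):
    """Majority-vote consensus of cut sites from pre-aligned molecule maps.

    Cut positions closer than `tolerance_kb` to the previous pooled position
    are grouped; a group becomes a consensus site only if at least
    `min_support` of the molecules contribute a cut to it.
    """
    pooled = sorted(
        (pos, idx) for idx, sites in enumerate(molecule_maps) for pos in sites
    )
    clusters = []
    for pos, idx in pooled:
        if clusters and pos - clusters[-1][-1][0] <= tolerance_kb:
            clusters[-1].append((pos, idx))
        else:
            clusters.append([(pos, idx)])

    consensus = []
    for cluster in clusters:
        supporters = {idx for _, idx in cluster}
        if len(supporters) / len(molecule_maps) >= min_support:
            mean_pos = sum(p for p, _ in cluster) / len(cluster)
            consensus.append(round(mean_pos, 2))
    return consensus


maps = [
    [10.0, 25.1, 40.0],
    [10.2, 40.3],             # missed cut near 25 kb (false negative)
    [9.9, 25.0, 33.0, 39.8],  # spurious cut at 33 kb (false positive)
    [10.1, 24.9, 40.1],
]
consensus = consensus_sites(maps)
print(consensus)
```

In this toy input the site near 25 kb survives although one molecule missed it, and the isolated cut at 33 kb is discarded because only one of four molecules supports it.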

Other genome analysis methods

There are a variety of approaches to identifying large-scale genomic variations (such as indels, duplications, inversions, translocations) between genomes. Other categories of methods include using microarrays, pulsed-field gel electrophoresis, cytogenetics and paired-end tags.

Uses

Initially, the optical mapping system was used to construct whole-genome restriction maps of bacteria, parasites, and fungi. [10] [11] [12] It has also been used to scaffold and validate bacterial genomes. [13] For the optical map to serve as a scaffold for assembly, assembled sequence contigs are scanned for restriction sites in silico using the known sequence data, and the predicted restriction maps are aligned to the assembled genomic optical map. The commercial company OpGen has provided optical maps for microbial genomes. For larger eukaryotic genomes, only the David C. Schwartz lab (now at the University of Wisconsin–Madison) has produced optical maps for mouse, [14] human, [15] rice, [16] and maize. [17]
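
Scanning a contig for restriction sites in silico can be sketched as follows. The example assumes a linear contig and a palindromic recognition site, so only one strand needs to be searched; the toy contig and the function name are hypothetical, while the EcoRI site GAATTC, cut after the first G, is used as a familiar example enzyme.

```python
def in_silico_digest(sequence, site, cut_offset):
    """Return ordered fragment lengths (bp) from cutting at every `site`.

    `cut_offset` is the number of bases from the start of the recognition
    site to the cut position on the searched strand.
    """
    sequence = sequence.upper()
    site = site.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


# Toy contig with two EcoRI sites (G^AATTC).
contig = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
fragments = in_silico_digest(contig, "GAATTC", cut_offset=1)
print(fragments)
```

The ordered list of predicted fragment lengths is the in silico counterpart of an optical map and can then be compared against the genomic optical map.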

Optical sequencing

Optical sequencing is a single-molecule DNA sequencing technique that follows sequencing-by-synthesis and uses optical mapping technology. [18] [19] Similar to other single-molecule sequencing approaches such as SMRT sequencing, this technique analyzes a single DNA molecule rather than amplifying the initial sample and sequencing multiple copies of the DNA. During synthesis, fluorochrome-labeled nucleotides are incorporated by DNA polymerases and tracked by fluorescence microscopy. This technique was originally proposed by David C. Schwartz and Arvind Ramanathan in 2003.

Optical sequencing cycle

The following is an overview of each cycle in the optical sequencing process. [20]

The optical sequencing cycle

Step 1: DNA barcoding
Cells are lysed to release genomic DNA. The DNA molecules are untangled and placed onto an optical mapping surface containing microfluidic channels, through which the DNA is allowed to flow. The molecules are then barcoded by restriction enzymes to allow genomic localization through optical mapping. See the "Technology" section above for those steps.

Step 2: Template nicking
DNase I is added to randomly nick the mounted DNA molecules. A wash is then performed to remove the DNase I. The mean number of nicks that occur per template is dependent on the concentration of DNase I as well as the incubation time.

Step 3: Gap formation
T7 exonuclease is added, which uses the nicks in the DNA molecules to expand the gaps in a 5'–3' direction. The amount of T7 exonuclease must be carefully controlled to avoid overly high levels of double-stranded breaks.

Step 4: Fluorochrome incorporation
DNA polymerase is used to incorporate fluorochrome-labelled nucleotides (FdNTPs) into the multiple gapped sites along each DNA molecule. During each cycle, the reaction mixture contains a single type of FdNTP and allows for multiple additions of that nucleotide type. Various washes are then performed to remove unincorporated FdNTPs in preparation for imaging and the next cycle of FdNTP addition.

Step 5: Imaging
This step counts the number of incorporated fluorochrome-labeled nucleotides at the gap regions using fluorescence microscopy.

Step 6: Photobleaching
The laser illumination that is used to excite the fluorochrome is also used here to destroy the fluorochrome signal. This essentially resets the fluorochrome counter and prepares it for the next cycle. This step is a unique aspect of optical sequencing, as it does not actually remove the fluorochrome label from the nucleotide after its incorporation. Not removing the fluorochrome label makes sequencing more economical, but it means that labelled nucleotides must be incorporated consecutively, which can cause problems due to the bulkiness of the labels.

Step 7: Repeat steps 4–6
Steps 4–6 are repeated, with step 4 using a reaction mixture that contains a different fluorochrome-labeled nucleotide (FdNTP) each time. This is repeated until the desired region is sequenced.
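
The cycle described in steps 4–7 can be illustrated with a small simulation for one gapped site. It is an idealized sketch: it assumes a perfect polymerase, no limit on consecutive incorporations (which the bulkiness of the labels restricts in practice), and exact counting during imaging. The function names and the A, C, G, T flow order are hypothetical.

```python
def run_cycles(new_strand, flow_order, n_cycles):
    """Simulate per-cycle FdNTP incorporation counts at one gapped site.

    `new_strand` is the sequence to be synthesized into the gap. Each cycle
    offers one nucleotide type, and every consecutive matching base is
    incorporated before the counter is reset by photobleaching.
    """
    counts = []
    pos = 0
    for cycle in range(n_cycles):
        base = flow_order[cycle % len(flow_order)]
        n = 0
        while pos < len(new_strand) and new_strand[pos] == base:
            n += 1
            pos += 1
        counts.append((base, n))  # what imaging would report for this cycle
    return counts


def decode(counts):
    """Rebuild the synthesized sequence from per-cycle counts."""
    return "".join(base * n for base, n in counts)


counts = run_cycles("TTAGG", flow_order="ACGT", n_cycles=8)
print(counts)
print(decode(counts))
```

Cycles in which the offered nucleotide does not match the next template position record a count of zero, so the sequence is read out as a series of per-nucleotide run lengths.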

Optimization strategies

Selection of an appropriate DNA polymerase is critical to the efficiency of the base addition step and must meet several criteria:

In addition, the preference of each polymerase for different fluorochromes, the linker length on fluorochrome-nucleotides, and buffer composition are important factors to consider in order to optimize the base addition process and maximize the number of consecutive FdNTP incorporations.

Advantages

Single-molecule analysis
Since only a minimal DNA sample is required, the time-consuming and costly amplification step is avoided, which streamlines sample preparation.

Large DNA molecule templates (~500 kb) vs. short DNA molecule templates (< 1 kb)
While most next-generation sequencing technologies aim for massive numbers of small sequence reads, these small reads make de novo sequencing efforts difficult and genome repeat regions hard to resolve. Optical sequencing uses large DNA molecule templates (~500 kb) for sequencing, and these offer several advantages over small templates:

  1. These large DNA templates can be "DNA barcoded" to determine their genomic localization with confidence. Therefore, any sequence reads that are taken from the large template can be mapped onto the genome with a high degree of confidence. More importantly, sequence reads from high-repeat regions can be placed with a greater degree of confidence, whereas short reads suffer from mapping uncertainty in such regions. Special algorithms and software, developed for optical mapping and nanocoding, align single-molecule barcodes with a reference genome.
  2. Multiple sequence reads can be obtained from the same large template molecule. These multiple sequence reads reduce the complexity of de novo assembly, disambiguate genomic rearrangement regions, and are "intrinsically free from any assembly errors." [20]
  3. Molecular barcoding of large DNA molecular templates with sequence acquisition provides broad and specific genomic analyses.
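
Placing a barcoded template on a reference can be sketched as a sliding-window comparison of ordered fragment sizes. This is a deliberately simplified illustration, not one of the published alignment tools: it assumes an ungapped match with no missed or spurious cut sites and complete end fragments, all of which real aligners must handle. The function name and all sizes are hypothetical.

```python
def place_barcode(molecule_kb, reference_kb):
    """Return (start_index, score) of the best ungapped placement.

    The score is the summed relative size error between the molecule's
    fragments and a window of reference fragments; lower is better.
    """
    m = len(molecule_kb)
    if m == 0 or m > len(reference_kb):
        raise ValueError("molecule must have 1..len(reference) fragments")
    best = None
    for start in range(len(reference_kb) - m + 1):
        window = reference_kb[start:start + m]
        score = sum(abs(a - b) / b for a, b in zip(molecule_kb, window))
        if best is None or score < best[1]:
            best = (start, score)
    return best


# The reference contains two similar regions: (12.0, 3.5) and (12.5, 3.0).
reference = [12.0, 3.5, 45.0, 8.0, 12.5, 3.0, 20.0, 30.0]
molecule = [12.3, 3.1, 19.5]
start, score = place_barcode(molecule, reference)
print(start, round(score, 3))
```

The first two fragments alone fit both similar regions of the toy reference reasonably well; the third fragment resolves the ambiguity, which is the sense in which a longer barcode on a large template improves placement in repeat-rich regions.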

Disadvantages

References

  1. Zhou, Shiguo; Jill Herscheleb; David C. Schwartz (2007). A Single Molecule System for Whole Genome Analysis. New high throughput technologies for DNA sequencing and genomics. Vol. 2. Elsevier. pp. 269–304.
  2. Schwartz, D. C., et al. "Ordered Restriction Maps of Saccharomyces Cerevisiae Chromosomes Constructed by Optical Mapping." Science 262.5130 (1993): 110–4.
  3. Reisner, Walter; Larsen, Niels B.; Silahtaroglu, Asli; Kristensen, Anders; Tommerup, Niels; Tegenfeldt, Jonas O.; Flyvbjerg, Henrik (2010-07-27). "Single-molecule denaturation mapping of DNA in nanofluidic channels". Proceedings of the National Academy of Sciences. 107 (30): 13294–13299. Bibcode:2010PNAS..10713294R. doi:10.1073/pnas.1007081107. ISSN 0027-8424. PMC 2922186. PMID 20616076.
  4. Nilsson, Adam N.; Emilsson, Gustav; Nyberg, Lena K.; Noble, Charleston; Stadler, Liselott Svensson; Fritzsche, Joachim; Moore, Edward R. B.; Tegenfeldt, Jonas O.; Ambjörnsson, Tobias (2014-09-02). "Competitive binding-based optical DNA mapping for fast identification of bacteria - multi-ligand transfer matrix theory and experimental applications on Escherichia coli". Nucleic Acids Research. 42 (15): e118. doi:10.1093/nar/gku556. ISSN 0305-1048. PMC 4150756. PMID 25013180.
  5. Grunwald, Assaf; Dahan, Moran; Giesbertz, Anna; Nilsson, Adam; Nyberg, Lena K.; Weinhold, Elmar; Ambjörnsson, Tobias; Westerlund, Fredrik; Ebenstein, Yuval (2015-10-15). "Bacteriophage strain typing by rapid single molecule analysis". Nucleic Acids Research. 43 (18): e117. doi:10.1093/nar/gkv563. ISSN 0305-1048. PMC 4605287. PMID 26019180.
  6. Vranken, Charlotte; Deen, Jochem; Dirix, Lieve; Stakenborg, Tim; Dehaen, Wim; Leen, Volker; Hofkens, Johan; Neely, Robert K. (2014-04-01). "Super-resolution optical DNA Mapping via DNA methyltransferase-directed click chemistry". Nucleic Acids Research. 42 (7): e50. doi:10.1093/nar/gkt1406. ISSN 0305-1048. PMC 3985630. PMID 24452797.
  7. Dimalanta, E.T. et al. A microfluidic system for large DNA molecule arrays. Anal. Chem. 76 (2004): 5293–5301.
  8. Jo, K., et al. "A Single-Molecule Barcoding System using Nanoslits for DNA Analysis." Proceedings of the National Academy of Sciences of the United States of America 104.8 (2007): 2673–8.
  9. Valouev, A., Schwartz, D., Zhou, S., and Waterman, M.S. "An algorithm for assembly of ordered restriction maps from single DNA molecules." Proceedings of the National Academy of Sciences of the United States of America 103 (2006): 15770–15775.
  10. Lai, Z., et al. "A Shotgun Optical Map of the Entire Plasmodium Falciparum Genome." Nature genetics 23.3 (1999): 309–13.
  11. Lim, A., et al. "Shotgun Optical Maps of the Whole Escherichia Coli O157:H7 Genome." Genome research 11.9 (2001): 1584–93.
  12. Lin, J., et al. "Whole-Genome Shotgun Optical Mapping of Deinococcus Radiodurans." Science 285.5433 (1999): 1558–62.
  13. Nagarajan, N., et al. "Scaffolding and validation of bacterial genome assemblies using optical restriction maps." Bioinformatics 24.10 (2008): 1229–35.
  14. Church, D.M. et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biology, 7.5 (2009):e1000112.
  15. Kidd, J.M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453 (2008): 56–64.
  16. Zhou, S. et al. Validation of rice genome sequence by Optical Mapping. BMC Genomics 8 (2007): 278.
  17. Zhou, S. et al. A single molecule scaffold for the maize genome. PLoS Genetics, 5.11 (2009): epub.
  18. Ramanathan, A., et al. "An Integrative Approach for the Optical Sequencing of Single DNA Molecules." Analytical Biochemistry 330.2 (2004): 227–41.
  19. Ramanathan, A., Paper, L., and Schwartz, D.C. "High-Density Polymerase-Mediated Incorporation of Fluorochrome-Labeled Nucleotides." Analytical Biochemistry 337.1 (2005): 1–11.
  20. Zhou, S., Paper, L., and Schwartz, D.C. "Optical Sequencing: Acquisition from Mapped Single-Molecule Templates." Next-Generation Genome Sequencing: Towards Personalized Medicine. Ed. Michal Janitz. 1st ed. Wiley-VCH, 2008. 133–151.