Library (biology)

Site saturation mutagenesis is a type of site-directed mutagenesis. This image shows the saturation mutagenesis of a single position in a theoretical 10-residue protein. The wild type version of the protein is shown at the top, with M representing the first amino acid methionine, and * representing the termination of translation. All 19 mutants of the isoleucine at position 5 are shown below.
How DNA libraries generated by random mutagenesis sample sequence space. The amino acid substituted into a given position is shown. Each dot or set of connected dots is one member of the library. Error-prone PCR randomly mutates some residues to other amino acids. Alanine scanning replaces each residue of the protein with alanine, one-by-one. Site saturation substitutes each of the 20 possible amino acids (or some subset of them) at a single position, one-by-one.

In molecular biology, a library is a collection of genetic material fragments that are stored and propagated in a population of microbes through the process of molecular cloning. There are different types of DNA libraries, including cDNA libraries (formed from reverse-transcribed RNA), genomic libraries (formed from genomic DNA) and randomized mutant libraries (formed by de novo gene synthesis where alternative nucleotides or codons are incorporated). DNA library technology is a mainstay of current molecular biology, genetic engineering, and protein engineering, and the applications of these libraries depend on the source of the original DNA fragments. The cloning vectors and techniques used in library preparation differ, but in general each DNA fragment is inserted into its own vector molecule, and the pool of recombinant DNA molecules is then transferred into a population of bacteria (as in a bacterial artificial chromosome, or BAC, library) or yeast such that each organism contains on average one construct (vector + insert). As the population of organisms is grown in culture, the DNA molecules contained within them are copied and propagated (thus, "cloned").


Terminology

The term "library" can refer to a population of organisms, each of which carries a DNA molecule inserted into a cloning vector, or alternatively to the collection of all of the cloned vector molecules.

cDNA libraries

A cDNA library represents a sample of the mRNA purified from a particular source (either a collection of cells, a particular tissue, or an entire organism), which has been converted back to a DNA template by the use of the enzyme reverse transcriptase. It thus represents the genes that were being actively transcribed in that particular source under the physiological, developmental, or environmental conditions that existed when the mRNA was purified. cDNA libraries can be generated using techniques that promote "full-length" clones or under conditions that generate shorter fragments used for the identification of "expressed sequence tags".

cDNA libraries are useful in reverse genetics, but they represent only a very small portion (less than 1%) of the overall genome of a given organism.

Applications of cDNA libraries include:

Genomic libraries

A genomic library is a set of clones that together represents the entire genome of a given organism. The number of clones that constitute a genomic library depends on (1) the size of the genome in question and (2) the insert size tolerated by the particular cloning vector system. For most practical purposes, the tissue source of the genomic DNA is unimportant because each cell of the body contains virtually identical DNA (with some exceptions).
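The dependence on genome size and insert size can be made concrete with the Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f), where P is the desired probability that any given sequence is present in the library and f is the ratio of insert size to genome size. The Python sketch below assumes equal-sized inserts sampled uniformly at random from the genome, which real libraries only approximate; the genome and insert sizes in the example are illustrative and are not taken from this article.

```python
import math

def clones_needed(genome_bp: float, insert_bp: float, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of how many random clones are needed so that any
    given sequence is represented in the library with the stated probability.

    Assumes inserts of identical size drawn uniformly at random from the genome.
    """
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert size must be positive and smaller than the genome")
    f = insert_bp / genome_bp  # fraction of the genome carried by one clone
    return math.ceil(math.log(1 - probability) / math.log(1 - f))

# Illustrative figures only: a ~4.6 Mb bacterial genome cloned as ~40 kb inserts.
print(clones_needed(4.6e6, 40e3))  # 528
```

Larger inserts or smaller genomes reduce the number of clones required, which is why large-insert vectors are preferred for big genomes.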

Applications of genomic libraries include:

Synthetic mutant libraries

Depiction of one common way to clone a site-directed mutagenesis library (i.e., using degenerate oligos). The gene of interest is amplified by PCR using oligos that contain a region perfectly complementary to the template (blue) and a region that differs from the template by one or more nucleotides (red). Many such primers, each degenerate in the non-complementary region, are pooled into the same PCR, resulting in many different PCR products with different mutations in that region (individual mutants shown in different colors below).

In contrast to the library types described above, a variety of artificial methods exist for making libraries of variant genes. [1] Variation throughout the gene can be introduced randomly by error-prone PCR, [2] by DNA shuffling, which recombines parts of similar genes, [3] or by transposon-based methods that introduce indels. [4] Alternatively, mutations can be targeted to specific codons during de novo gene synthesis or by saturation mutagenesis, constructing one or more point mutants of a gene in a controlled way. [5] Either approach yields a mixture of double-stranded DNA molecules that represent variants of the original gene.
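Saturation mutagenesis is often implemented by placing a degenerate codon at the target position of the mutagenic primer. NNK (N = any base, K = G or T) is used below as one common example choice; it is not a scheme prescribed by this article. The sketch expands a degenerate codon into concrete codons, translates them with the standard genetic code, and builds every variant at position 5 of a hypothetical 10-residue protein, mirroring the site saturation example. NNK gives 32 codons covering all 20 amino acids plus one stop codon (TAG), with some amino acids encoded by more codons than others.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered by first, second, third base in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Subset of IUPAC nucleotide codes used in this sketch (K = G/T, S = C/G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "S": "CG"}

def expand_degenerate(codon: str) -> list[str]:
    """List every concrete codon encoded by a degenerate codon such as 'NNK'."""
    return ["".join(bases) for bases in product(*(IUPAC[ch] for ch in codon))]

def saturate(protein: str, position: int, degenerate_codon: str = "NNK") -> dict[str, int]:
    """Map each variant protein (1-based position mutated) to the number of codons encoding it.

    A '*' in a variant marks a stop codon, which would truncate the real protein.
    """
    counts: dict[str, int] = {}
    for codon in expand_degenerate(degenerate_codon):
        variant = protein[: position - 1] + CODON_TABLE[codon] + protein[position:]
        counts[variant] = counts.get(variant, 0) + 1
    return counts

wild_type = "MKLVIPQDSE"  # hypothetical 10-residue protein with isoleucine at position 5
library = saturate(wild_type, 5)
print(len(expand_degenerate("NNK")))  # 32
print(len(library))  # 21 (wild type, 19 substitutions, and one stop variant)
```

The uneven codon counts (for example, three NNK codons for leucine but one for isoleucine) mean that some variants are sampled more often than others when clones are picked at random.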

The expressed proteins from these libraries can then be screened for variants which exhibit favorable properties (e.g. stability, binding affinity or enzyme activity). This can be repeated in cycles of creating gene variants and screening the expression products in a directed evolution process. [1]
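The cycle of creating variants and screening them can be sketched as a toy simulation. Everything here is hypothetical: the "screen" simply counts matches to an arbitrary target sequence, and mutation is modeled as random residue replacement at a fixed per-residue rate, loosely analogous to error-prone PCR. Real screens measure properties such as stability or activity and have no known target to compare against.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def fitness(seq: str, target: str) -> int:
    """Hypothetical screen: the score is the number of positions matching a target."""
    return sum(a == b for a, b in zip(seq, target))

def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each residue with a random amino acid (possibly the same one) with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)

def directed_evolution(parent: str, target: str, rounds: int = 20,
                       library_size: int = 200, rate: float = 0.05,
                       seed: int = 0) -> tuple[str, list[int]]:
    """Alternate library creation and screening; return the final parent and its score per round."""
    rng = random.Random(seed)
    history = [fitness(parent, target)]
    for _ in range(rounds):
        # Diversify: build a library of random variants of the current parent.
        library = [mutate(parent, rate, rng) for _ in range(library_size)]
        # Screen: carry the best variant into the next round. Including the
        # parent in the comparison means the score can never decrease.
        parent = max(library + [parent], key=lambda s: fitness(s, target))
        history.append(fitness(parent, target))
    return parent, history

# Hypothetical starting and target sequences (10 residues each).
best, history = directed_evolution("MKLVIPQDSE", "MWHCYFRNGT")
print(best, history)
```

Keeping the parent in each round is a design choice of this sketch; real campaigns may carry several variants forward or recombine them.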

Overview of cDNA library preparation techniques

DNA extraction

If creating an mRNA library (i.e. with cDNA clones), there are several possible protocols for isolating full-length mRNA. To extract DNA for genomic DNA (also known as gDNA) libraries, a DNA mini-prep may be useful.

Insert preparation

cDNA libraries require care to ensure that full-length copies of the mRNA are captured as cDNA (which will later be inserted into vectors). Several protocols have been designed to optimise the synthesis of the 1st cDNA strand and the 2nd cDNA strand for this reason, and also to make directional cloning into the vector more likely.
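The relationship between the mRNA and the two cDNA strands can be illustrated at the sequence level. This sketch assumes idealized, full-length synthesis and ignores primers, adapters, and secondary structure. It shows that the first strand is the reverse complement of the mRNA (so, when primed with oligo(dT) at the poly(A) tail, it begins with a run of T) and that the second strand reproduces the mRNA sequence with T in place of U. The example mRNA is hypothetical.

```python
# Complement of each RNA base, written as the DNA base reverse transcriptase would incorporate.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna: str) -> str:
    """First-strand cDNA (5'->3') copied from an mRNA given 5'->3'.

    The new strand is complementary and antiparallel, so the template is read in reverse.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))

def second_strand_cdna(first_strand: str) -> str:
    """Second-strand cDNA (5'->3'), the reverse complement of the first strand."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(first_strand.upper()))

mrna = "AUGGCCAAAAAA"  # hypothetical short mRNA ending in a poly(A) tail
first = first_strand_cdna(mrna)
print(first)  # TTTTTTGGCCAT
print(second_strand_cdna(first))  # ATGGCCAAAAAA
```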

gDNA fragments are generated from the extracted gDNA by using frequent-cutter restriction enzymes, which recognize short sequences that occur many times throughout the genome.
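Why frequent cutters produce many fragments can be shown with a small digest. The sketch assumes the four-base recognition site GATC cut immediately before the site (as Sau3AI and MboI do), looks only at the top strand, and ignores sticky-end overhangs. In random sequence with equal base frequencies, an n-base site occurs on average once every 4^n bases, so a four-base cutter cuts roughly every 256 bp versus every 4,096 bp for a six-base cutter. A complete digest like this one gives non-overlapping fragments; libraries are often built from partial digests instead so that clones overlap.

```python
import re

def digest(seq: str, site: str = "GATC", cut_offset: int = 0) -> list[str]:
    """Cut a top-strand sequence at every occurrence of a recognition site.

    `cut_offset` is the cut position within the site; 0 means cutting just before it.
    Overhangs and the bottom strand are ignored in this simplified model.
    """
    # A lookahead match finds every site start, including overlapping ones.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def expected_fragment_length(site_length: int) -> int:
    """Average spacing of a site in random sequence with equal base frequencies."""
    return 4 ** site_length

print(digest("AAGATCTTGATCA"))  # ['AA', 'GATCTT', 'GATCA']
print(expected_fragment_length(4), expected_fragment_length(6))  # 256 4096
```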

Vectors

The nucleotide sequences of interest are preserved as inserts in a plasmid or in the genome of a bacteriophage that has been used to infect bacterial cells.

Vectors are propagated most commonly in bacterial cells, but if using a YAC (Yeast Artificial Chromosome) then yeast cells may be used. Vectors could also be propagated in viruses, but this can be time-consuming and tedious. However, the high transfection efficiency achieved by using viruses (often phages) makes them useful for packaging the vector (with the ligated insert) and then introducing them into the bacterial (or yeast) cell.

Additionally, for cDNA libraries, a system using the Lambda Zap II phage, ExAssist helper phage, and two E. coli strains has been developed. A Cre-Lox system, using loxP sites and in vivo expression of the Cre recombinase, can also be used instead. These are examples of in vivo excision systems. In vitro excision, by contrast, involves subcloning, often using traditional restriction enzymes and cloning strategies, and can be more time-consuming and require more hands-on work than in vivo excision. In either case, the systems allow the movement of the vector from the phage into a live cell, where the vector can replicate and propagate until the library is to be used.
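The Cre-Lox alternative can be modeled at the level of sequence strings. In this sketch, a plasmid segment flanked by two loxP sites in the same orientation is excised by Cre as a circle carrying one loxP copy, while the other copy stays in the remaining phage DNA. The loxP sequence is the standard 34-bp site; the flanking sequences in the example are placeholders, and the model ignores the strand-exchange mechanism as well as inverted sites (which Cre inverts rather than excises).

```python
from typing import Optional, Tuple

# Standard 34-bp loxP site: 13-bp inverted repeat, 8-bp spacer, 13-bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"

def cre_excise(seq: str) -> Tuple[str, Optional[str]]:
    """Model Cre recombination between the first two directly repeated loxP sites.

    Returns (retained, excised). The excised circle is written as a linear string;
    `excised` is None when fewer than two loxP sites are present.
    """
    first = seq.find(LOXP)
    if first == -1:
        return seq, None
    second = seq.find(LOXP, first + len(LOXP))
    if second == -1:
        return seq, None
    retained = seq[:first] + LOXP + seq[second + len(LOXP):]
    excised_circle = seq[first + len(LOXP):second] + LOXP
    return retained, excised_circle

# Placeholder layout: vector arm, loxP, plasmid segment, loxP, vector arm.
phage = "AAAA" + LOXP + "GGGCCC" + LOXP + "TTTT"
retained, plasmid = cre_excise(phage)
print(retained == "AAAA" + LOXP + "TTTT", plasmid == "GGGCCC" + LOXP)  # True True
```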

Using libraries

Workflow for screening a synthetic library to identify cells producing a chemical of interest.

Using a library involves "screening" it for the clones that carry the sequences of interest. There are multiple possible methods to achieve this.
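One simple way to pick hits from a screen is to compare each clone's assay signal with negative controls. The sketch below assumes a single quantitative readout per clone and flags clones whose signal exceeds the control mean by a chosen number of standard deviations (three by default). The threshold, clone IDs, and values are hypothetical, and many other hit-calling strategies exist.

```python
from statistics import mean, stdev

def call_hits(signals: dict[str, float], negative_controls: list[float],
              z_cutoff: float = 3.0) -> list[str]:
    """Return IDs of clones whose signal exceeds the control mean by z_cutoff SDs, strongest first."""
    if len(negative_controls) < 2:
        raise ValueError("need at least two negative controls to estimate their spread")
    threshold = mean(negative_controls) + z_cutoff * stdev(negative_controls)
    hits = [clone for clone, value in signals.items() if value > threshold]
    return sorted(hits, key=lambda clone: signals[clone], reverse=True)

# Hypothetical plate readings: clone ID -> signal, plus four negative-control wells.
readings = {"A1": 5.0, "A2": 1.1, "B7": 2.5}
controls = [1.0, 1.2, 0.8, 1.0]
print(call_hits(readings, controls))  # ['A1', 'B7']
```

Hits chosen this way would normally be re-tested and sequenced to confirm which insert is responsible.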


References

  1. Wajapeyee N, Liu AY, Forloni M (1 March 2018). "Random Mutagenesis Using Error-Prone DNA Polymerases". Cold Spring Harbor Protocols. 2018 (3): pdb.prot097741. doi:10.1101/pdb.prot097741. ISSN 1940-3402. PMID 29496818.
  2. McCullum EO, Williams BAR, Zhang J, Chaput JC (2010). "Random Mutagenesis by Error-Prone PCR". In Braman J (ed.). In Vitro Mutagenesis Protocols (3rd ed.). Methods in Molecular Biology. Vol. 634. Humana Press. pp. 103–109. doi:10.1007/978-1-60761-652-8_7. ISBN 9781607616528. PMID 20676978.
  3. Crameri A, Raillard SA, Bermudez E, Stemmer WP (January 1998). "DNA shuffling of a family of genes from diverse species accelerates directed evolution". Nature. 391 (6664): 288–91. Bibcode:1998Natur.391..288C. doi:10.1038/34663. PMID 9440693. S2CID 4352696.
  4. Jones DD (May 2005). "Triplet nucleotide removal at random positions in a target gene: the tolerance of TEM-1 beta-lactamase to an amino acid deletion". Nucleic Acids Research. 33 (9): e80. doi:10.1093/nar/gni077. PMC 1129029. PMID 15897323.
  5. Wang TW, Zhu H, Ma XY, Zhang T, Ma YS, Wei DZ (1 September 2006). "Mutant library construction in directed molecular evolution". Molecular Biotechnology. 34 (1): 55–68. doi:10.1385/MB:34:1:55. ISSN 1559-0305. PMID 16943572. S2CID 44393645.