Overlapping gene

An overlapping gene (or OLG) [1] [2] is a gene whose expressible nucleotide sequence partially overlaps with the expressible nucleotide sequence of another gene. [3] In this way, a nucleotide sequence may contribute to the function of one or more gene products. Overlapping genes are a fundamental feature of both cellular and viral genomes. [2] The definition of an overlapping gene differs significantly between eukaryotes, prokaryotes, and viruses. [2] In prokaryotes and viruses, the overlap must be between coding sequences rather than mRNA transcripts, and exists when the coding sequences share a nucleotide on either the same or opposite strands. In eukaryotes, gene overlap is almost always defined as mRNA transcript overlap. Specifically, a gene overlap in eukaryotes exists when at least one nucleotide is shared between the boundaries of the primary mRNA transcripts of two or more genes, such that a DNA base mutation at any point of the overlapping region would affect the transcripts of all genes involved. This definition includes 5′ and 3′ untranslated regions (UTRs) along with introns.

Overprinting refers to a type of overlap in which all or part of the sequence of one gene is read in an alternate reading frame from another gene at the same locus. [4] The alternative open reading frames (ORFs) are thought to be created by critical nucleotide substitutions within an expressible pre-existing gene, which can then express a novel protein while still preserving the function of the original gene. [5] Overprinting has been hypothesized as a mechanism for de novo emergence of new genes from existing sequences, either older genes or previously non-coding regions of the genome. [6] Most overlapping genes are believed to have evolved in part through this mechanism, suggesting that each overlap is composed of one ancestral gene and one novel gene. [7] Consequently, overprinting is also believed to be a source of novel proteins, as de novo proteins coded by these novel genes usually lack remote homologs in databases. [8] Overprinted genes are particularly common features of the genomic organization of viruses, likely because overprinting greatly increases the number of potential expressible genes from a small set of viral genetic information. [9] It is likely that overprinting is responsible for the generation of numerous novel proteins by viruses over the course of their evolutionary history.
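
The idea of reading one sequence in more than one frame can be made concrete with a minimal Python sketch. It assumes the standard genetic code and a made-up nine-nucleotide sequence; the names `CODON_TABLE` and `translate` are illustrative and do not come from any cited study.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq starting at offset `frame` (0, 1 or 2).

    A trailing partial codon is ignored; '*' marks a stop codon.
    """
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Hypothetical toy sequence (not a real gene).
toy = "ATGGCATGA"
for frame in range(3):
    print(frame, translate(toy, frame))
# frame 0 reads ATG GCA TGA -> "MA*"
# frame 1 reads TGG CAT     -> "WH"
# frame 2 reads GGC ATG     -> "GM"
```

The same nine nucleotides yield three unrelated peptides, which is why a substitution that creates a start codon in an alternate frame can, in principle, expose a new coding sequence without changing the position of the original gene.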

Classification

Tandem out-of-phase overlap of the human mitochondrial genes ATP8 (+1 frame, in red) and ATP6 (+3 frame, in blue)

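Genes may overlap in a variety of ways and can be classified by their positions relative to each other: unidirectional (tandem) overlaps involve two genes on the same strand, whereas convergent and divergent overlaps involve genes on opposite strands. [3] [11] [12] [13] [14]

Overlapping genes can also be classified by phase, which describes their relative reading frames: in-phase (phase 0) overlaps share the same reading frame, whereas out-of-phase overlaps are offset by one or two nucleotides (phase 1 or phase 2). [3] [11] [12] [13] [14]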
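
A rough sketch of how such a classification might be computed from gene coordinates is given below. It rests on several assumptions that are not taken from the cited sources: coordinates are 0-based and half-open, same-strand overlaps are called unidirectional, opposite-strand overlaps are called convergent when the 3′ ends overlap and divergent when the 5′ ends overlap, and the phase is the distance between the two 5′ ends modulo 3. The `Gene` type, the "nested" label, and the example coordinates are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    name: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of shared nucleotide positions (0 if the genes do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def classify_overlap(a: Gene, b: Gene) -> Optional[Tuple[str, Optional[int], int]]:
    """Return (orientation, phase, overlap_length), or None without overlap.

    Phase is only reported for same-strand pairs (assumed convention).
    """
    n = overlap_length(a, b)
    if n == 0:
        return None
    if a.strand == b.strand:
        # The 5' end is `start` on the + strand and `end` on the - strand.
        offset = abs(a.start - b.start) if a.strand == "+" else abs(a.end - b.end)
        return ("unidirectional", offset % 3, n)
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    if plus.start < minus.start and plus.end < minus.end:
        kind = "convergent"   # -> <- : the 3' ends overlap
    elif minus.start < plus.start and minus.end < plus.end:
        kind = "divergent"    # <- -> : the 5' ends overlap
    else:
        kind = "nested"       # one gene lies entirely within the other
    return (kind, None, n)


# Hypothetical coordinates, for illustration only.
a = Gene("a", 0, 100, "+")
print(classify_overlap(a, Gene("b", 91, 200, "+")))  # ('unidirectional', 1, 9)
print(classify_overlap(a, Gene("c", 95, 200, "-")))  # ('convergent', None, 5)
```

Phase numbering conventions differ between studies, so the modulo-3 offset used here should be checked against the convention of whichever data set is being analysed.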

Studies on overlapping genes suggest that their evolution can be summarized in two possible models. [4] In one model, the two proteins encoded by their respective overlapping genes evolve under similar selection pressures. The proteins and the overlap region are highly conserved when there is strong selection against amino acid change. Overlapping genes are reasoned to evolve under strict constraints because a single nucleotide substitution can alter the structure and function of the two proteins simultaneously. A study on the hepatitis B virus (HBV), whose DNA genome contains numerous overlapping genes, showed that the mean number of synonymous nucleotide substitutions per site in overlapping coding regions was significantly lower than that of non-overlapping regions. [15] The same study showed that some of these overlapping regions and their proteins can diverge significantly from the original when there is weak selection against amino acid change. The spacer domain of the polymerase and the pre-S1 region of a surface protein of HBV, for example, had a percentage of conserved amino acids of 30% and 40%, respectively. [15] However, these overlap regions are known to be less important for replication than the overlap regions that are highly conserved among different HBV strains, which are absolutely essential for the process.

The second model suggests that the two proteins and their respective overlapping genes evolve under opposite selection pressures: one frame experiences positive selection while the other is under purifying selection. In tombusviruses, the proteins p19 and p22 are encoded by overlapping genes that form a 549 nt coding region, and p19 has been shown to be under positive selection while p22 is under purifying selection. [16] Additional examples are mentioned in studies involving overlapping genes of the Sendai virus, [17] potato leafroll virus, [18] and human parvovirus B19. [19] This phenomenon of overlapping genes experiencing different selection pressures is suggested to be a consequence of a high rate of nucleotide substitution with different effects on the two frames; the substitutions may be predominantly non-synonymous for one frame while being mostly synonymous for the other. [4]
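
The different effects of one substitution on two frames can be illustrated with a small Python sketch. It assumes the standard genetic code and a made-up 12-nucleotide overlap read in frame 0 and frame 1; the sequence and all function names are hypothetical, and the tally is only meant to show the bookkeeping, not to reproduce any published analysis.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_at(seq, pos, frame):
    """Codon containing index `pos` when reading in `frame`, or None if incomplete."""
    start = pos - ((pos - frame) % 3)
    if start < 0 or start + 3 > len(seq):
        return None
    return seq[start:start + 3]


def effect(seq, pos, new_base, frame):
    """Classify a substitution as synonymous or nonsynonymous in one frame."""
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    before = CODON_TABLE[codon_at(seq, pos, frame)]
    after = CODON_TABLE[codon_at(mutated, pos, frame)]
    return "synonymous" if before == after else "nonsynonymous"


def tally(seq, frame_a=0, frame_b=1):
    """Count every single-nucleotide substitution by its effect in both frames."""
    counts = Counter()
    for pos, base in enumerate(seq):
        if codon_at(seq, pos, frame_a) is None or codon_at(seq, pos, frame_b) is None:
            continue  # position not covered by a complete codon in both frames
        for new_base in BASES:
            if new_base != base:
                counts[(effect(seq, pos, new_base, frame_a),
                        effect(seq, pos, new_base, frame_b))] += 1
    return counts


# Hypothetical overlap: frame 0 reads ATG GCA CTT GCA, frame 1 reads TGG CAC TTG.
toy = "ATGGCACTTGCA"
# A -> G at index 5 turns GCA into GCG (Ala to Ala) in frame 0,
# but CAC into CGC (His to Arg) in frame 1.
print(effect(toy, 5, "G", 0), effect(toy, 5, "G", 1))  # synonymous nonsynonymous
print(tally(toy))
```

Because the third codon position of one frame coincides with the first or second position of the other, most substitutions that are silent in one frame change the amino acid in the other, which is the constraint both models start from.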

Evolution

Overlapping genes are particularly common in rapidly evolving genomes, such as those of viruses, bacteria, and mitochondria. They may originate in three ways: [20]

  1. By extension of an existing open reading frame (ORF) downstream into a contiguous gene due to the loss of a stop codon;
  2. By extension of an existing ORF upstream into a contiguous gene due to loss of an initiation codon;
  3. By generation of a novel ORF within an existing one due to a point mutation.
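
The first route, loss of a stop codon, can be sketched in a few lines of Python. The sequence, the position of the hypothetical downstream gene, and the helper name `orf_end` are all invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_end(seq, start=0):
    """Index just past the first in-frame stop codon at or after `start`.

    Returns None if the frame contains no stop codon.
    """
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


# Hypothetical locus: an upstream ORF (ATG AAA TAA) followed by more sequence.
original = "ATGAAATAAGGGCCCTGA"
# A single T -> C substitution at index 6 turns the stop codon TAA into CAA.
mutant = original[:6] + "C" + original[7:]

# Assume, for illustration, that a neighbouring gene begins at index 10.
downstream_start = 10

for label, seq in (("original", original), ("mutant", mutant)):
    end = orf_end(seq)
    shared = max(0, end - downstream_start)
    print(label, "ORF ends at", end, "and shares", shared, "nt with the neighbour")
# original: ends at 9, shares 0 nt
# mutant:   ends at 18, shares 8 nt
```

Translation of the mutant continues until the next in-frame stop codon, so the ORF now runs into the region assumed to belong to the neighbouring gene.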

The use of the same nucleotide sequence to encode multiple genes may provide an evolutionary advantage through a reduction in genome size and through the opportunity for transcriptional and translational co-regulation of the overlapping genes. [12] [21] [22] [23] Gene overlaps introduce novel evolutionary constraints on the sequences of the overlap regions. [14] [24]

Origins of new genes

A cladogram indicating the likely evolutionary trajectory of the gene-dense pX region in human T-lymphotropic virus 1 (HTLV1), a deltaretrovirus associated with blood cancers. This region contains numerous overlapping genes, several of which likely originated de novo through overprinting.

In 1977, Pierre-Paul Grassé proposed that one of the genes in an overlapping pair could have originated de novo through mutations that introduce novel ORFs in alternate reading frames; he described the mechanism as overprinting. [25] :231 It was later substantiated by Susumu Ohno, who identified a candidate gene that may have arisen by this mechanism. [26] Some de novo genes originating in this way may not remain overlapping, but subfunctionalize following gene duplication, [6] contributing to the prevalence of orphan genes. The younger member of an overlapping gene pair can be identified bioinformatically either by a more restricted phylogenetic distribution or by less optimized codon usage. [9] [27] [28] Younger members of the pair tend to have higher intrinsic structural disorder than older members, but the older members are also more disordered than other proteins, presumably as a way of alleviating the increased evolutionary constraints posed by overlap. [27] Overlaps are more likely to originate in proteins that already have high disorder. [27]

Taxonomic distribution

Overlapping genes in the bacteriophage ΦX174 genome. There are 11 genes in this genome (A, A*, B-H, J, K). Genes B, K, E overlap with genes A, C, D.

Overlapping genes occur in all domains of life, though with varying frequencies. They are especially common in viral genomes.

Viruses

The RNA silencing suppressor p19 from tomato bushy stunt virus, a protein encoded by an overprinted gene. The protein specifically binds siRNAs produced as part of the plant's RNA silencing defense against viruses.

Overlapping genes were first identified in the bacteriophage ΦX174, whose genome was the first DNA genome ever sequenced, by Frederick Sanger in 1977. [29] Previous analysis of ΦX174, a small single-stranded DNA bacteriophage that infects the bacterium Escherichia coli, suggested that the proteins produced during infection required coding sequences longer than the measured length of its genome. [31] Analysis of the fully sequenced 5386-nucleotide genome showed that the virus possesses extensive overlap between coding regions, revealing that some genes (such as genes D and E) are translated from the same DNA sequence but in different reading frames. [29] [31] An alternative start site within the genome replication gene A of ΦX174 was shown to express a truncated protein whose coding sequence is identical to the C-terminus of the original A protein but which has a different function. [32] [33] It was concluded that other undiscovered sites of polypeptide synthesis could be hidden throughout the genome due to overlapping genes. An identified de novo gene of another overlapping gene locus was shown to express a novel protein that induces lysis of E. coli by inhibiting biosynthesis of its cell wall, [56] suggesting that de novo protein creation through overprinting can be a significant factor in the evolution of viral pathogenicity. [4] Another example is the ORF3d gene in the SARS-CoV-2 virus. [1] [34]

Overlapping genes are particularly common in viral genomes. [9] Some studies attribute this observation to selective pressure toward small genome sizes mediated by the physical constraints of packaging the genome in a viral capsid, particularly one of icosahedral geometry. [35] However, other studies dispute this conclusion and argue that the distribution of overlaps in viral genomes is more likely to reflect overprinting as the evolutionary origin of overlapping viral genes. [36] Overprinting is a common source of de novo genes in viruses. [28]

The proportion of viruses with overlapping coding sequences within their genomes varies. [2] Fewer than a quarter of double-stranded RNA viruses contain them, while almost three-quarters of Retroviridae and viruses with single-stranded DNA genomes do. [37] Segmented viruses, whose genomes are split into separate pieces packaged either all in the same capsid or in separate capsids, are more likely to contain an overlapping sequence than non-segmented viruses. [37] RNA viruses have fewer overlapping genes than DNA viruses, which possess lower mutation rates and less restrictive genome sizes. [37] [38] The lower mutation rate of DNA viruses facilitates greater genomic novelty and evolutionary exploration within a structurally constrained genome and may be the primary driver of the evolution of overlapping genes. [39] [40]

Studies of overprinted viral genes suggest that their protein products tend to be accessory proteins that are not essential to viral proliferation but contribute to pathogenicity. Overprinted proteins often have unusual amino acid distributions and high levels of intrinsic disorder. [41] In some cases overprinted proteins do have well-defined, but novel, three-dimensional structures; [42] one example is the RNA silencing suppressor p19 found in tombusviruses, which has both a novel protein fold and a novel binding mode in recognizing siRNAs. [28] [30] [43]

Prokaryotes

Estimates of gene overlap in bacterial genomes typically find that around one third of bacterial genes are overlapped, though usually only by a few base pairs. [12] [44] [45] Most studies of overlap in bacterial genomes find evidence that overlap serves a function in gene regulation, permitting the overlapped genes to be transcriptionally and translationally co-regulated. [12] [23] In prokaryotic genomes, unidirectional overlaps are most common, possibly due to the tendency of adjacent prokaryotic genes to share orientation. [12] [14] [11] Among unidirectional overlaps, long overlaps are more commonly read with a one-nucleotide offset in reading frame (i.e., phase 1) and short overlaps are more commonly read in phase 2. [45] [46] Long overlaps of greater than 60 base pairs are more common for convergent genes; however, putative long overlaps have very high rates of misannotation. [47] Robustly validated examples of long overlaps in bacterial genomes are rare; in the well-studied model organism Escherichia coli, only four gene pairs are well validated as having long, overprinted overlaps. [48]
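
As a rough illustration of how such an estimate might be derived from an annotation, the following Python sketch counts the genes that share at least one base pair with another gene and records the length of each overlap. The coordinates are invented (0-based, half-open) and strand is ignored; the function name `overlap_summary` is hypothetical.

```python
def overlap_summary(genes):
    """Return (fraction of genes in any overlap, list of overlap lengths).

    `genes` is a list of (start, end) pairs with end exclusive.
    """
    ordered = sorted(genes)
    involved = set()
    lengths = []
    for i, (start_i, end_i) in enumerate(ordered):
        for j in range(i + 1, len(ordered)):
            start_j, end_j = ordered[j]
            if start_j >= end_i:
                break  # sorted by start, so no later gene can overlap gene i
            involved.update((i, j))
            lengths.append(min(end_i, end_j) - start_j)
    fraction = len(involved) / len(ordered) if ordered else 0.0
    return fraction, lengths


# Hypothetical annotation: two short overlaps among six genes.
toy_annotation = [(0, 300), (296, 900), (950, 1500),
                  (1499, 2100), (2200, 2800), (2900, 3500)]
fraction, lengths = overlap_summary(toy_annotation)
print(round(fraction, 2), lengths)  # 0.67 [4, 1]
```

On a real genome the result depends heavily on annotation quality, which matters here because putative long overlaps are frequently misannotated.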

Eukaryotes

Compared to prokaryotic genomes, eukaryotic genomes are often poorly annotated and thus identifying genuine overlaps is relatively challenging. [28] However, examples of validated gene overlaps have been documented in a variety of eukaryotic organisms, including mammals such as mice and humans. [49] [50] [51] [52] Eukaryotes differ from prokaryotes in distribution of overlap types: while unidirectional (i.e., same-strand) overlaps are most common in prokaryotes, opposite or antiparallel-strand overlaps are more common in eukaryotes. Among the opposite-strand overlaps, convergent orientation is most common. [50] Most studies of eukaryotic gene overlap have found that overlapping genes are extensively subject to genomic reorganization even in closely related species, and thus the presence of an overlap is not always well-conserved. [51] [53] Overlap with older or less taxonomically restricted genes is also a common feature of genes likely to have originated de novo in a given eukaryotic lineage. [51] [54] [55]

Function

The precise functions of overlapping genes seem to vary across the domains of life, but several experiments have shown that they are important for virus life cycles through proper protein expression and stoichiometry, [56] as well as playing a role in proper protein folding. [57] A version of bacteriophage ΦX174 has also been created in which all gene overlaps were removed, [58] showing that they are not necessary for replication.

The retention and evolution of overlapping genes within viruses may also be due to capsid size limitations. [59] Dramatic viability loss was observed in viruses with genomes engineered to be longer than the wild-type genome. [60] Increasing the single-stranded DNA genome length of ΦX174 by more than 1% results in almost complete loss of infectivity, believed to be the result of the strict physical constraints imposed by the finite capsid volume. [61] Studies on adeno-associated viruses as gene delivery vectors showed that viral packaging is constrained by genetic cargo size limits, requiring the use of multiple vectors to deliver large human genes such as CFTR. [62] [63] Therefore, it is suggested that overlapping genes evolved as a means to overcome these physical constraints, increasing genetic diversity by using only the existing sequence rather than increasing genome length.

Methods in identifying overlapping genes and ORFs

Standardized methods such as genome annotation may be inappropriate for the detection of overlapping genes, as they rely on already curated genes, while overlapping genes are generally overlooked and have atypical sequence composition. [2] [64] [65] [66] Genome annotation standards are also often biased against feature overlaps, such as genes entirely contained within another gene. [67] Furthermore, some bioinformatics pipelines, such as the RAST pipeline, markedly penalize overlaps between predicted ORFs. [68] However, rapid advancement of genome-scale protein and RNA measurement tools, along with increasingly advanced prediction algorithms, has revealed an avalanche of overlapping genes and ORFs within numerous genomes. [2] Proteogenomic methods have been essential in discovering numerous overlapping genes and include a combination of techniques such as bottom-up proteomics, ribosome profiling, DNA sequencing, and perturbation. RNA sequencing is also used to identify genomic regions containing overlapping transcripts. It has been used to identify 180,000 alternate ORFs within previously annotated coding regions found in humans. [69] Newly discovered ORFs such as these are verified using a variety of reverse genetics techniques, such as CRISPR-Cas9 and catalytically dead Cas9 (dCas9) disruption. [70] [71] [72] Attempts at proof-by-synthesis are also performed to show beyond doubt the absence of any undiscovered overlapping genes. [73]
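
The simplest computational starting point, scanning an annotated coding region for ORFs in its alternate reading frames, can be sketched as follows. This is a deliberately naive illustration rather than the method of any pipeline named above: it considers only the forward strand, only ATG start codons, and a made-up 18-nucleotide sequence, and the function names and the `min_codons` threshold are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """ATG-to-stop ORFs in the three forward frames.

    Returns (start, end, frame) tuples; `end` is exclusive and includes the stop.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


def alternate_frame_orfs(seq, cds_start, cds_end, min_codons=2):
    """ORFs lying inside an annotated CDS but read in a different frame."""
    cds_frame = cds_start % 3
    return [
        (start, end, frame)
        for start, end, frame in find_orfs(seq, min_codons)
        if frame != cds_frame and start >= cds_start and end <= cds_end
    ]


# Hypothetical locus: the annotated CDS spans the whole sequence in frame 0
# (ATG GAT GCC ATG AAA TAA); frame 1 contains ATG CCA TGA at indices 4 to 12.
toy = "ATGGATGCCATGAAATAA"
print(find_orfs(toy))                    # [(0, 18, 0), (4, 13, 1)]
print(alternate_frame_orfs(toy, 0, 18))  # [(4, 13, 1)]
```

A candidate found this way is only a hypothesis; as described above, evidence such as ribosome profiling, proteomics, or targeted disruption is needed before it can be treated as a gene.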

References

  1. 1 2 Nelson, Chase W; et al. (1 October 2020). "Dynamically evolving novel overlapping gene as a factor in the SARS-CoV-2 pandemic". eLife . 9. doi: 10.7554/eLife.59633 . PMC   7655111 . PMID   33001029.
  2. 1 2 3 4 5 6 Wright, Bradley W.; Molloy, Mark P.; Jaschke, Paul R. (5 October 2021). "Overlapping genes in natural and engineered genomes". Nature Reviews Genetics. 23 (3): 154–168. doi:10.1038/s41576-021-00417-w. ISSN   1471-0064. PMC   8490965 . PMID   34611352.
  3. 1 2 3 Y. Fukuda, M. Tomita et T. Washio (1999). "Comparative study of overlapping genes in the genomes of Mycoplasma genitalium and Mycoplasma pneumoniae". Nucleic Acids Res. 27 (8): 1847–1853. doi:10.1093/nar/27.8.1847. PMC   148392 . PMID   10101192.
  4. 1 2 3 4 Pavesi, Angelo (26 May 2021). "Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review". Genes. 12 (6): 809. doi: 10.3390/genes12060809 . ISSN   2073-4425. PMC   8227390 . PMID   34073395.
  5. Normark, Staffan; Bergström, Sven; Edlund, Thomas; Grundström, Thomas; Jaurin, Bengtake; Lindberg, Frederik P.; Olsson, Olof (December 1983). "Overlapping Genes". Annual Review of Genetics. 17 (1): 499–525. doi:10.1146/annurev.ge.17.120183.002435. ISSN   0066-4197. PMID   6198955.
  6. 1 2 Keese, PK; Gibbs, A (15 October 1992). "Origins of genes: "big bang" or continuous creation?". Proceedings of the National Academy of Sciences of the United States of America. 89 (20): 9489–93. Bibcode:1992PNAS...89.9489K. doi: 10.1073/pnas.89.20.9489 . PMC   50157 . PMID   1329098.
  7. Keese, P. K.; Gibbs, A. (15 October 1992). "Origins of genes: "big bang" or continuous creation?". Proceedings of the National Academy of Sciences. 89 (20): 9489–9493. Bibcode:1992PNAS...89.9489K. doi: 10.1073/pnas.89.20.9489 . ISSN   0027-8424. PMC   50157 . PMID   1329098.
  8. Gibbs, Adrian; Keese, Paul K. (19 October 1995), "In search of the origins of viral genes", Molecular Basis of Virus Evolution, Cambridge University Press, pp. 76–90, doi:10.1017/cbo9780511661686.008, ISBN   9780521455336 , retrieved 3 December 2021
  9. 1 2 3 4 Pavesi, Angelo; Magiorkinis, Gkikas; Karlin, David G.; Wilke, Claus O. (15 August 2013). "Viral Proteins Originated De Novo by Overprinting Can Be Identified by Codon Usage: Application to the "Gene Nursery" of Deltaretroviruses". PLOS Computational Biology. 9 (8): e1003162. Bibcode:2013PLSCB...9E3162P. doi: 10.1371/journal.pcbi.1003162 . PMC   3744397 . PMID   23966842.
  10. Anderson S, Bankier AT, Barrell BG, de Bruijn MH, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJ, Staden R, Young IG (April 1981). "Sequence and organization of the human mitochondrial genome". Nature. 290 (5806): 457–465. Bibcode:1981Natur.290..457A. doi:10.1038/290457a0. PMID   7219534. S2CID   4355527.
  11. Fukuda, Yoko; Nakayama, Yoichi; Tomita, Masaru (December 2003). "On dynamics of overlapping genes in bacterial genomes". Gene. 323: 181–187. doi:10.1016/j.gene.2003.09.021. PMID   14659892.
  12. Johnson Z, Chisholm S (2004). "Properties of overlapping genes are conserved across microbial genomes". Genome Res. 14 (11): 2268–72. doi:10.1101/gr.2433104. PMC   525685 . PMID   15520290.
  13. Normark S.; Bergstrom S.; Edlund T.; Grundstrom T.; Jaurin B.; Lindberg F.P.; Olsson O. (1983). "Overlapping genes". Annual Review of Genetics. 17: 499–525. doi:10.1146/annurev.ge.17.120183.002435. PMID   6198955.
  14. Rogozin, Igor B.; Spiridonov, Alexey N.; Sorokin, Alexander V.; Wolf, Yuri I.; Jordan, I. King; Tatusov, Roman L.; Koonin, Eugene V. (May 2002). "Purifying and directional selection in overlapping prokaryotic genes". Trends in Genetics. 18 (5): 228–232. doi:10.1016/S0168-9525(02)02649-5. PMID   12047938.
  15. Mizokami, Masashi; Orito, Etsuro; Ohba, Ken-ichi; Ikeo, Kazuho; Lau, Johnson Y. N.; Gojobori, Takashi (January 1997). "Constrained evolution with respect to gene overlap of hepatitis B virus". Journal of Molecular Evolution. 44 (S1): S83–S90. Bibcode:1997JMolE..44S..83M. doi:10.1007/pl00000061. ISSN   0022-2844. PMID   9071016. S2CID   22644652.
  16. Allison, Jane R.; Lechner, Marcus; Hoeppner, Marc P.; Poole, Anthony M. (12 February 2016). "Positive Selection or Free to Vary? Assessing the Functional Significance of Sequence Change Using Molecular Dynamics". PLOS ONE. 11 (2): e0147619. Bibcode:2016PLoSO..1147619A. doi: 10.1371/journal.pone.0147619 . ISSN   1932-6203. PMC   4752228 . PMID   26871901.
  17. Fujii, Yutaka; Kiyotani, Katsuhiro; Yoshida, Tetsuya; Sakaguchi, Takemasa (2001). "Conserved and non-conserved regions in the Sendai virus genome: Evolution of a gene possessing overlapping reading frames". Virus Genes. 22 (1): 47–52. doi:10.1023/a:1008130318633. ISSN   0920-8569. PMID   11210938. S2CID   12869504.
  18. Guyader, Sébastien; Ducray, Danièle Giblot (1 July 2002). "Sequence analysis of Potato leafroll virus isolates reveals genetic stability, major evolutionary events and differential selection pressure between overlapping reading frame products". Journal of General Virology. 83 (7): 1799–1807. doi: 10.1099/0022-1317-83-7-1799 . ISSN   0022-1317. PMID   12075102.
  19. Stamenković, Gorana G.; Ćirković, Valentina S.; Šiljić, Marina M.; Blagojević, Jelena V.; Knežević, Aleksandra M.; Joksić, Ivana D.; Stanojević, Maja P. (24 October 2016). "Substitution rate and natural selection in parvovirus B19". Scientific Reports. 6 (1): 35759. Bibcode:2016NatSR...635759S. doi:10.1038/srep35759. ISSN   2045-2322. PMC   5075947 . PMID   27775080.
  20. Krakauer, David C. (June 2000). "Stability and Evolution of Overlapping Genes". Evolution. 54 (3): 731–739. doi: 10.1111/j.0014-3820.2000.tb00075.x . PMID   10937248. S2CID   8818055.
  21. Delaye, Luis; DeLuna, Alexander; Lazcano, Antonio; Becerra, Arturo (2008). "The origin of a novel gene through overprinting in Escherichia coli". BMC Evolutionary Biology. 8 (1): 31. doi: 10.1186/1471-2148-8-31 . PMC   2268670 . PMID   18226237.
  22. Saha, Deeya; Podder, Soumita; Panda, Arup; Ghosh, Tapash Chandra (May 2016). "Overlapping genes: A significant genomic correlate of prokaryotic growth rates". Gene. 582 (2): 143–147. doi:10.1016/j.gene.2016.02.002. PMID   26853049.
  23. Luo, Yingqin; Battistuzzi, Fabia; Lin, Kui; Gibas, Cynthia (29 November 2013). "Evolutionary Dynamics of Overlapped Genes in Salmonella". PLOS ONE. 8 (11): e81016. Bibcode:2013PLoSO...881016L. doi: 10.1371/journal.pone.0081016 . PMC   3843671 . PMID   24312259.
  24. Wei, X.; Zhang, J. (31 December 2014). "A Simple Method for Estimating the Strength of Natural Selection on Overlapping Genes". Genome Biology and Evolution. 7 (1): 381–390. doi:10.1093/gbe/evu294. PMC   4316641 . PMID   25552532.
  25. Grassé, Pierre-Paul (1977). Evolution of Living Organisms: Evidence for a New Theory of Transformation. Academic Press. ISBN   9781483274096.
  26. Ohno, S (April 1984). "Birth of a unique enzyme from an alternative reading frame of the preexisted, internally repetitious coding sequence". Proceedings of the National Academy of Sciences of the United States of America. 81 (8): 2421–5. Bibcode:1984PNAS...81.2421O. doi: 10.1073/pnas.81.8.2421 . PMC   345072 . PMID   6585807.
  27. Willis, Sara; Masel, Joanna (19 July 2018). "Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes". Genetics. 210 (1): 303–313. doi:10.1534/genetics.118.301249. PMC   6116962 . PMID   30026186.
  28. Sabath, N.; Wagner, A.; Karlin, D. (19 July 2012). "Evolution of Viral Proteins Originated De Novo by Overprinting". Molecular Biology and Evolution. 29 (12): 3767–3780. doi:10.1093/molbev/mss179. PMC   3494269 . PMID   22821011.
  29. Sanger, F.; Air, G. M.; Barrell, B. G.; Brown, N. L.; Coulson, A. R.; Fiddes, J. C.; Hutchison, C. A.; Slocombe, P. M.; Smith, M. (1977). "Nucleotide sequence of bacteriophage ΦX174 DNA". Nature. 265 (5596): 687–95. Bibcode:1977Natur.265..687S. doi:10.1038/265687a0. PMID   870828. S2CID   4206886.
  30. Ye, Keqiong; Malinina, Lucy; Patel, Dinshaw J. (3 December 2003). "Recognition of small interfering RNA by a viral suppressor of RNA silencing". Nature. 426 (6968): 874–878. Bibcode:2003Natur.426..874Y. doi:10.1038/nature02213. PMC   4694583 . PMID   14661029.
  31. Barrell, B. G.; Air, G. M.; Hutchison, C. A. (November 1976). "Overlapping genes in bacteriophage φX174". Nature. 264 (5581): 34–41. Bibcode:1976Natur.264...34B. doi:10.1038/264034a0. ISSN   1476-4687. PMID   1004533. S2CID   4264796.
  32. Linney, Elwood; Hayashi, Masaki (May 1974). "Intragenic regulation of the synthesis of ΦX174 gene A proteins". Nature. 249 (5455): 345–348. Bibcode:1974Natur.249..345L. doi:10.1038/249345a0. ISSN   0028-0836. PMID   4601823. S2CID   4175651.
  33. Roznowski, Aaron P.; Doore, Sarah M.; Kemp, Sundance Z.; Fane, Bentley A. (6 January 2020). "Finally, a Role Befitting A star: Strongly Conserved, Unessential Microvirus A* Proteins Ensure the Product Fidelity of Packaging Reactions". Journal of Virology. 94 (2). doi:10.1128/jvi.01593-19. ISSN   0022-538X. PMC   6955274 . PMID   31666371.
  34. Dockrill, Peter (11 November 2020). "Scientists Just Found a Mysteriously Hidden 'Gene Within a Gene' in SARS-CoV-2". ScienceAlert . Retrieved 11 November 2020.
  35. Chirico, N.; Vianelli, A.; Belshaw, R. (7 July 2010). "Why genes overlap in viruses". Proceedings of the Royal Society B: Biological Sciences. 277 (1701): 3809–3817. doi:10.1098/rspb.2010.1052. PMC   2992710 . PMID   20610432.
  36. Brandes, Nadav; Linial, Michal (21 May 2016). "Gene overlapping and size constraints in the viral world". Biology Direct. 11 (1): 26. doi: 10.1186/s13062-016-0128-3 . PMC   4875738 . PMID   27209091.
  37. Schlub, Timothy E; Holmes, Edward C (1 January 2020). "Properties and abundance of overlapping genes in viruses". Virus Evolution. 6 (1): veaa009. doi:10.1093/ve/veaa009. ISSN   2057-1577. PMC   7017920 . PMID   32071766.
  38. Chirico, Nicola; Vianelli, Alberto; Belshaw, Robert (7 July 2010). "Why genes overlap in viruses". Proceedings of the Royal Society B: Biological Sciences. 277 (1701): 3809–3817. doi:10.1098/rspb.2010.1052. ISSN   0962-8452. PMC   2992710 . PMID   20610432.
  39. Brandes, Nadav; Linial, Michal (21 May 2016). "Gene overlapping and size constraints in the viral world". Biology Direct. 11 (1): 26. doi: 10.1186/s13062-016-0128-3 . ISSN   1745-6150. PMC   4875738 . PMID   27209091.
  40. Pavesi, Angelo (July 2020). "New insights into the evolutionary features of viral overlapping genes by discriminant analysis". Virology. 546: 51–66. doi:10.1016/j.virol.2020.03.007. ISSN   0042-6822. PMC   7157939 . PMID   32452417.
  41. Rancurel, C.; Khosravi, M.; Dunker, A. K.; Romero, P. R.; Karlin, D. (29 July 2009). "Overlapping Genes Produce Proteins with Unusual Sequence Properties and Offer Insight into De Novo Protein Creation". Journal of Virology. 83 (20): 10719–10736. doi:10.1128/JVI.00595-09. PMC   2753099 . PMID   19640978.
  42. Abroi, Aare (1 December 2015). "A protein domain-based view of the virosphere–host relationship". Biochimie. 119: 231–243. doi:10.1016/j.biochi.2015.08.008. PMID   26296474.
  43. Vargason, Jeffrey M; Szittya, György; Burgyán, József; Hall, Traci M. Tanaka (December 2003). "Size Selective Recognition of siRNA by an RNA Silencing Suppressor". Cell. 115 (7): 799–811. doi: 10.1016/S0092-8674(03)00984-X . PMID   14697199. S2CID   12993441.
  44. Huvet, Maxime; Stumpf, Michael PH (1 January 2014). "Overlapping genes: a window on gene evolvability". BMC Genomics. 15 (1): 721. doi: 10.1186/1471-2164-15-721 . ISSN   1471-2164. PMC   4161906 . PMID   25159814.
  45. Cock, Peter J. A.; Whitworth, David E. (19 March 2007). "Evolution of Gene Overlaps: Relative Reading Frame Bias in Prokaryotic Two-Component System Genes". Journal of Molecular Evolution. 64 (4): 457–462. Bibcode:2007JMolE..64..457C. doi:10.1007/s00239-006-0180-1. PMID   17479344. S2CID   21612308.
  46. Fonseca, M. M.; Harris, D. J.; Posada, D. (5 November 2013). "Origin and Length Distribution of Unidirectional Prokaryotic Overlapping Genes". G3: Genes, Genomes, Genetics. 4 (1): 19–27. doi:10.1534/g3.113.005652. PMC   3887535 . PMID   24192837.
  47. Pallejà, Albert; Harrington, Eoghan D; Bork, Peer (2008). "Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions?". BMC Genomics. 9 (1): 335. doi: 10.1186/1471-2164-9-335 . PMC   2478687 . PMID   18627618.
  48. Fellner, Lea; Simon, Svenja; Scherling, Christian; Witting, Michael; Schober, Steffen; Polte, Christine; Schmitt-Kopplin, Philippe; Keim, Daniel A.; Scherer, Siegfried; Neuhaus, Klaus (18 December 2015). "Evidence for the recent origin of a bacterial protein-coding, overlapping orphan gene by evolutionary overprinting". BMC Evolutionary Biology. 15 (1): 283. doi: 10.1186/s12862-015-0558-z . PMC   4683798 . PMID   26677845.
  49. McLysaght, Aoife; Guerzoni, Daniele (31 August 2015). "New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation". Philosophical Transactions of the Royal Society B: Biological Sciences. 370 (1678): 20140332. doi:10.1098/rstb.2014.0332. PMC   4571571 . PMID   26323763.
  50. Sanna, C.; Li, W.; Zhang, L. (2008). "Overlapping genes in the human and mouse genomes". BMC Genomics. 9 (169): 169. doi: 10.1186/1471-2164-9-169 . PMC   2335118 . PMID   18410680.
  51. Makałowska, Izabela; Lin, Chiao-Feng; Hernandez, Kristina (2007). "Birth and death of gene overlaps in vertebrates". BMC Evolutionary Biology. 7 (1): 193. doi: 10.1186/1471-2148-7-193 . PMC   2151771 . PMID   17939861.
  52. Veeramachaneni, V. (1 February 2004). "Mammalian Overlapping Genes: The Comparative Perspective". Genome Research. 14 (2): 280–286. doi:10.1101/gr.1590904. PMC   327103 . PMID   14762064.
  53. Behura, Susanta K; Severson, David W (2013). "Overlapping genes of Aedes aegypti: evolutionary implications from comparison with orthologs of Anopheles gambiae and other insects". BMC Evolutionary Biology. 13 (1): 124. doi: 10.1186/1471-2148-13-124 . PMC   3689595 . PMID   23777277.
  54. Murphy, Daniel N.; McLysaght, Aoife; Carmel, Liran (21 November 2012). "De Novo Origin of Protein-Coding Genes in Murine Rodents". PLOS ONE. 7 (11): e48650. Bibcode:2012PLoSO...748650M. doi: 10.1371/journal.pone.0048650 . PMC   3504067 . PMID   23185269.
  55. Knowles, D. G.; McLysaght, A. (2 September 2009). "Recent de novo origin of human protein-coding genes". Genome Research. 19 (10): 1752–1759. doi:10.1101/gr.095026.109. PMC   2765279 . PMID   19726446.
  56. Wright, Bradley W.; Ruan, Juanfang; Molloy, Mark P.; Jaschke, Paul R. (20 November 2020). "Genome Modularization Reveals Overlapped Gene Topology Is Necessary for Efficient Viral Reproduction". ACS Synthetic Biology. 9 (11): 3079–3090. doi:10.1021/acssynbio.0c00323. ISSN   2161-5063. PMID   33044064. S2CID   222300240.
  57. Pradhan, Prajakta; Li, Wen; Kaur, Parjit (January 2009). "Translational Coupling Controls Expression and Function of the DrrAB Drug Efflux Pump". Journal of Molecular Biology. 385 (3): 831–842. doi:10.1016/j.jmb.2008.11.027. PMID   19063901.
  58. Jaschke, Paul R.; Lieberman, Erica K.; Rodriguez, Jon; Sierra, Adrian; Endy, Drew (December 2012). "A fully decompressed synthetic bacteriophage øX174 genome assembled and archived in yeast". Virology. 434 (2): 278–284. doi: 10.1016/j.virol.2012.09.020 . ISSN   0042-6822. PMID   23079106.
  59. Krakauer, D. C.; Plotkin, J. B. (29 January 2002). "Redundancy, antiredundancy, and the robustness of genomes". Proceedings of the National Academy of Sciences. 99 (3): 1405–1409. Bibcode:2002PNAS...99.1405K. doi: 10.1073/pnas.032668599 . ISSN   0027-8424. PMC   122203 . PMID   11818563.
  60. Feiss, Michael; Fisher, R.A.; Crayton, M.A.; Egner, Carol (March 1977). "Packaging of the bacteriophage λ chromosome: Effect of chromosome length". Virology. 77 (1): 281–293. doi:10.1016/0042-6822(77)90425-1. ISSN   0042-6822. PMID   841861.
  61. Aoyama, A; Hayashi, M (September 1985). "Effects of genome size on bacteriophage phi X174 DNA packaging in vitro". Journal of Biological Chemistry. 260 (20): 11033–11038. doi: 10.1016/s0021-9258(17)39144-5 . ISSN   0021-9258. PMID   3161888. S2CID   32443408.
  62. Wu, Zhijian; Yang, Hongyan; Colosi, Peter (January 2010). "Effect of Genome Size on AAV Vector Packaging". Molecular Therapy. 18 (1): 80–86. doi:10.1038/mt.2009.255. ISSN   1525-0016. PMC   2839202 . PMID   19904234.
  63. Vaidyanathan, Sriram; Baik, Ron; Chen, Lu; Bravo, Dawn T.; Suarez, Carlos J.; Abazari, Shayda M.; Salahudeen, Ameen A.; Dudek, Amanda M.; Teran, Christopher A.; Davis, Timothy H.; Lee, Ciaran M. (March 2021). "Targeted replacement of full-length CFTR in human airway stem cells by CRISPR-Cas9 for pan-mutation correction in the endogenous locus". Molecular Therapy. 30 (1): 223–237. doi:10.1016/j.ymthe.2021.03.023. PMC   8753290 . PMID   33794364. S2CID   232761334.
  64. Willis, Sara; Masel, Joanna (19 July 2018). "Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes". Genetics. 210 (1): 303–313. doi:10.1534/genetics.118.301249. ISSN   1943-2631. PMC   6116962 . PMID   30026186.
  65. Pavesi, Angelo; Vianelli, Alberto; Chirico, Nicola; Bao, Yiming; Blinkova, Olga; Belshaw, Robert; Firth, Andrew; Karlin, David (19 October 2018). "Overlapping genes and the proteins they encode differ significantly in their sequence composition from non-overlapping genes". PLOS ONE. 13 (10): e0202513. Bibcode:2018PLoSO..1302513P. doi: 10.1371/journal.pone.0202513 . ISSN   1932-6203. PMC   6195259 . PMID   30339683.
  66. Pavesi, Angelo; Magiorkinis, Gkikas; Karlin, David G. (15 August 2013). "Viral Proteins Originated De Novo by Overprinting Can Be Identified by Codon Usage: Application to the "Gene Nursery" of Deltaretroviruses". PLOS Computational Biology. 9 (8): e1003162. Bibcode:2013PLSCB...9E3162P. doi: 10.1371/journal.pcbi.1003162 . ISSN   1553-7358. PMC   3744397 . PMID   23966842.
  67. "Supplemental Information 2: NCBI genome database accession information (PDF file)". doi: 10.7717/peerj.6447/supp-2 .
  68. Ahmed, Niyaz (27 March 2009). "Faculty Opinions recommendation of The RAST Server: rapid annotations using subsystems technology". doi: 10.3410/f.1157743.618965 .
  69. Ben-Tal, Nir, ed. (23 June 2017). "Decision letter: Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins". doi: 10.7554/elife.27860.082 .
  70. Bazzini, Ariel; Wu, Qiushuang (6 March 2020). "Faculty Opinions recommendation of Pervasive functional translation of noncanonical human open reading frames". doi: 10.3410/f.737484924.793572056 . S2CID   215850701.
  71. Prensner, John R.; Enache, Oana M.; Luria, Victor; Krug, Karsten; Clauser, Karl R.; Dempster, Joshua M.; Karger, Amir; Wang, Li; Stumbraite, Karolina; Wang, Vickie M.; Botta, Ginevra (28 January 2021). "Noncanonical open reading frames encode functional proteins essential for cancer cell survival". Nature Biotechnology. 39 (6): 697–704. doi:10.1038/s41587-020-00806-2. ISSN   1087-0156. PMC   8195866 . PMID   33510483.
  72. Cao, Xiongwen; Khitun, Alexandra; Luo, Yang; Na, Zhenkun; Phoodokmai, Thitima; Sappakhaw, Khomkrit; Olatunji, Elizabeth; Uttamapinant, Chayasith; Slavoff, Sarah A. (5 March 2020). "Alt-RPL36 downregulates the PI3K-AKT-mTOR signaling pathway by interacting with TMEM24". Nature Communications. 12 (1): 508. bioRxiv   10.1101/2020.03.04.977314 . doi:10.1038/s41467-020-20841-6. PMC   7820019 . PMID   33479206.
  73. Jaschke, Paul R.; Dotson, Gabrielle A.; Hung, Kay S.; Liu, Diane; Endy, Drew (12 November 2019). "Definitive demonstration by synthesis of genome annotation completeness". Proceedings of the National Academy of Sciences. 116 (48): 24206–24213. Bibcode:2019PNAS..11624206J. doi: 10.1073/pnas.1905990116 . ISSN   0027-8424. PMC   6883844 . PMID   31719208.