Peptide spectral library

Last updated

A peptide spectral library is a curated, annotated and non-redundant collection/database of LC-MS/MS peptide spectra. One essential utility of a peptide spectral library is to serve as consensus templates supporting the identification of peptides and proteins based on the correlation between the templates with experimental spectra.[ citation needed ]

Contents

One potential application of peptide spectral libraries is the identification of new, currently unknown mass spectra. Here, the spectra from the library are compared to the new spectra and if a match is found, the unknown spectra can be assigned the identity of the known peptide in the library.

Spectral libraries have been used in the small molecules mass spectra identification since the 1980s. [1] In the early years of shotgun proteomics, pioneer investigations suggested that a similar approach might be applicable in shotgun proteomics for peptide/protein identification. [2]

Shotgun proteomics

Modern tandem mass spectrometry (MS) instruments combine features of fast duty cycle, exquisite sensitivity, and unprecedented mass accuracy. Tandem mass spectrometry, which is an ideal match for the large-scale protein identification and quantification in complex biological systems. In a shotgun proteomics approach, proteins in a complex mixture are digested by proteolytic enzymes such as trypsin. Subsequently, one or more chromatographic separations are applied to resolve resulting peptides, which are then ionized and analyzed in a mass spectrometer. To acquire tandem mass spectra, a particular peptide precursor is isolated, and fragmented in a mass spectrometer; the mass spectra corresponding to the fragments of peptide precursor is recorded. Tandem mass spectra contains specific information regarding the sequence of the peptide precursor, which can aid the identification of the peptide/protein.

Protein identification via sequence database searching

Sequence database searching is widely used currently for mass spectra based protein identification. In this approach, a protein sequence database is used to calculate all putative peptide candidates in the given setting (proteolytic enzymes, miscleavages, post-translational modifications). The sequence search engines use various heuristics to predict the fragmentation pattern of each peptide candidate. Such derivative patterns are used as templates to find a sufficiently close match within experimental mass spectra, which serves as the basis for peptide/protein identification. Many tools have been developed for this practice, which have enabled many past discoveries, e.g. SEQUEST, [3] Mascot. [4]

Shortcomings of the sequence database searching workflow

Due to the complex nature of peptide fragmentation in a mass spectrometer, derivative fragmentation patterns fall short of reproducing experimental mass spectra, especially relative intensities among distinct fragments.[ citation needed ] Thus, sequence database searching faces a bottleneck of limited specificity. Sequence database searching also demands vast search space, which still could not cover all possibilities of peptide dynamics, exhibiting limited efficiency post-translational modifications). The search process is sometimes slow and requires costly high-performance computers. In addition, the nature of sequence database searching disconnects the research discoveries among different groups or at different times.

Advantages and limitations

First, a greatly reduced search space will decrease the searching time. Second, by taking full advantage of all spectral features including relative fragment intensities, neutral losses from fragments and various additional specific fragments, the process of spectra searching will be more specific, and it will generally provide better discrimination between true and false matches.[ citation needed ]

Spectral library searching is not applicable in a situation where the discovery of novel peptides or proteins is the goal. However, more and more high-quality mass spectra are being acquired by the collective contribution of the scientific community, which will continuously expand the coverage of peptide spectral libraries.

Research community-focused libraries

For a peptide spectral library, to reach a maximal coverage is a long-term goal, even with the support of scientific community and ever-growing proteomic technologies.[ citation needed ] However, the optimization for a particular module of the peptide spectra library is a more manageable goal, e.g. the proteins in a particular organelle or relevant to a particular biological phenotype. For example, a researcher studying the mitochondrial proteome will likely focus on analyses within protein modules within the mitochondria. The research community focused peptide spectral library supports targeted research in a comprehensive fashion for a particular research community.[ citation needed ]

Related Research Articles

<span class="mw-page-title-main">Mass spectrometry</span> Analytical technique based on determining mass to charge ratio of ions

Mass spectrometry (MS) is an analytical technique that is used to measure the mass-to-charge ratio of ions. The results are presented as a mass spectrum, a plot of intensity as a function of the mass-to-charge ratio. Mass spectrometry is used in many different fields and is applied to pure samples as well as complex mixtures.

<span class="mw-page-title-main">Tandem mass spectrometry</span> Type of mass spectrometry

Tandem mass spectrometry, also known as MS/MS or MS2, is a technique in instrumental analysis where two or more stages of analysis using one or more mass analyzer are performed with an additional reaction step in between these analyses to increase their abilities to analyse chemical samples. A common use of tandem MS is the analysis of biomolecules, such as proteins and peptides.

<span class="mw-page-title-main">Protein sequencing</span> Sequencing of amino acid arrangement in a protein

Protein sequencing is the practical process of determining the amino acid sequence of all or part of a protein or peptide. This may serve to identify the protein or characterize its post-translational modifications. Typically, partial sequencing of a protein provides sufficient information to identify it with reference to databases of protein sequences derived from the conceptual translation of genes.

<span class="mw-page-title-main">Peptide mass fingerprinting</span> Analytical technique for protein identification

Peptide mass fingerprinting (PMF), also known as protein fingerprinting, is an analytical technique for protein identification in which the unknown protein of interest is first cleaved into smaller peptides, whose absolute masses can be accurately measured with a mass spectrometer such as MALDI-TOF or ESI-TOF. The method was developed in 1993 by several groups independently. The peptide masses are compared to either a database containing known protein sequences or even the genome. This is achieved by using computer programs that translate the known genome of the organism into proteins, then theoretically cut the proteins into peptides, and calculate the absolute masses of the peptides from each protein. They then compare the masses of the peptides of the unknown protein to the theoretical peptide masses of each protein encoded in the genome. The results are statistically analyzed to find the best match.

Sequest is a tandem mass spectrometry data analysis program used for protein identification. Sequest identifies collections of tandem mass spectra to peptide sequences that have been generated from databases of protein sequences.

The Trans-Proteomic Pipeline (TPP) is an open-source data analysis software for proteomics developed at the Institute for Systems Biology (ISB) by the Ruedi Aebersold group under the Seattle Proteome Center. The TPP includes PeptideProphet, ProteinProphet, ASAPRatio, XPRESS and Libra.

A peptide sequence tag is a piece of information about a peptide obtained by tandem mass spectrometry that can be used to identify this peptide in a protein database.

Mascot is a software search engine that uses mass spectrometry data to identify proteins from peptide sequence databases. Mascot is widely used by research facilities around the world. Mascot uses a probabilistic scoring algorithm for protein identification that was adapted from the MOWSE algorithm. Mascot is freely available to use on the website of Matrix Science. A license is required for in-house use where more features can be incorporated.

<span class="mw-page-title-main">Electron-transfer dissociation</span>

Electron-transfer dissociation (ETD) is a method of fragmenting multiply-charged gaseous macromolecules in a mass spectrometer between the stages of tandem mass spectrometry (MS/MS). Similar to electron-capture dissociation, ETD induces fragmentation of large, multiply-charged cations by transferring electrons to them. ETD is used extensively with polymers and biological molecules such as proteins and peptides for sequence analysis. Transferring an electron causes peptide backbone cleavage into c- and z-ions while leaving labile post translational modifications (PTM) intact. The technique only works well for higher charge state peptide or polymer ions (z>2). However, relative to collision-induced dissociation (CID), ETD is advantageous for the fragmentation of longer peptides or even entire proteins. This makes the technique important for top-down proteomics. The method was developed by Hunt and coworkers at the University of Virginia.

<span class="mw-page-title-main">Protein mass spectrometry</span> Application of mass spectrometry

Protein mass spectrometry refers to the application of mass spectrometry to the study of proteins. Mass spectrometry is an important method for the accurate mass determination and characterization of proteins, and a variety of methods and instrumentations have been developed for its many uses. Its applications include the identification of proteins and their post-translational modifications, the elucidation of protein complexes, their subunits and functional interactions, as well as the global measurement of proteins in proteomics. It can also be used to localize proteins to the various organelles, and determine the interactions between different proteins as well as with membrane lipids.

Shotgun proteomics refers to the use of bottom-up proteomics techniques in identifying proteins in complex mixtures using a combination of high performance liquid chromatography combined with mass spectrometry. The name is derived from shotgun sequencing of DNA which is itself named after the rapidly expanding, quasi-random firing pattern of a shotgun. The most common method of shotgun proteomics starts with the proteins in the mixture being digested and the resulting peptides are separated by liquid chromatography. Tandem mass spectrometry is then used to identify the peptides.

<span class="mw-page-title-main">Top-down proteomics</span>

Top-down proteomics is a method of protein identification that either uses an ion trapping mass spectrometer to store an isolated protein ion for mass measurement and tandem mass spectrometry (MS/MS) analysis or other protein purification methods such as two-dimensional gel electrophoresis in conjunction with MS/MS. Top-down proteomics is capable of identifying and quantitating unique proteoforms through the analysis of intact proteins. The name is derived from the similar approach to DNA sequencing. During mass spectrometry intact proteins are typically ionized by electrospray ionization and trapped in a Fourier transform ion cyclotron resonance, quadrupole ion trap or Orbitrap mass spectrometer. Fragmentation for tandem mass spectrometry is accomplished by electron-capture dissociation or electron-transfer dissociation. Effective fractionation is critical for sample handling before mass-spectrometry-based proteomics. Proteome analysis routinely involves digesting intact proteins followed by inferred protein identification using mass spectrometry (MS). Top-down MS (non-gel) proteomics interrogates protein structure through measurement of an intact mass followed by direct ion dissociation in the gas phase.

<span class="mw-page-title-main">Bottom-up proteomics</span>

Bottom-up proteomics is a common method to identify proteins and characterize their amino acid sequences and post-translational modifications by proteolytic digestion of proteins prior to analysis by mass spectrometry. The major alternative workflow used in proteomics is called top-down proteomics where intact proteins are purified prior to digestion and/or fragmentation either within the mass spectrometer or by 2D electrophoresis. Essentially, bottom-up proteomics is a relatively simple and reliable means of determining the protein make-up of a given sample of cells, tissues, etc.

<span class="mw-page-title-main">Quantitative proteomics</span> Analytical chemistry technique

Quantitative proteomics is an analytical chemistry technique for determining the amount of proteins in a sample. The methods for protein identification are identical to those used in general proteomics, but include quantification as an additional dimension. Rather than just providing lists of proteins identified in a certain sample, quantitative proteomics yields information about the physiological differences between two biological samples. For example, this approach can be used to compare samples from healthy and diseased patients. Quantitative proteomics is mainly performed by two-dimensional gel electrophoresis (2-DE), preparative native PAGE, or mass spectrometry (MS). However, a recent developed method of quantitative dot blot (QDB) analysis is able to measure both the absolute and relative quantity of an individual proteins in the sample in high throughput format, thus open a new direction for proteomic research. In contrast to 2-DE, which requires MS for the downstream protein identification, MS technology can identify and quantify the changes.

<span class="mw-page-title-main">Selected reaction monitoring</span> Tandem mass spectrometry method

Selected reaction monitoring (SRM), also called multiple reaction monitoring (MRM), is a method used in tandem mass spectrometry in which an ion of a particular mass is selected in the first stage of a tandem mass spectrometer and an ion product of a fragmentation reaction of the precursor ions is selected in the second mass spectrometer stage for detection.

<span class="mw-page-title-main">Lys-N</span>

Lys-N is a metalloendopeptidase found in the mushroom Grifola frondosa that cleaves proteins on the amino side of lysine residues.

In bio-informatics, a peptide-mass fingerprint or peptide-mass map is a mass spectrum of a mixture of peptides that comes from a digested protein being analyzed. The mass spectrum serves as a fingerprint in the sense that it is a pattern that can serve to identify the protein. The method for forming a peptide-mass fingerprint, developed in 1993, consists of isolating a protein, breaking it down into individual peptides, and determining the masses of the peptides through some form of mass spectrometry. Once formed, a peptide-mass fingerprint can be used to search in databases for related protein or even genomic sequences, making it a powerful tool for annotation of protein-coding genes.

In mass spectrometry, de novo peptide sequencing is the method in which a peptide amino acid sequence is determined from tandem mass spectrometry.

In mass spectrometry, data-independent acquisition (DIA) is a method of molecular structure determination in which all ions within a selected m/z range are fragmented and analyzed in a second stage of tandem mass spectrometry. Tandem mass spectra are acquired either by fragmenting all ions that enter the mass spectrometer at a given time or by sequentially isolating and fragmenting ranges of m/z. DIA is an alternative to data-dependent acquisition (DDA) where a fixed number of precursor ions are selected and analyzed by tandem mass spectrometry.

References

  1. Domokos, L., Hennberg, D., and Weimann, B. 1984. Computer-aided identification of compounds by comparison of mass spectra. Anal. Chim. Acta 165:61-74.
  2. Yates, J.R., 3rd, Morgan, S.F., Gatlin, C.L., Griffin, P.R., and Eng, J.K. 1998. Method to compare collision-induced dissociation spectra of peptides: Potential for library searching and subtractvie analysis. Anal. Chem., 70:3557-3565.
  3. Eng, J.K. et al. (1994) An approach to correlate tandem mass-spectral data of peptides with amino-acid-sequences in a protein database. J. Am. Soc. Mass Spectrom., 5,976-989.
  4. Perkins, D.N. et al. (1999) Probability-based protein identification by searching sequence database using mass spectrometry data. Electrophoresis, 20, 3551-3567.