What is Proteomics?

Marc R. Wilkins is an Australian scientist who is credited with the concept of the proteome.1 Wilkins coined the term
proteome in 1994 while studying for his PhD.2 The term generalises the concept of the genome: it encompasses the
set of all proteins that can be produced from the genome, including variants that arise through alternative splicing and
post-transcriptional modification of messenger RNA.3,4 In the first book on proteomics, published in 1997, Wilkins
defined the field as follows: "Proteomics is the study of the entire protein complement of an organism."4

Proteomics has since evolved into a field focused on the analysis of complex protein mixtures by 2D electrophoresis
and/or mass spectrometry (MS). It is now dominated by the use of LC-MS/MS for the analysis of proteins, and the term
proteomics is used colloquially to describe any protein analysis approach involving MS.

Publications on Proteomics

Since the first book on the subject appeared in 1997, the field of proteomics has seen an exponential rise in research
output: from a single book in 1997 to over 5000 unique publications in 2012 (Figure 1).

Figure 1: Number of proteomics publications from 1997 to 2012.


Cellular Signalling Cascades

Nearly every biological process is regulated by cellular signalling pathways, all of which rely on dynamic
post-translational modifications (Figure 2). These modifications are essential for proteins to transmit their biological
information. For example, signalling downstream of the tumour necrosis factor alpha (TNF-α) receptor relies on both
phosphorylation and ubiquitination of component proteins to carry signals from the cell surface to the nucleus. Such
modifications are therefore critical to biological regulation.

Figure 2: Cellular signalling cascades.5

Basic Principles of the LC-MS/MS Analysis of Peptides

In bottom-up proteomics, the protein sample, be it a single protein or an entire proteome, is first digested into
peptides (Figure 3). The proteolytic peptides are much more amenable to analysis by liquid chromatography and tandem
mass spectrometry than their intact protein counterparts. The peptides are then analysed by LC-MS/MS, and the
resulting data are interrogated by a database search against the appropriate protein sequences.

In Silico Peptide Prediction

All database search approaches are based on in silico prediction of a peptide's characteristics, followed by comparison
of these with the characteristics measured in the mass spectrometer. For example, if we take the human protein
sequence database and theoretically digest it with an enzyme of high specificity, the result is a list of peptide
sequences (Figure 4). From these sequences we can obtain the elemental composition and hence the theoretical mass.

Figure 4: Peptide sequence.
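
The theoretical digestion and mass calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the method of any particular search engine: the cleavage rule (after K or R, but not before P) is the commonly quoted trypsin rule and is an assumption here, as are the function names and the short example sequence.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one water is added per free peptide


def tryptic_digest(protein):
    """Cleave C-terminal to K or R unless the next residue is P (assumed rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    # Hypothetical sequence, used only to show the output shape.
    for pep in tryptic_digest("GASKPLRMEK"):
        print(pep, round(peptide_mass(pep), 4))
```

Applied to a whole sequence database, the same two functions would produce the list of peptide sequences and theoretical masses that the search compares against.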

Collision-induced dissociation (CID) within the MS system results in preferential cleavage of the amide bond in the
peptide, giving tandem mass spectra rich in y-type ions (when charge is retained on the C-terminal fragment) or
b-type ions (when charge is retained on the N-terminal fragment; Figure 5).
Figure 5: Potential fragments of a peptide sequence produced in tandem MS.

Each of these theoretical fragment ions has an individual elemental composition and mass. Therefore, before we
have run any kind of mass spectrometry experiment we can predict highly specific potential masses for both the intact
peptide and its potential fragments.
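
As a rough sketch of this prediction step, the singly charged b and y ion series can be computed directly from the residue masses. The function name and the example peptide are hypothetical, and only unmodified, singly protonated fragments are considered.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return m/z values of the singly charged b and y ions of a peptide."""
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    # b_i: first i residues plus a proton (charge on the N-terminal fragment).
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y_i: last i residues plus water and a proton (charge on the C-terminal fragment).
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return {"b": b, "y": y}


if __name__ == "__main__":
    ions = fragment_ions("GAK")  # hypothetical three-residue peptide
    print([round(m, 4) for m in ions["b"]])
    print([round(m, 4) for m in ions["y"]])
```

A useful sanity check is that each complementary pair, b_i and y_(n-i), sums to the neutral peptide mass plus two protons.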

In Silico Peptide Analysis

A schematic representation of the in silico process of going from a protein sequence database to a final matrix of
every peptide component is shown (Figure 6). The matrix contains theoretical mass information for the intact peptide
and its potential fragments.
Figure 6: Process of moving from a protein sequence database to a mass matrix of every peptide component.

Tandem MS for Protein Identification

Following the in silico analysis of the peptides' characteristics, the LC-MS/MS experiment revolves around collecting
MS and MS/MS data throughout an LC separation. The MS scan records the m/z of the intact peptide ions (mass can be
inferred once the charge z is determined). The population of ions representing any given peptide can then be isolated
and induced to fragment (in CID, either by collision with an inert gas or by resonant excitation), and the fragment
ions are mass analysed. These tandem mass spectrometry data can be deconvoluted to generate an empirical matrix of
masses for both the intact peptide and its detectable fragments (Figure 7). The database search is effectively a
statistical tool for assessing the quality of the match between a theoretical peptide matrix and an empirical one.
Figure 7: MS spectra of the intact peptide (left). Tandem MS/MS spectra of peptide fragment ions (middle). MS
matrix obtained (right).
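
The two ideas in this section, inferring mass from m/z and charge and comparing a theoretical matrix with an empirical one, can be illustrated as follows. This is a deliberately simple sketch: real search engines use statistical scoring models, whereas the hypothetical `count_matches` below only counts theoretical fragments that have an observed peak within a tolerance (the 0.02 default is an arbitrary assumption).

```python
PROTON = 1.007276


def neutral_mass(mz, z):
    """Neutral mass of an ion observed at m/z with z protons attached."""
    return z * (mz - PROTON)


def count_matches(theoretical, observed, tol=0.02):
    """Count theoretical fragment m/z values with an observed peak within tol."""
    return sum(any(abs(t - o) <= tol for o in observed) for t in theoretical)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(round(neutral_mass(500.0, 2), 4))
    print(count_matches([100.0, 200.0, 300.0], [100.01, 250.0, 299.99]))
```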

Multiple Reaction Monitoring Initiated Detection and Sequencing

Comprehensive Post-translational Modification
Ubiquitin (Ub)

Ubiquitin (Ub) is a small regulatory protein that is found in almost all tissues of eukaryotic organisms. Ubiquitination
(also known as ubiquitylation) is an enzymatic post-translational modification (PTM) process in which a ubiquitin
protein is attached to a substrate protein. Small Ubiquitin-like Modifier (SUMO) proteins are a family of small proteins
that are covalently attached to and detached from other proteins in cells to modify their function. SUMOylation is a
PTM involved in various cellular processes, such as nuclear-cytosolic transport, transcriptional regulation, apoptosis,
protein stability, response to stress, and progression through the cell cycle.16

Attachment to lysine is through a glycine-glycine (GG) C-terminus on both Ub and SUMO.8,9


Both modifications are important and occur via an enzyme cascade (Figure 25).

Figure 25: Ubiquitination enzyme cascade.

Tryptic ubiquitin isopeptides

After activation, ubiquitin attaches to an acceptor lysine, forming an isopeptide bond (Figure 26). Tryptic digestion
leaves a GG tag on the modified lysine, but no diagnostic ions are formed, so identification must rely on the mass
addition of 114.0429 Da. Our method exploits the fact that isopeptides have an additional N-terminus compared with
linear peptides.
Figure 26: Activated ubiquitin attaching to an acceptor lysine to form an isopeptide bond.

The peptide can be derivatised at its N-termini using reductive methylation (dimethyl labelling, DML). It is well
reported that this generates a strong a1 ion upon CID (Figure 27).10 It has also been postulated that isopeptides will
give an extra a1 ion at m/z 62 and a b2 ion at m/z 147 from the labelled tag. Deuterated reagents can be used to
generate a much more selective m/z 147 ion (cf. the m/z 143 ion obtained with non-deuterated reagents).
Figure 27: N-termini peptide derivatisation using reductive methylation and the resulting CID spectra showing the
fragment ions.
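
The quoted diagnostic m/z values can be checked with a short calculation. The sketch below assumes that the light label adds C2H4 to the amine and that the deuterated label adds C2D4 (a 4 Da shift, consistent with the m/z 143 versus 147 pair above). The exact reagents are not stated in the text, so this composition is an assumption, as are the function names.

```python
GLY = 57.02146      # monoisotopic residue mass of glycine (Da)
LEU = 113.08406     # monoisotopic residue mass of leucine (Da)
PROTON = 1.007276
CO = 27.994915      # carbon monoxide, lost on going from a b ion to an a ion

# Assumed net additions for dimethylation of one amine.
LIGHT = 2 * 12.0 + 4 * 1.007825   # C2H4
HEAVY = 2 * 12.0 + 4 * 2.014102   # C2D4 (assumed deuterated label)


def a1_mz(residue_mass, label):
    """m/z of the singly charged a1 ion of a dimethylated N-terminal residue."""
    return residue_mass + label + PROTON - CO


def b2_mz(res1, res2, label):
    """m/z of the singly charged b2 ion of a dimethylated N-terminal dipeptide."""
    return res1 + res2 + label + PROTON


if __name__ == "__main__":
    print("GG tag a1, heavy:", round(a1_mz(GLY, HEAVY), 2))
    print("GG tag b2, heavy:", round(b2_mz(GLY, GLY, HEAVY), 2))
    print("GG tag b2, light:", round(b2_mz(GLY, GLY, LIGHT), 2))
    print("Leu a1, heavy:", round(a1_mz(LEU, HEAVY), 2))
```

Under these assumptions the heavy-labelled GG tag gives about m/z 62.09 (a1) and 147.11 (b2), and the light label gives a b2 ion near m/z 143.08. The same formula applied to an N-terminal leucine gives about m/z 118.15 for a backbone a1 ion.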

Ubiquitinated ubiquitin on lysine 48 (Figure 28) shows a strong ion at m/z 62 and an additional ion at m/z 147.
Together with the backbone a1 ion at m/z 118, these confirm that it is an isopeptide.
Figure 28: MS/MS spectrum of Ub isopeptide (DML).

Figure 29 shows the low mass region of the MS/MS spectrum of a peptide carrying an isotag. A large m/z 62 ion and a
smaller but highly selective m/z 147 ion are produced. Taken together with the backbone a1 ion and the increased b ion
coverage, these make identification of the modified peptide easier.
Figure 29: MS/MS spectrum of the low mass region with a labeled isotag.

Figure 30 shows the same peptide without the GG tag, i.e. not ubiquitinated. The backbone a1 ion is still generated as
expected (DML has worked), but no isotag a1 or b2 ions are present, confirming that this is not an isopeptide.
Figure 30: MS/MS spectrum of low mass region without isotag.

The digest used in Figure 31 was DML labelled and analysed by IDA on a QSTAR. The isotag a1 and b2 ions can be
clearly seen, along with extensive backbone b ions, resulting in facile identification of the peptide.
Figure 31: MS/MS spectrum of 1D gel digest from Ub pulldown material.

Consecutive Residue Addition to Lysine: CRA(K)


Unlike Ub, SUMO generates a large tag upon tryptic digestion. However, we have found that trypsin (or something in
the trypsin preparation) cleaves C-terminal to glutamine (Q), leaving tags that are useful for MS. This was found by
searching the data with variable modifications on K of G, GG, TGG, QTGG, QQTG, GQQQTGG (Figure 32).
The use of DML may improve detection.
Figure 32: Consecutive residue addition to lysine CRA(K).
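
For a database search, each candidate tag is expressed as a mass addition on lysine. Because the tag is joined through an isopeptide bond with loss of water, the net addition is simply the sum of its residue masses. The sketch below computes this for the four shortest tags only; the longer tags are left out because their sequences are not unambiguous in the text, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues that occur in the tags.
MONO = {"G": 57.02146, "T": 101.04768, "Q": 128.05858}


def tag_delta(tag):
    """Mass added to lysine by an isopeptide-linked tag (sum of residue masses)."""
    return sum(MONO[aa] for aa in tag)


if __name__ == "__main__":
    for tag in ["G", "GG", "TGG", "QTGG"]:
        print(tag, round(tag_delta(tag), 4))
```

For GG this reproduces the 114.0429 Da addition quoted earlier for ubiquitin isopeptides.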

Tags are indeed found, and they generate diagnostic ions such as b3 for the TGG tag (Figure 33) and b2 and b4 for the
QTGG tag (Figure 34).
Figure 33: SUMO TGG peptide from trypsin digestion.
Figure 34: SUMO QTGG peptide from trypsin digestion.

The product ion spectrum of the DML isopeptide carrying the QTGG isotag contains the predicted a1, b2 and b4
diagnostic ions in addition to a multitude of backbone sequence ions (Figure 35, left). Taken together, these ion
series facilitate the confident assignment of this spectrum to the backbone sequence, the QTGG isotag, and the site
of modification as K4.

The product ion spectrum of the DML isopeptide carrying the TGG isotag contains the predicted a1, b2 and b3
diagnostic ions in addition to a multitude of backbone sequence ions (Figure 35, right). Taken together, these ion
series facilitate the confident assignment of this spectrum to the backbone sequence, the TGG isotag, and the site of
modification as K4.
Figure 35: Tandem MS/MS spectrum showing effect of DML on CRA(K) peptides.12

Data Independent Acquisition (DIA)

A DML E. coli tryptic digest was mixed with DML synthetic isopeptides and run using the Data Independent
Acquisition (DIA) approach termed SWATH (Figure 36). For complex mixture analysis, DIA scanning routines overcome
the duty cycle limitations associated with classical Data Dependent Acquisition (DDA). DDA requires that the
precursor of interest is selected for fragmentation, whereas in a DIA workflow all ions are fragmented in an unbiased
manner. One DIA method segments the duty cycle into a series of user-defined SWATH m/z windows.

The 5600 acquires an MS scan followed by 80 SWATH MS/MS acquisitions, each 5 Da wide, over the m/z range 400-800
Th. Every putative precursor is therefore selected for MS/MS in a data independent manner, enabling a far more
comprehensive analysis in which precursors of interest can be identified by extraction of diagnostic ions.
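
The window scheme described above can be written down explicitly. This is a minimal sketch with hypothetical function names. It assumes contiguous, non-overlapping windows; real acquisition methods may add a small overlap between neighbouring windows, which is not modelled here.

```python
def swath_windows(start=400.0, stop=800.0, width=5.0):
    """Return (low, high) m/z bounds for contiguous windows covering start..stop."""
    n = round((stop - start) / width)
    return [(start + i * width, start + (i + 1) * width) for i in range(n)]


def window_index(mz, windows):
    """Index of the window containing mz, or None if mz is outside the range."""
    for i, (low, high) in enumerate(windows):
        if low <= mz < high:
            return i
    return None


if __name__ == "__main__":
    windows = swath_windows()
    print(len(windows), windows[0], windows[-1])
    print(window_index(507.3, windows))
```

With the default arguments this yields the 80 windows of 5 Da mentioned in the text.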

A mixture of E. coli digest and Ub-isopeptides is dimethyl labelled and analysed by DIA on a 5600 (Figure 36). Post-
acquisition ion extraction of the a1 (m/z 62.09) and b2 (m/z 147.11) ions is then utilised to screen for the modified
peptides.
Figure 36: Mixture of E. coli digest and Ub-isopeptides is dimethyl labelled and analysed by DIA on a 5600.

Figure 37 shows that in one SWATH window all precursors between m/z 504.0 and 510.0 are fragmented. An XIC overlay
of the two diagnostic ions shows co-elution. In practice this would have to be repeated for all 80 (5 Da) windows, so
a software script is required to facilitate this process.
Figure 37: XIC overlay of two diagnostic ions.
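
A screening script of the kind mentioned above might look like the following. This is only a sketch under stated assumptions. The extracted ion chromatograms (XICs) of the two diagnostic ions are assumed to have already been exported per SWATH window as plain lists on a shared retention-time axis, the data layout and function names are hypothetical, and the intensity threshold is arbitrary. No vendor file format or API is implied.

```python
def coeluting_times(times, xic_a1, xic_b2, min_intensity=100.0):
    """Retention times at which both diagnostic ion traces exceed the threshold."""
    return [
        t for t, a, b in zip(times, xic_a1, xic_b2)
        if a >= min_intensity and b >= min_intensity
    ]


def screen_windows(window_data, min_intensity=100.0):
    """Apply the co-elution check to every window; keep only windows with hits."""
    hits = {}
    for name, (times, xic_a1, xic_b2) in window_data.items():
        found = coeluting_times(times, xic_a1, xic_b2, min_intensity)
        if found:
            hits[name] = found
    return hits


if __name__ == "__main__":
    # Hypothetical traces for two windows (time in minutes, arbitrary intensity).
    data = {
        "504-510": ([27.0, 27.5, 28.0], [10.0, 500.0, 20.0], [5.0, 300.0, 8.0]),
        "510-515": ([27.0, 27.5, 28.0], [0.0, 0.0, 0.0], [0.0, 400.0, 0.0]),
    }
    print(screen_windows(data))
```

Requiring both traces to be present at the same time point is the simplest possible co-elution test; a more careful script would compare peak apexes or peak shapes.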

At the elution time of the diagnostic ions, the MS/MS spectrum can be extracted (Figure 38). This may contain
fragments from other precursor ions, but the main sequence ions clearly enable easy identification. If this is not the
case, the MS scan for this time and for this SWATH window can be added and all the precursors compiled to give an
inclusion list for subsequent DDA runs.
Figure 38: MS/MS spectrum at 27.47 minutes.

What is the Future of PTM Analysis?

