JXB Advance Access originally published online on March 21, 2006
Journal of Experimental Botany 2006 57(7):1509-1514; doi:10.1093/jxb/erj139

© The Author [2006]. Published by Oxford University Press [on behalf of the Society for Experimental Biology]. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org

RESEARCH PAPER

Reproducibility of LC-MS-based protein identification

Matthias Berg1,*, Axel Parbel1, Harald Pettersen2, David Fenyö2 and Lennart Björkesten2

1GE Healthcare, Oskar-Schlemmer-Strasse II, D-80807 München, Germany
2GE Healthcare, Björkgatan 30, Uppsala, Sweden

*To whom correspondence should be addressed. E-mail: matthias.berg{at}ge.com

Received 8 July 2005; Accepted 26 January 2006


    Abstract
Traditional analysis of liquid chromatography-mass spectrometry (LC-MS) data, typically performed by reviewing chromatograms and the corresponding mass spectra, is both time-consuming and difficult. Detailed data analysis is therefore often omitted in proteomics applications, and when multiple proteomics samples are analysed, usually only the final list of identified proteins is reviewed. This may lead to unnecessarily complex or even contradictory results, because the content of the list of identified proteins depends heavily on the conditions that trigger the collection of tandem mass spectra. Small changes in the signal intensity of a peptide in different LC-MS experiments can lead to the collection of a tandem mass spectrum in one experiment but not in another. In addition, the quality of the tandem mass spectra can vary, leading to successful identification in some cases but not in others. Using a novel image analysis approach that matches peptides across different LC-MS experiments by their retention time and parent mass over charge (m/z), repeat analyses can be made highly reproducible, and the final result is easy to confirm visually. This approach was investigated using tryptic digests of integral membrane proteins from organelle-enriched fractions of Arabidopsis thaliana, and it was shown to yield highly reproducible, consistent, and reliable interpretation of LC-MS data.

Key words: DeCyder™ MS, differential expression analysis, LC-MS, reproducibility, reversed phase chromatography, nano LC, tandem mass spectrometry


    Introduction
Proteomics has the potential to make a major contribution in the quest to cure human disease by comparing protein levels in healthy and diseased samples. This includes the analysis of samples representing different stages of disease or different biological conditions, to understand more clearly the role that proteins play and to identify potential biomarkers. Mass spectrometry-based proteomics can identify hundreds of proteins in a single experiment and has become an important analytical technology in modern biological and medical research (Aebersold and Mann, 2003).

Proteomics samples, which are often complex mixtures of proteins, are usually digested with an endoprotease such as trypsin before mass spectrometry analysis. In a classical liquid chromatography-mass spectrometry (LC-MS) experiment, the resulting peptides are then separated by reversed-phase micro- or nano-capillary chromatography. Peptides eluting from the LC column are usually ionized by electrospray and introduced into the mass spectrometer. Peptide masses and intensities are measured by the mass spectrometer, and peptides are selected for fragmentation on the basis of their signal intensity to obtain information on their sequence. The resulting tandem mass spectra are searched against sequence collections to identify the corresponding peptides and proteins (Fenyö, 2000).
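The intensity-based selection step can be illustrated with a toy model. The sketch below assumes a simple "top-N" rule with a minimum intensity threshold; the function name `select_precursors`, the peptide labels, and all intensities are hypothetical, and real instrument control software applies further rules (such as dynamic exclusion) that are not modelled here. It shows how a few percent of intensity variation between two replicate survey scans can change which peptide is fragmented.

```python
from typing import Dict, List


def select_precursors(survey: Dict[str, float], top_n: int = 3,
                      min_intensity: float = 1.0e4) -> List[str]:
    """Pick peptides for MS/MS from one survey scan (hypothetical top-N rule).

    `survey` maps a peptide label to its precursor intensity.
    """
    # Keep only precursors above the trigger threshold.
    eligible = [(pep, inten) for pep, inten in survey.items()
                if inten >= min_intensity]
    # Most intense first; ties broken by label so the result is deterministic.
    eligible.sort(key=lambda item: (-item[1], item[0]))
    return [pep for pep, _ in eligible[:top_n]]


# Two hypothetical replicate survey scans of the same sample.
run_a = {"pepA": 9.0e5, "pepB": 5.0e5, "pepC": 2.1e5, "pepD": 2.0e5}
run_b = {"pepA": 8.8e5, "pepB": 5.1e5, "pepC": 1.95e5, "pepD": 2.05e5}

print(select_precursors(run_a))  # ['pepA', 'pepB', 'pepC']
print(select_precursors(run_b))  # ['pepA', 'pepB', 'pepD']
```

pepC and pepD differ by only a few percent in intensity, yet in each run only one of them is chosen for fragmentation, which is the kind of run-to-run difference in data-dependent acquisition discussed in this paper.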

The traditional way to visualize LC-MS data for assessing data quality and confirming results is to use total ion or base ion chromatograms, together with single or averaged mass spectra of all peptides eluting at a given time and tandem mass spectra of single peptides. Such visualizations provide detailed insight into specific performance characteristics of an LC-MS experiment, such as the quality of fragment spectra, mass resolution, or chromatographic peak resolution. However, they are not very intuitive, and the correlation between m/z and retention time is not easily accessible. By contrast, a two-dimensional visualization of LC-MS data makes it easy to spot indicators of problems commonly encountered in LC-MS analyses, such as non-covalent adduct formation or sample contamination (e.g. with polyethylene glycol, PEG) (Schulz-Knappe et al., 2001; Heine et al., 2002; Palmblad et al., 2002; Skold et al., 2002; Svensson et al., 2003; Tammen et al., 2003; Wang et al., 2003; Anderle et al., 2004; Li et al., 2004; Radulovic et al., 2004; Wiener et al., 2004; Listgarten and Emili, 2005; Berg et al., 2006).

One of the major challenges in proteomics is detecting differences between samples belonging to different experimental groups (e.g. healthy/disease or control/treated). It is critical to minimize the variation between technical replicates, i.e. repeated analyses of the same sample (Venable and Yates, 2004), so that the focus moves onto biological variation and biologically relevant differences between the groups can be detected sensitively. Several factors are crucial, including sample quality, reproducibility of sample preparation, quality of the chromatography system, and performance of the mass spectrometer. Here, the reproducibility of the MS ion signal for technical replicates is investigated and compared with the reproducibility of protein identification.

Because the outcome of an LC-MS experiment depends on many different variables, it is difficult to optimize the system by systematically optimizing individual variables. This paper presents examples of how the 2D and 3D visualization approach of DeCyder™ MS Differential Analysis Software (DeCyder MS) (GE Healthcare), in which retention time, precursor mass, and the topology of the intensity profile are co-visualized, can be used in combination with the matching of tandem mass spectra to achieve very high reproducibility within technical replicates.

DeCyder™ MS is software for the differential analysis of data from LC-MS experiments. It provides novel 2D and 3D visualizations of LC-MS data that allow raw data quality assessment and interactive confirmation of results obtained with automated methods for peptide detection, charge state assignment, and peptide matching across multiple LC-MS experiments. Univariate statistical tools (Student's t-test and ANOVA) are available to identify peptides that vary significantly among different groups of samples, and variation patterns can be visualized in various graphs (A Kaplan, M Söderström, D Fenyö, H Pettersen, S Lindqvist, L Björkesten, unpublished data).
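As an illustration of the univariate statistics mentioned above, the sketch below computes a two-sample Student's t statistic with pooled variance for a single peptide. It assumes the inputs are log2-transformed intensities; the function name `student_t` and the numbers are hypothetical, and this is not the DeCyder MS implementation. Turning t into a p-value additionally requires the t distribution with n1 + n2 - 2 degrees of freedom (for example `scipy.stats.t.sf`), which is left out to keep the example dependency-free.

```python
import math
import statistics
from typing import Sequence, Tuple


def student_t(group1: Sequence[float],
              group2: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled estimate of the variance shared by both groups.
    pooled = ((n1 - 1) * statistics.variance(group1)
              + (n2 - 1) * statistics.variance(group2)) / df
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, df


# Hypothetical log2 intensities of one peptide in two sample groups.
control = [10.0, 10.2, 9.8]
treated = [11.0, 11.2, 10.8]
t, df = student_t(control, treated)
print(round(t, 2), df)  # -6.12 4
```

Working on the log2 scale makes a given fold change count the same regardless of a peptide's absolute abundance, which is why the sketch assumes log-transformed input.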

The technique described above was used to analyse protein abundance in samples that formed part of a study designed to identify genuine residents of plant organelles. In this study, a cellular extract from non-photosynthetic Arabidopsis thaliana callus cultures was prepared and a total membrane fraction was applied to a self-forming iodixanol density gradient (Dunkley et al., 2004). Fractions from this gradient were analysed in a separate study, not described here, in an attempt to match the distribution of proteins of unknown subcellular location with that of known organelle markers. Here, four consecutive fractions from the lower end of this gradient, where mitochondria, plastids, and rough endoplasmic reticulum were enriched, were taken, and the reproducibility of technical replicates was assessed in LC-MS experiments on these relatively complex fractions.


    Materials and methods
Samples from organelle-enriched fractions of Arabidopsis thaliana were prepared as described by Dunkley et al. (2004). The fractions were digested with trypsin and analysed by one-dimensional LC-MS using an Ettan™ MDLC system (GE Healthcare) in high-throughput configuration, directly connected to a Finnigan™ LTQ™ system (Thermo Electron). Samples were concentrated and desalted on RPC trap columns (Zorbax™ 300 SB C18, 0.3 mm × 5 mm, Agilent Technologies), and the peptides were separated on a nano RPC column (Zorbax 300 SB C18, 0.075 mm × 100 mm, Agilent Technologies) using a linear gradient from 0% to 48% acetonitrile (ACN) (GE Healthcare) at 1% ACN min⁻¹. All buffers used for nano LC separation contained 0.1% formic acid (Fluka) as the ion-pairing reagent. Full-scan mass spectra were recorded in profile mode and tandem mass spectra in centroid mode. Peptides were identified from the tandem mass spectra by searching against the A. thaliana proteome (Birney et al., 2004) with X!Tandem (Craig and Beavis, 2004) (Beavis Informatics), using an expectation value cut-off of 0.01. The expectation value measures the statistical significance of an identification: it is the number of random matches expected to score at least as high as the observed match, so lower values indicate more significant identifications (Eriksson et al., 2000; Eriksson and Fenyö, 2002; Fenyö and Beavis, 2003). It is calculated by extrapolating the extreme value distribution of scores for randomly matching sequences, which are observed at low scores.
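The extrapolation idea can be sketched as follows. This is a simplified illustration, not the X!Tandem code: it assumes that every candidate except the best-scoring one is a random match, fits a straight line to the base-10 logarithm of their survival function (the number of candidates scoring at least s), and extrapolates that line to the best score. The function name `expectation_value`, the binning, and the synthetic exponential score distribution are all assumptions made for the example.

```python
import math
from typing import List


def expectation_value(scores: List[float], n_bins: int = 20) -> float:
    """Estimate the e-value of the top-scoring candidate for one spectrum.

    All candidates except the best one are treated as random matches.
    """
    ranked = sorted(scores)
    best, background = ranked[-1], ranked[:-1]
    lo, hi = background[0], background[-1]
    step = (hi - lo) / n_bins
    xs, ys = [], []
    for k in range(n_bins):
        s = lo + k * step
        # Survival function: number of random candidates scoring >= s (always >= 1 here).
        count = sum(1 for x in background if x >= s)
        xs.append(s)
        ys.append(math.log10(count))
    # Ordinary least-squares fit of log10(count) = a + b * s.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    # Extrapolate to the best score: expected number of random hits scoring >= best.
    return 10 ** (a + b * best)


# Synthetic background: 1000 exponentially distributed scores, generated
# deterministically from the exponential quantile function.
background = [-math.log(1 - (i + 0.5) / 1000) for i in range(1000)]
e_strong = expectation_value(background + [20.0])
e_weak = expectation_value(background + [9.0])
print(f"{e_strong:.1e} {e_weak:.1e}")
```

A top score far above the random-match distribution yields an e-value well below the 0.01 cut-off, whereas one only slightly above it does not.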

Using DeCyder™ MS Differential Analysis Software (DeCyder MS) (GE Healthcare) (A Kaplan, M Söderström, D Fenyö, H Pettersen, S Lindqvist, L Björkesten, unpublished data), the LC-MS data from the different samples were displayed as two-dimensional intensity maps, with m/z and retention time on the two axes and a grey scale representing the intensity of a peak at a given m/z and retention time. DeCyder MS was also used to analyse the intensity maps in two steps. In the first step, a dedicated image analysis algorithm in the PepDetect module of the software performed peptide detection, charge state assignment, and quantitation. The detected peptides were indicated in the intensity maps by boxes and the MS/MS events were marked by crosses. In the second step, the PepMatch module matched peptides between the intensity maps of replicate analyses whenever they fell within a user-defined mass and retention time interval. MS/MS data corresponding to the detected peptides were exported and searched using X!Tandem, and the identification information was imported back into DeCyder MS.
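The matching step can be pictured as a tolerance search. The sketch below is not the PepMatch algorithm, whose details are not given here; it is a hypothetical greedy matcher that pairs each detected peptide in a reference map with the closest unused peptide of the same charge in another map, provided the m/z difference lies within a ppm tolerance and the retention time difference within a window in minutes. The `Feature` class, the function name `match_features`, and the default tolerances are assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Feature:
    mz: float      # precursor m/z
    rt: float      # retention time (min)
    charge: int    # assigned charge state


def match_features(ref: List[Feature], other: List[Feature],
                   mz_tol_ppm: float = 10.0, rt_tol_min: float = 1.0
                   ) -> List[Tuple[Feature, Optional[Feature]]]:
    """Pair each reference feature with its closest match in `other`, or None."""
    used = set()
    pairs = []
    for f in ref:
        best_j, best_d = None, math.inf
        for j, g in enumerate(other):
            if j in used or g.charge != f.charge:
                continue
            ppm = abs(g.mz - f.mz) / f.mz * 1e6
            drt = abs(g.rt - f.rt)
            if ppm > mz_tol_ppm or drt > rt_tol_min:
                continue
            # Distance scaled by the tolerances so both axes weigh equally.
            d = (ppm / mz_tol_ppm) ** 2 + (drt / rt_tol_min) ** 2
            if d < best_d:
                best_j, best_d = j, d
        if best_j is None:
            pairs.append((f, None))
        else:
            used.add(best_j)
            pairs.append((f, other[best_j]))
    return pairs


# Hypothetical features from a reference map and one replicate map.
ref_map = [Feature(500.2500, 20.0, 2), Feature(650.3000, 35.0, 2)]
rep_map = [Feature(500.2520, 20.3, 2), Feature(650.3100, 35.1, 2)]
pairs = match_features(ref_map, rep_map)
for f, g in pairs:
    print(f.mz, "->", g.mz if g else None)
```

The first pair differs by about 4 ppm and 0.3 min and is matched; the second differs by about 15 ppm and is left unmatched at the default 10 ppm tolerance.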


    Results and discussion
Four different fractions isolated from a density gradient of a membrane preparation from A. thaliana were analysed by LC-MS in multiple technical replicates (four or five consecutive runs of the same sample). The data were evaluated in terms of reproducibility between the replicates.

The number of uniquely identified peptides found in one, some, or all replicates is shown for each fraction in Fig. 1, and the reproducibility of peptide signal intensity is illustrated in Fig. 2. The signal intensity distribution gives a first indication of the reproducibility of the LC-MS data and should be examined before further analysis. In this case, the intensity distributions indicate good reproducibility of the LC-MS data between replicates.
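The all/some/one-replicate breakdown reported in Fig. 1 amounts to counting, for each identified peptide, the number of replicate runs in which it passed the identification threshold. A minimal sketch, assuming each run's identifications are available as a set of peptide sequences; the function name `replicate_categories` and the data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def replicate_categories(ids_per_run: List[Set[str]]) -> Dict[str, int]:
    """Count peptides identified in all, some, or only one of the replicate runs."""
    n_runs = len(ids_per_run)
    # Each run is a set, so a peptide is counted at most once per run.
    seen_in = Counter(pep for run in ids_per_run for pep in run)
    cats = {"all": 0, "some": 0, "one": 0}
    for n in seen_in.values():
        if n == n_runs:
            cats["all"] += 1
        elif n == 1:
            cats["one"] += 1
        else:
            cats["some"] += 1
    return cats


# Hypothetical identifications (e < 0.01) from three replicate runs.
runs = [{"pepA", "pepB", "pepC"}, {"pepA", "pepB"}, {"pepA", "pepD"}]
print(replicate_categories(runs))  # {'all': 1, 'some': 1, 'one': 2}
```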


Figure 1
Fig. 1. The number of peptides identified using X!Tandem with expectation values less than 0.01 in all, some, and only one of the replicates from four different Arabidopsis samples. Samples 1 and 2 had five replicate analyses and samples 3 and 4 had four replicates.

 

Figure 2
Fig. 2. The variation in intensity between different replicate runs is shown for four different samples. Peptides matching all replicate intensity maps are shown as red dots and peptides matching only some replicate intensity maps (e.g. 1–3) are shown as blue dots. The random variation in log2 peak intensity between repeat analyses is in the range of a few percent, making it straightforward to compare repeat analyses by comparing intensity maps.
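One way to put a number on the replicate-to-replicate variation shown in Fig. 2 is to take the log2 intensity ratio of each matched peptide between two runs and convert the standard deviation of those ratios back into a fold change. The sketch below assumes the matched intensities are already paired up; the function names and values are hypothetical, and this summary statistic is an illustration rather than the method used to produce Fig. 2.

```python
import math
import statistics
from typing import List


def log2_ratios(run_a: List[float], run_b: List[float]) -> List[float]:
    """log2(I_b / I_a) for peptides matched between two replicate runs."""
    return [math.log2(b / a) for a, b in zip(run_a, run_b)]


def percent_variation(run_a: List[float], run_b: List[float]) -> float:
    """Typical fold variation, in percent, from the SD of the log2 ratios."""
    sd = statistics.stdev(log2_ratios(run_a, run_b))
    return (2 ** sd - 1) * 100


# Hypothetical intensities of three peptides matched in two replicate runs.
run_1 = [1000.0, 2000.0, 4000.0]
run_2 = [1030.0, 1960.0, 4080.0]
print(f"{percent_variation(run_1, run_2):.1f} %")  # 2.7 %
```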

 
One of the identified proteins (porin, Ensembl: At3g01280.1) was selected for closer evaluation. Figure 3 shows three peptides from At3g01280.1 that were automatically detected and matched across all replicates using DeCyder MS. Visual inspection of the LC-MS data reveals that only two of the peptides have associated tandem mass spectra in all replicates. Furthermore, not all replicates with associated tandem mass spectra led to a successful identification (Table 1).


Figure 3
Fig. 3. Examples of intensity maps showing three peptides (from top: KGDLLLGDVAF, ASALIQHEWKPK, INAGLSFTK) from an Arabidopsis porin (Ensembl: At3g01280.1) in five replicate LC-MS runs, clearly showing that these peptides are present in all runs. Tandem mass spectra that led to a successful identification (expectation value, e <0.01) are indicated by red markers, and tandem mass spectra that did not lead to a successful identification are indicated by white markers. Note also the lack of tandem mass spectra (no marker) for some peptides.

 

Table 1. The expectation value (e) for peptides with e <0.01 from an Arabidopsis porin (Ensembl:At3g01280.1) for five replicate LC-MS runs

 
This clearly demonstrates that both the software that selects ions for tandem mass spectrometric analysis and the identification algorithms are sensitive to small variations in peak intensity and tandem mass spectrum quality, and therefore introduce variation into the overall results of proteomics experiments. This sensitivity lowers the reproducibility of LC-MS data below what modern instruments could otherwise deliver, an effect that is easily seen with the visualization and matching tools provided in DeCyder MS. The data for the selected protein (Ensembl: At3g01280.1) are summarized in Table 1. Across all five technical replicates, a total of 33 different peptides of this protein were identified, but no single replicate yielded more than 23 identified peptides, whereas detecting and matching peptides between replicates with DeCyder MS resulted in 29 peptides being identified. This observation supports the need to compare complete data sets on the basis of intensity maps in order to detect and assign the observed peptides reproducibly.
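Part of the difference between identifications in a single replicate and those obtained after matching can be understood as carrying an identification made in one replicate over to the matched peptide in the other replicates. A minimal sketch of that bookkeeping, assuming matched features share a label and that an identification is transferred only when all MS/MS-based identifications of a feature agree; the function name `transfer_ids`, the labels, and the data are hypothetical, and the internal logic of DeCyder MS is not described in the text.

```python
from typing import Dict, Set, Tuple


def transfer_ids(detected: Dict[str, Set[int]],
                 ids: Dict[Tuple[str, int], str]) -> Dict[int, Set[str]]:
    """Per-replicate peptide sets after transferring IDs across matched features.

    detected: matched-feature label -> replicates in which the feature was detected
    ids:      (feature label, replicate) -> sequence identified from MS/MS
    """
    per_run: Dict[int, Set[str]] = {}
    for feature, runs in detected.items():
        seqs = {seq for (f, _), seq in ids.items() if f == feature}
        if len(seqs) != 1:
            continue  # never identified, or conflicting identifications
        seq = seqs.pop()
        for r in runs:
            per_run.setdefault(r, set()).add(seq)
    return per_run


# Hypothetical example: three matched features, three replicates (0, 1, 2).
detected = {"f1": {0, 1, 2}, "f2": {0, 1, 2}, "f3": {1, 2}}
ids = {("f1", 0): "pepA", ("f2", 1): "pepB", ("f3", 2): "pepC"}
after = transfer_ids(detected, ids)
print({r: sorted(s) for r, s in sorted(after.items())})
# {0: ['pepA', 'pepB'], 1: ['pepA', 'pepB', 'pepC'], 2: ['pepA', 'pepB', 'pepC']}
```

Without transfer, each replicate in this toy example has a single identification; after transfer, each has two or three, limited only by where the feature was actually detected.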

The visualization of LC-MS data as a two-dimensional intensity map closely resembles a 2D-PAGE image. This way of presenting LC-MS data is much more intuitive to the human eye than the conventional inspection of the total ion chromatogram and individual mass spectra, and inspecting the intensity map helps to assess the overall quality of an LC-MS analysis rapidly.

The images from DeCyder MS can also be used to check the reproducibility and consistency of replicate sample analyses. Inconsistencies in database search results from replicate analyses can be explained by inspecting the intensity maps showing the tandem mass spectrometric events. Because the intensity of a peptide varies between replicate analyses, its tandem mass spectra are acquired at slightly different retention times and m/z values. In some cases, tandem mass spectra corresponding to a peptide are acquired in some replicate analyses but not in others. In addition, the quality of the acquired tandem mass spectra always differs to some extent, causing variations in the scores assigned by the search engines. In most cases, however, the peptide can still be detected and confirmed from its location relative to neighbouring peptides in the intensity map.

Therefore, assuming that the chromatographic separation is reproducible, very high reproducibility can be achieved in proteomics experiments by visual inspection of the intensity maps. It is well established that images allow intuitive analysis and give access to information that is otherwise not discernible by sequential examination of single spectra, and LC-MS is no exception. Even though LC-MS image analysis is still in its infancy, its potential and advantages can now be demonstrated using DeCyder MS.


    Conclusions
Small intensity changes between replicate analyses of the same sample cause variation in the data-dependent acquisition of tandem mass spectra and in the quality of the tandem mass spectra acquired, leading to variation in which peptides are identified by database searching. It is possible to assess and increase the reproducibility of repeat analysis by using the detection, matching, and 2D-visualization of DeCyder MS.


    Acknowledgements
 
We thank Julie Howard, Tom Dunkley, and Kathryn Lilley, Cambridge Centre for Proteomics, University of Cambridge, UK, for supplying the Arabidopsis organelle-enriched fractions and for fruitful discussions.


    References
Aebersold and Mann. 2003. Mass spectrometry-based proteomics. Nature 422, 198–207.

Anderle et al. 2004. Quantifying reproducibility for differential proteomics: noise analysis for protein liquid chromatography-mass spectrometry of human serum. Bioinformatics 20, 3575–3582.

Berg et al. 2006. Detection of artifacts and peptide modifications in LC-MS data using novel visualization software. Rapid Communications in Mass Spectrometry (in press).

Birney et al. 2004. An overview of Ensembl. Genome Research 14, 925–928.

Craig and Beavis. 2004. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466–1467.

Dunkley et al. 2004. The use of isotope-coded affinity tags (ICAT) to study organelle proteomes in Arabidopsis thaliana. Biochemical Society Transactions 32, 520–523.

Eriksson et al. 2000. A statistical basis for testing the significance of mass spectrometric protein identification results. Analytical Chemistry 72, 999–1005.

Eriksson and Fenyö. 2002. A model of random mass-matching and its use for automated significance testing in mass spectrometric proteome analysis. Proteomics 2, 262–270.

Fenyö. 2000. Identifying the proteome: software tools. Current Opinion in Biotechnology 11, 391–395.

Fenyö and Beavis. 2003. A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. Analytical Chemistry 75, 768–774.

Heine et al. 2002. High-resolution peptide mapping of cerebrospinal fluid: a novel concept for diagnosis and research in central nervous system diseases. Journal of Chromatography B, Analytical Technologies in the Biomedical and Life Sciences 782, 353–361.

Li et al. 2004. A tool to visualize and evaluate data obtained by liquid chromatography-electrospray ionization-mass spectrometry. Analytical Chemistry 76, 3856–3860.

Listgarten and Emili. 2005. Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry. Molecular and Cellular Proteomics 4, 419–434.

Palmblad et al. 2002. Prediction of chromatographic retention and protein identification in liquid chromatography/mass spectrometry. Analytical Chemistry 74, 5826–5830.

Radulovic et al. 2004. Informatics platform for global proteomic profiling and biomarker discovery using liquid chromatography-tandem mass spectrometry. Molecular and Cellular Proteomics 3, 984–997.

Schulz-Knappe et al. 2001. Peptidomics: the comprehensive analysis of peptides in complex biological mixtures. Combinatorial Chemistry and High Throughput Screening 4, 207–217.

Skold et al. 2002. A neuroproteomic approach to targeting neuropeptides in the brain. Proteomics 2, 447–454.

Svensson et al. 2003. Peptidomics-based discovery of novel neuropeptides. Journal of Proteome Research 2, 213–219.

Tammen et al. 2003. Expression profiling of breast cancer cells by differential peptide display. Breast Cancer Research and Treatment 79, 83–93.

Venable and Yates. 2004. Impact of ion trap tandem mass spectra variability on the identification of peptides. Analytical Chemistry 76, 2928–2937.

Wang et al. 2003. Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Analytical Chemistry 75, 4818–4826.

Wiener et al. 2004. Differential mass spectrometry: a label-free LC-MS method for finding significant differences in complex peptide and protein mixtures. Analytical Chemistry 76, 6085–6096.

