Mass spectrometry (MS)-based proteomics facilitates the identification of a large number of proteins in a single experiment1–3. As a result this technique has been established as a powerful complement to the classic tools of protein chemistry, such as western-blotting or enzyme-linked immunosorbent (ELISA) assays, which are of considerably lower throughput and specificity. Where initial proteomic workflows mainly aimed to identify proteins, quantification has become a major focus of much of technology development in recent years4,5. On a quantitative liquid chromatography/mass spectrometry (LC-MS) platform the amount of analyte and the corresponding chromatographic peak area are in linear correlation, hence concentration values are obtained through comparison with reference standards6. A technically powerful approach for protein quantification involves the use of isotope-labelled standards that show a similar structure and chromatographic behaviour to the target molecule, but are distinguished from the target by mass7. When added at an early stage of the quantification workflow, they allow for correction of analyte loss during sample preparation and analysis, hence rendering the quantification experiment robust. However, the requirement for isotope-labelled standards makes proteomics workflows expensive and reduces flexibility, as their production is laborious and applicable only to samples for which these standards can be obtained or generated (please see the Discussion). Moreover, as both the analyte and standard need to be measured, they double the analyte load for the mass spectrometer. Consequently, recent developments that have enabled label-free peptide and protein quantification have attracted much attention8–12. In a label-free experiment, quantification is achieved through comparison of peak areas obtained for an analyte under two or more biological conditions; for instance to compare a wild-type versus a mutant, a compound-exposed versus a control condition, or a biological time series13–16. Upon normalisation, ideally to one or more unaffected internal standards, this approach yields a relative expression value for the target protein. This measure is then used to evaluate whether the expression of the target is altered between the conditions tested. In the case of high sequence coverage, absolute quantities may also be estimated, as peak intensities obtained for the best ionizing peptides correlate in approximation with their absolute concentration10,12.
The absence of an internal standard spiked early in sample preparation protocols means that label-free methods are sensitive to technical variance, and consequently, label-free proteomics requires high instrument performance and standardization of sample preparation methods. In terms of instrumentation, limitations arise from the linear range of the mass spectrometer and the sample capacity of the liquid chromatography. Moreover, in untargeted proteomics, the stochastic nature of data-dependent acquisition methods, where ions are selected for analysis based on their intensity, reduces the number of quantifiable peptides to only those fragmented in all samples17,18. This problem is a consequence of the high numbers of co-eluting peptides that may considerably exceed the mass spectrometer’s sampling speed when analysing full proteomes, a situation that is amplified by the high number of replicates used in a label-free study. By facilitating data-independent acquisition, where all ions are fragmented irrespective of their intensity, recent studies have demonstrated the possibility of circumventing the need of isolating individual peptides11,17. One such method, pioneered by the Waters Corporation, is termed MSE,11. In this approach fragment ions are assumed to have the same elution profiles as their precursors; this similarity is then used to pair fragments and precursors when a number of parent ions are co-fragmented. Fragment pairs and their corresponding precursor ions are typically retrospectively paired prior to database searching11. More recently, in a workflow termed SWATH, a mass range relevant for peptide-based proteomics (400–1200 m/z) is scanned in 25 m/z windows, in which all ions that fall into that window are simultaneously fragmented (MS/MSall). Quantification is then conducted based on the peak areas of extracted ion chromatograms (XIC), which are computationally reconstituted from the merged spectra on the basis of both experimental and in silico generated spectral information17.
Sample preparation techniques are equally important for the performance of a label-free experiment, and easier to optimize on a daily basis than the mass spectrometer’s properties. The main objective for a label-free sample preparation method is to obtain stable peak intensities between replicate sample preparations. Consequently, the ideal workflow avoids processing steps that are prone to stochastic analyte losses, and the LC-MS set up is operated in a way that ensures the dynamic range of the instrument is not exhausted. These objectives may differ to classic shotgun proteomics, where the number of identifiable peptides and proteins is the most important value, and a higher variation in signal intensities is acceptable. Hence, a sample preparation method and LC-MS/MS configuration, which is ideal for identifying a maximum of proteins, may be sub-optimal for label-free quantification, and vice versa. For instance, pre-fractionation of the sample prior to the LC-MS/MS analysis, a popular strategy to improve peptide identification, adds another level of complexity to the sample preparation increasing the signal variability and thus, is avoided wherever possible.
The main objective of the study presented here is to benchmark proteomic sample preparation methods for their suitability in label-free proteomic studies. We compare popular sample protocols that are based on in gel19, filter-aided20,21 and in solution9,22 digestion procedures. Processing identical proteome samples obtained from budding yeast, and acquiring proteomic data without further prefractionation on two LC-MS/MS platforms, these methods are compared by their performance in sample preparation, their precision in label-free quantification experiments and their effectiveness in terms of time and reagents. Through the analysis of these samples on a 5600 QqTOF23 instrument operating in either a data-dependent mode or SWATH24 mode, this study concludes with an evaluation of data-dependent and data independent acquisition, and suggestions about the optimal protocol selection.
For sample preparation the following reagents were used: Water ULC-MS grade (Greyhound Cat. No. 23214125), formic acid 99% ULC-MS (Greyhound Cat. No. BIO-06914131) and acetonitrile ULC-MS grade (Greyhound Cat. No. Bio-012041-2.5L). Chemicals were obtained from Sigma, with the exception of RapiGest (Waters, Cat. No. 186001861), trypsin (Promega Cat. No. V5111), LysC (Promega, Cat. No. No. V1071), complete EDTA-free protease inhibitor cocktail tablets (used in the eFASP protocol) (Roche Cat. No. 11873580001), dithiothreitol (Melford Cat No. MB1015), ammonium bicarbonate (Fluka Cat. No. 40867-50G-F), sodium dodecyl sulfate (SDS, Melford Cat. No. S1030), 30% acrylamide/0.8% bis-acrylamide (Protogel, Geneflow Limited Cat. No. EC-890), tri-n-butylphosphate (Fluka Cat. No. 90820-100ML) and BCA Protein assay kit (Pierce Cat. No. 23225).
Preparation of yeast cells
All experiments were conducted using a single culture derived from a single colony of the yeast strain BY474125. The strain was transferred to yeast peptone dextrose (YPD) media prepared as described in26 and incubated at 30°C at 200 rpm overnight (ON). Subsequently the ON culture was diluted to an optical density (OD600) of 0.2 as measured on an Ultropsec 2000 (Amersham) spectrophotometer, and incubated at 30°C until reaching OD600 = 2. The culture was split into aliquots corresponding to 10 OD600 units, and stored at -80°C until processing.
Protein sample preparation for DDA and SWATH analysis
A detailed protocol for each of the six procedures is available in the Supplementary Materials (found at the end of the document in the offline version) (see Supplementary protocol 1–Supplementary protocol 6). In brief, protein samples were prepared from 30 mg (wet weight) of yeast pellet. For the in gel digest protocols, protein extraction was performed either in 200 µl SDT buffer (4% SDS, 100 mM Tris/HCl pH 7.6, 0.1 M dithiothreitol) or 0.05 M ammonium bicarbonate using a Fast-Prep 24 instrument (MP Biomedicals). Fifty µg of protein was applied onto a denaturing polyacrylamide gel and subjected to electrophoresis (for details please see Supplementary protocol 1 and Supplementary protocol 2). The sample was excised as single band, cut in pieces, and subjected to tryptic digestion27. For the filter-aided protocols (FASP, Supplementary protocol 3 and Supplementary protocol 4) protein extraction was performed either in 200 µl SDT buffer (4% SDS, 100 mM Tris/HCl pH 7.6, 0.1 M dithiothreitol) (FASP, Supplementary protocol 3) or lysis buffer (1% SDS, 10 mM Tris/HCl pH 7.4, 0.15 M NaCl, 1 mM EDTA in PBS) (eFASP, Supplementary protocol 4). For both protocols the digestion was performed directly on filters (Amicon Ultra-0.5 Centrifugal Filter Unit with Ultracel-3 membrane, Millipore). The FASP procedure (Supplementary protocol 3) involved a treatment with endoproteinase Lys-C (Promega) prior to digestion with trypsin20, while the eFASP Supplementary protocol 4) required protein precipitation using tri-n-butylphosphate/acetone/methanol mix (1:12:1) for lipid removal before digestion21. For in solution digest protocols Supplementary protocol 5 and Supplementary protocol 6) protein extraction was performed either in 200 µl lysis buffer (0.1 M NaOH, 0.05 M EDTA, 2% SDS, 2% β-mercaptoethanol) (RapiGest) or 0.05 M ammonium bicarbonate (RapidACN)20 or using glass-bead lyses using the Fast-Prep 24 instrument (MP Biomedicals), respectively. The in solution digest protocol based on the detergent RapiGest method included a step of protein precipitation for lipid removal through centrifugation prior to trypsin treatment. For the in solution acetonitrile-based digestion protocol, a clean-up step using 3 kDa molecular cut off filters (Amicon Ultra-0.5 Centrifugal Filter Unit with Ultracel-3 membrane, Millipore) was performed immediately after trypsin digestion9. In order to maximize the proteome depth for the generation of a SWATH ion library, tryptic digests prepared with the RapidACN protocol were separated by high pH reverse phase chromatography before LC-MS/MS analysis. A reverse phase column (Waters, BEH C18, 2.1 × 150 mm, 1.7 µm) was utilised in combination with a 20 mM ammonium formate to 20 mM ammonium formate/80% ACN gradient. Twenty fractions were collected and supplemented with HRM standard peptide kit (Biognosys) prior to analysis.
LC-MS/MS analysis of digested S. cerevisiae lysates was performed on a Tandem Quadrupole Time-of-Flight mass spectrometer (AB/Sciex TripleTOF5600) coupled to a Nanospray III Ion Source (AB/Sciex) and nano-HPLC (Eksigent Ultra 2D) (referred as TripleTOF platform), or hybrid quadrupole orbitrap mass spectrometer (QExactive, Thermo Scientific) coupled to a Dionex Ultimate 3000 and an Easy-spray nanospray ion source (referred as QExactive platform).
On the TripleTOF platform, peptide separation was carried out by first removing impurities on a pre-column (C18 PepMap100 column NAN75-15-03-C18-PM, Thermo Fisher Scientific Cat. No. 160321) running isocratically at 100% solvent A at a flow rate of 5 μL min-1 for 6 min. Peptides were then eluted onto the analytical column (Zorbax 300SB-C18 column, 75 µm id × 15 cm 3.5 µm, Agilent Technologies Cat. No. 5065-9911), and separated on a linear gradient of 5–35% solvent B for 155 min at a flow rate of 300 nL min-1. Peptides were injected into the mass spectrometer using 10 µm SilicaTip electrospray emitters (New Objective Cat. No. FS360-20-10-N-20-C12), and operating the ion source with the following parameters: ISVF = 2500; GS1 = 12; CUR = 25. The data acquisition mode in the DDA experiments was set to obtain a high resolution TOF-MS scan over a mass range 400–1250 m/z, followed by MS/MS scans of 20 ion candidates per cycle with dynamic background subtraction, operating the instrument in high sensitivity mode. The selection criteria for the parent ions included the intensity, where ions had to be greater than 150 cps, with a charge state between 2 and 4. The dynamic exclusion duration was set for 15 s. Collision-induced dissociation was triggered by rolling collision energy (Supplementary Table 1). The ion accumulation time was set to 250 ms (MS) and to 100 ms (MS/MS). For SWATH MS-based experiments the instrument was tuned to specifically allow a quadrupole resolution of 25 Da/mass selection. An isolation width of 25 Da was set in a looped mode over the full mass range (400–1250 m/z) scan and 32 overlapping windows were constructed28. An accumulation time of 100 ms was set for each fragment ion resulting in a total ion cycle of 3.3 s.
For LC-MS/MS analysis using the QExactive platform, separation of peptides was performed at a flow rate of 300 nL min-1 using a reverse-phase nano column (Easy-spray, Thermo Scientific PepMap C18, 2 µm particle size, 100 Å pore size, 75 µm i.d. × 50 cm length). Peptides were loaded onto a pre-column (Thermo Scientific PepMap 100 C18, 5 µm particle size, 100 Å pore size, 300 µm i.d. × 5 mm length) from the Ultimate 3000 autosampler (Dionex) with 0.1% formic acid for 3 minutes at a flow rate of 10 µL min-1. Polar impurities were removed by running the system isocratically at 100% Å at a flow rate of 5 μl min-1 for 6 min. Finally, tryptic peptides were loaded onto the analytical column and separated using a linear acetonitrile gradient of 5–35% B for 155 min at a flow rate of 300 nL min-1. The LC eluant was injected into the mass spectrometer by means of an Easy-spray source (Thermo Fisher Scientific). All m/z values of eluting ions were measured in an Orbitrap mass analyzer, set at a resolution of 70000. Data dependent scans were employed to automatically isolate the 20 most abundant ions and generate fragment ions by higher energy collisional dissociation (HCD) in the quadrupole mass analyser. Only peptide ions with charge states of 2+ and above were selected for fragmentation. Finally, the measurement of the resulting fragment ions was performed in the Orbitrap analyser, set at a resolution of 17500.
Data acquired in DDA mode was analysed by means of either the Paragon29 (ProteinPilot software, AB/Sciex, v. 22.214.171.124, 1654) or the Mascot search algorithm (Matrix Science, version 2.3.02) using the S. cerevisiae S288C translated ORF database (based on SGD genome version R64-1-130). 156 common contaminant ions (AB/Sciex) were excluded from subsequent analysis. For Paragon searches, we used the following settings: Sample type: Identification; Cys Alkylation: Iodoacetamide; Digestion: Trypsin; Instrument: TripleTOF5600; Special Factors: none; Species: S. cerevisiae; Search effort: Thorough ID; Results Quality: 0.05. Only peptides with a confidence score of > 0.05 were considered for further analysis. For Mascot searches, the data was pre-processed using PeakView (AB/Sciex, Triple TOF v. 126.96.36.199) or Protein Discoverer (Thermo Scientific, v. 1.3) setting carbamidomethylation of cystein (C) as a fixed modification, oxidation of methionine (M) as variable modification and allowing a maximum of 2 missed cleavages. Fragment mass tolerance was set to 0.8 Da, and Instrument type was ESI-TRAP. Peptides under a significance threshold of 0.05 were regarded as acceptable.
For the extraction of data acquired in SWATH mode, an ion library for yeast was generated from data acquired in data dependent mode. Briefly, a whole proteome yeast digest obtained with the RapidACN procedure was subjected to high pH reverse phase chromatography (Waters, BEH C18, 2.1 × 150 mm, 1.7 mm) and 20 fractions were collected across a linear gradient of 0–56% ACN in 20 mM ammonium formate for 74 min. Solvents were removed by vacuum centrifugation and peptides were resuspended in 5% ACN/0.1% formic acid. Yeast tryptic peptides were then supplemented with 1× iRT standard peptides (Biognosys) and analysed on the TripleTOF platform. Spectral data were acquired in DDA mode and analysed using the Paragon search strategy as described above. Detected peptides were then corrected for retention time shifts, and the corresponding spectra were combined leading to a library containing 2800 unique yeast proteins. For extraction of SWATH data and peptide quantification Spectronaut 3 (Biognosys) and Skyline31 was used. In parallel, Skyline was also used for quantification of peptides from data dependent acquisition experiments. Subsequent data analysis was performed with R, ggplot2 package and custom-built scripts. GO analysis was based on the SGD Gene Ontology Slim Mapper.
Protocol selection and overall assessment
For this comparative study we selected an in gel digest method adapted from19, conducted in combination with an SDS-based and native protein extraction, two filter-aided (FASP (Filter Aided Sample Preparation) adapted from32 and a recent enhancement termed eFASP, (adapted from33), and two in solution procedures (RapiGest, adapted from22, and RapidACN adapted from16). There characteristics, as detailed below, are summarized in Figure 1. All procedures are given in lab-protocol format as Supplementary protocol 1 to Supplementary protocol 6.
Figure 1. Characteristics of label-free sample preparation methods.
Left panel: Schematic overview of the different steps in an LC-Ms/MS sample preparation method Right panel: Main characteristics of the protocols compared in this study. Detailled protocols are given in the Supplementary material. Supplementary protocol 1: In gel/SDS; 2: In gel/ABC; 3: FASP; 4: eFASP; 5: RapiGest; 6: RapidACN.
In gel digestions
In gel digestions are popular sample preparation methods as they are convenient, and offer a simple way of protein pre-fractionation through gel slicing and removal of small or high-molecular contaminants that could interfere with trypsin digestion. These approaches are compatible with multiple sample extraction buffers, can easily be combined with gel staining that does not interfere with protein digestion34–36, and thus provide a visual quality control over the protein sample. However, casting and running the gels render these protocols time consuming; hence the protocols are of relatively low throughput. In this study, we benchmarked in gel digestion in combination with both SDS-containing (Supplementary protocol 1) and SDS-free protein extraction Supplementary protocol 2) (In gel/SDS and In gel/ABC, respectively19, Table 1). SDS-PAGE was however not used as a tool for pre-fractionation. In order to compare in gel methods with filter-aided and in solution digestion, the full mass range was processed and measured at once.
Filter-aided sample preparation
The second set of assessed protocols involves digestion on filter units. These protocols are popular due to their flexibility and due to the fact that they facilitate a simple handling and require only a modest hands-on time (~3 hrs). The first protocol tested, FASP32 involves a dual protease digest (LysC and trypsin), while the second filter-aided procedure (here called eFASP) is a stepwise-optimized version of FASP by Shevchenko and colleagues21 that involves protein precipitation.
In solution digestions
The final two protocols tested in this study perform protein digestion in solution. The first protocol is based on the proprietary, acid degradable detergent RapiGest (Waters37), included in a protocol derived from Von der Haar et al.22. This protocol involves protein precipitation, which renders the RapiGest procedure more laborious as compared to the second in solution protocol, termed RapidACN. This rather simple method is based upon a tryptic digest in acetonitrile that is combined with a filter-based sample cleanup9. The RapidACN method requires the least number of handling steps and lowest hands-on time (~2 hrs per sample), overall facilitating the highest throughput among the tested procedures.
Protein identification and compartment specificity
The six protocols, provided as detailed protocols in the Supplementary materials, were used to process an identical, full proteome sample of Saccharomyces cerevisiae. This single cellular eukaryote possesses a proteome of medium complexity (6,000–7,000 protein coding genes38) and has served as a reference organism in many landmark proteome studies29,39–41. Here, the use of yeast facilitated sampling from a single culture, bypassing the possibility of biological variability occurring between the samples analyzed. However, once proteins are extracted, these protocols are fully applicable to process samples obtained from other species as well. To process the yeast pellets, the protocols were executed as closely as possible to their original recipes (with unavoidable minor deviations highlighted in the Protocol section), both in full triplicates (= protocol triplicates), and in injection replicates for comparing the acquisition methods (= injection triplicates). Samples were analysed on a hybrid quadrupole time of flight (TripleTOF5600, AB/Sciex) mass spectrometer for DDA and SWATH acquisition, or on a hybrid quadrupole orbitrap mass spectrometer (QExactive, Thermo Scientific) for DDA acquisition. DDA database searches were conducted using Mascot (for TripleTOF5600 and QExactive, Matrix Science,42) or ProteinPilot43 (for TripleTOF5600, AB/Sciex), whilst SWATH data was processed with Skyline31 and Spectronaut44 (Biognosys) software.
It is noteworthy that in this study the analytical setup was adapted for quantification and not to maximize the number of protein identifications. This involved the injection of low amounts of sample (equalling 1 µg digest per protocol) to prevent column overload and largely overrunning of the dynamic range. Moreover, to allow a direct comparison of the protocols, data was recorded in single injections and samples were not pre-fractionated. This strategy yielded highly reproducible quantification results, achieving up to < 5% coefficient of variance (CV) values in label-free replicate injections for some protocols, as shown below.
As an indicator of the quality of tryptic digests, we first assessed the relative occurrence of partially cleaved peptides in data obtained from triplicate injections on the TripleTOF platform. All filter-aided and in solution protocols yielded reasonable digestion efficiencies as revealed by an analysis with both Paragon (AB/Sciex, Figure 2a) and Mascot (Matrixscience, data not shown) search engines. Both in solution and the eFASP procedure yielded arginine- and lysine cleavages in a similar ratio as found in the yeast proteome, with the lowest number of spectra assignable to missed cleavage tryptic sites found in the RapiGest dataset (Figure 2a, and Figure 2b). In the fourth protocol, FASP, however, we found lysine cleavages overrepresented compared to arginine cleavages (Figure 2b). This indicates that the presence of LysC in this protocol increased the overall digestion efficiency of lysine residues; however this may introduce a bias in (absolute) quantification experiments, overrating lysine over arginine peptides in quantification. With the employed in gel protocols we obtained a significant higher number of spectra that corresponded to uncleaved peptides (Figure 2a). Incomplete cleavage of peptides can render a sample preparation unsuitable for absolute, but also for relative quantification, as the rate of cleavage may not be reproducible between replicates9. For this reason, we consider the in gel protocol as employed (without prefractionation on the whole-proteome sample) to be potentially erroneous in protein quantification and identification, and excluded the data from the assessment of protein quantification quality. This result however does not exclude the possibility that on other samples, or with modified in gel protocols, acceptable cleavage efficiencies are achieved, and thus, this result should not be interpreted as a critique of in gel methods in general.
Figure 2. Protein identification in label-free sample preparations.
(a) Proteolytic digestion efficiencies. Trypsin or LysC/trypsin (FASP) digestion efficiencies expressed as relative occurrence of spectra that could be assigned miscleaved peptides (n = 3). (b) Amino acid specificity of proteolytic digestion. Relative occurrence of identified peptides with C-terminal lysine or arginine, compared to the average frequency of these amino acids across all individual proteins identified (n = 3, Error bars = +/- S.D.) (c) Identified peptides differ per protocol, and correlate with the total peak area as recorded in a DDA experiment. 18 samples derived from the same yeast culture were processed with six protocols in triplicates, and analyzed on a TripleTOF5600 instrument. The number of identified peptides correlates with the total peak area recorded, and indicates the highest identification rate in solution digests, followed by filter-aided, and in gel procedures. (d) Detection of proteins by DDA or SWATH in a label-free experiment. Samples were analyzed in triplicates both for DDA and SWATH acquisition on a TripleTOF5600 instrument, data was searched using paragon (DDA), and Spectronaut (SWATH). SWATH increased the number of detectable proteins in combination with the in solution protocols. In solution protocols RapidACN and RapiGest led to the detection of up to 1000 proteins in single injections, followed by FASP and eFASP, which gave rise to between 250 and 750 proteins, and in gel injections that yielded 300 proteins IDs. Inset: A comparison of protein IDs for the TripleTOF and QExactive platform shows a linear correlation for the protocols investigated. Data was searched using Mascot (n = 3, Error bars = +/- S.D.)).
The number of detected peptides correlated with the sum of recorded total peak area, confirming that the instrument was operating within its dynamic range (Figure 2c). The yield of detected peptides (Figure 2c) and proteins (Figure 2d) revealed different performance of the tested protocols. For both data dependent (DDA) and SWATH acquisition, the two in solution protocols (RapiGest and RapidACN) gave the highest number of detectable peptides and proteins. Filter-based FASP and eFASP protocols ranked in the middle range, whilst a significantly lower number of proteins were detected from the in gel digests. Of note, SDS-based compared to native protein extraction increased the number of membrane protein detections in the in gel procedure, but in total a higher number of peptides were obtained in the natively extracted samples. To exclude that these results were platform specific, we injected the same samples on a QExactive mass spectrometer, operating with a different HPLC system and column (Dionex Ultimate 3000; 2 µm particle size C18, 75 µm i.d. × 50 cm column, see methods section). However, the number of protein IDs obtained with the two platforms correlated linearly, indicating that the ID performance of the tested protocols is platform independent (Figure 2d, Inset). Additionally, we tested to what extent injecting higher amounts of sample or pre-fractionation would increase the number of identifiable proteins. Single injection of 10 times the RapidACN sample increased the number of identifiable proteins by 34% to 1550 (QExactive), while high-pH RP HPLC pre-fractionation of a RapidACN digest led to the identification of 2800 proteins (TripleTOF). Similar tendencies were observed with the other protocols as well, indicating that when combined with sample pre-fractionation, all protocols and both platforms are suitable ID-optimized experiments, as addressed in other studies.
To be able to compare data dependent (DDA) and data independent (DIA) acquisition in terms of protein detection, we then analysed the samples using SWATH mode. Overall, when setting the highest quality threshold on SWATH-detected peptides (Spectronaut Q value < 0.01), SWATH and DDA detected a comparable number of proteins for the in gel and FASP procedures. However, SWATH outperformed DDA in the samples with high peptide content, RapiGest and RapidACN, leading to a modest but consistent increase in protein detection numbers (Figure 2d).
Performance of sample preparation methods in covering the variety of the proteome
Next we used the TripleTOF/DDA data to assess whether the protocols covered a similar set of proteins. A subset of 368 proteins overplayed between protocols 3, 4, 5 & 6 (all filter-aided and in solution protocols), while the filter-aided protocols (Supplementary protocol 3 and Supplementary protocol 4) overlapped for 479 proteins, and the in solution protocols (Supplementary protocol 5 and Supplementary protocol 6) for 915 proteins (Figure 3a). Due to high occurrence of uncleaved peptides, which may affect protein identification, the in gel methods are omitted from this illustration. However, the proteins identified in the in gel samples were to > 95% covered by in gel and in solution methods as well (data not shown.) All other protocols however also covered specific sets of proteins. RapiGest yielded the highest absolute number of unique IDs, while eFASP provided the highest percentage. Hence, in targeted proteome studies, sample preparation with different protocols might be considered in order to increase the probability of quantifying the desired target.
Figure 3. Protocols cover cellular compartments differently.
(a) RapiGest and eFASP cover a unique space in the proteome. Identified proteins were visualised in a Venn diagram, excluding the in gel protocols. The RapiGest procedure yielded most unique IDs, followed by eFASP and RapidACN (n = 3). (b) SDS-containing protocols are best suited for the extraction of membrane proteins. For the analysis of annotated functions in each protocol, selected GO terms were expressed as percentages of identified proteins. While cytosolic proteins were not enriched in any protocol, membrane proteins were preferentially detected in the SDS-containing protocols. (n = 3, Error bars = +/- S.D.) (c) Filter-aided sample preparations yield a balanced representation of the proteome. The identified proteins were plotted against the percentage of proteins annotated by the GO term cytosol, in order to illustrate the similarity of extraction properties. The protocol properties required for efficient extraction of membrane and nuclear proteins is inversely correlated with the extraction efficiency for cytosolic proteins, while there is a positive correlation with ribosomal proteins.
We next assessed whether these differences correlated to the coverage of cellular localisations. The tested protocols gave high coverage of the GO term cytosol, and performed equally on the mitochondrial proteome (Figure 3b, see Supplementary Figure S2 for a complete overview of GO terms). However, different results were obtained for membrane proteins. The lowest relative content of membrane proteins was obtained for those protocols that extract proteins under non-denaturing conditions, namely RapidACN and in gel/ABC. Conversely, most membrane proteins were detected in the detergent-rich protocols, eFASP and RapiGest. Overall, FASP and eFASP yielded the most balanced representation of both the membrane and cytosolic fraction, while RapidACN data exhibited the strongest bias towards cytosolic and against membrane proteins (Figure 2c).
Finally, we tested whether the protocols covered the proteomic mass range and charge state equally. The proteomic mass range was similarly represented by all protocols with a slight positive bias towards large proteins in all protocols (Supplementary Figure 1a). The procedures, however, differed in the representation of proteins with a certain isoelectric point (pI). The best representation of the proteome pI distribution was obtained with RapiGest (deviation coefficient (d) = 2.4), followed by FASP (d = 2.8) and RapidACN (d = 2.9) (Supplementary Figure 1b). In gel procedures scored least as they were negatively biased towards neutral proteins, and achieved a lower d value of 5.3 or 5.9 for in gel/ABC or in gel/SDS, respectively.
Next, we compared the protocols for their consistency in label-free quantification. As illustrated in Figure 2c, the number of identified peptides correlated with the sum of total peak area recorded, hence all procedures in principle lead to quantitative results. To be able to compare the protocols, we expressed the variation of signal intensities obtained from replicate sample preparations as coefficient of variation (CVs), and we plotted the frequency of CVs in two-dimensional distribution histograms (‘violin plots’, Figure 4a). DDA acquisition resulted in a CV maximal likelihood of 20% for eFASP, FASP and RapiGest. Although most peptides showed a variation of this magnitude, it is worth noting that there was a considerable spread of CVs in all three protocols, with some peptides showing as much as 140% variation. By far the highest signal reproducibility with a CV maximal likelihood of 7% was obtained with the RapidACN protocol (Figure 4a), indicating best suitability of this protocol in label-free quantification.
Figure 4. In solution digestion leads to stable results in label-free proteomics.
(a) The distribution maximum of coefficients of variation (CV) of the selected protocols varies between 0.075 and 0.2. CV values obtained for protocol triplicates are shown as two-dimensional distribution histograms (‘violin plots’). Quantification in DDA experiments were consistent over the dynamic range, as CV values only marginally changed when filtering by peptides according to their abundance (80%, 60%, 40% or 20%). CV likelihood maxima of all protocols were below 20%, while RapidACN lead to the most reproducible results (CV = 7%) (n = 3). (b) Stability in a quantification experiment is improved by data-independent acquisition. CV values for the same set of peptides measured with SWATH and DDA using the RapiGest protocol, as shown in a two-dimensional distribution histogram. Whereas there was a high signal variation in DDA acquisition, the variation could be largely reduced in SWATH acquisition. (c) In solution protocols yield the highest number of peptides suitable for label-free quantification. The number of peptides with a CV < 0.15 as determined in DDA and SWATH acquisitions. The number of highly reproducible peptides was lower for FASP and eFASP, and SWATH acquisition did not improve the performance in combination with these methods. In solution protocols on the other hand did yield a maximum of about 2500 high-quality peptides using SWATH.
Table 1. Sample preparation methods and their performance.
Summary of the main characteristics of the sample preparation methods investigated.
|Protocol Name||In gel/SDS||In gel/ABC||FASP||eFASP||RapiGest||RapidACN|
Kaiser et al.,
Kaiser et al.,
et al., 2009
et al., 2012
based on von der
Haar et al., 2007
et al., 2011
||4% SDS, DTT||0.05 M ABC||4% SDS, DTT||1% SDS,|
|2% SDS, β-ME,|
|0.05 M ABC|
Use of ﬁlter|
|-||-||for digest||for digest||-||for sample|
||Trypsin||Trypsin||Lys-C + Trypsin||Trypsin||Trypsin||Trypsin|
||29 hrs||29 hrs||32 hrs||30 hrs||30.5 hrs||26 hrs|
|(16 hrs)||(16 hrs)||(20 hrs)||(16 hrs)||(18 hrs)||(18 hrs)|
Hands on time
||5 hrs||5 hrs||3 hrs||3 hrs||3.5 hrs||2 hrs|
|<0.5||<0.5||10.75||1.8||1 (=2 £)||1.95|
CV in label-free|
||+ Cost effective|
|+ Cost effective|
|+ Best pI|
|+ Lipid removal||+ High ID|
+ Low CVs
+ Good coverage
+ Lowest CVs
+ Highest No of
|- Low ID|
|- Low ID|
|- Dual protease|
|- Traces of nOGP|
- High CV
|- Dependent on|
- Traces of
|- Low number of|
Next, we counted the number of precisely quantified peptides, defined as peptides with a CV < 15%. Also in this measure, the RapidACN procedure outperformed the other methods, while RapiGest, and eFASP performed second and third best, respectively (Figure 4c). Not covered in this benchmark is the performance of the individual protocols in repeated sample preparation over longer periods, i.e. weeks to months. This might be required for particular sets of samples that can not be stored without a protease digest, yet require sampling on different days to address a specific biological question.
Finally, we tested whether SWATH analysis improved label-free quantification. Comparing the CV distribution of peptides detected both in DDA and SWATH data using the RapiGest protocol (Figure 4b), we discovered a much more focussed CV distribution around a maximal likelihood of 5% in SWATH, compared to a maximal likelihood of 20% in DDA mode. When counting the number of precisely quantified peptides (CV < 0.15), SWATH led to an increase of up to a factor of two and five for RapidACN and RapiGest, respectively (Figure 4c). Hence, SWATH acquisition greatly improved the CV stability with label-free acquisition, the result of which is that a substantial number of peptides were precisely quantified.
Stable isotope labelling is a popular and reliable strategy in quantitative proteomics, yet it has limitations that arise from an increased analyte load in the precursor ion (MS1) space, and the way standards are produced or incorporated: For instance, targeted protein quantification using AQUA peptides45, achieves absolute quantification though comparison between the peak areas of light and chemically synthesized heavy-isotope labelled peptides of known concentration. However the costs for such peptides limits the number of proteins quantifiable7,45. An alternative strategy is the non-targeted chemical labelling of proteins and peptides with isobaric tags (i.e. iTRAQ, TMT), facilitating multiplexing of proteome samples and providing relative simultaneous quantification of labelled peptides8,46. However, frequent co-selection of the reporter ions reduces both the accuracy and precision of quantification47,48 Such a problem is circumvented when metabolic incorporation of isotope-labelled amino acid residues (i.e. SILAC49, or recent extensions like instance NeuCODE which is based on different nuclear mass dependent on the isotope combination integrated50), is used to create isotope-labelled standards in vivo. However, this approach is limited to heterotrophic species that consume lysine and arginine from the culture medium, and is in practice limited to tissue culture as the attempt to introduce labelling in animal models becomes extremely expensive51.
Label-free experiments circumvent the use of isotope labelled standards, thus are not affected by the above-mentioned limitations. As such, they are ideal complements when isotope labelling becomes a limitation. However, they lack possibilities to correct for selective sample loss, and hence are more sensitive to variations in sample preparation and instrument performance. The protocols employed thus require more rigorous validation.
In gel digests
Our comparison starts with a classic in gel digestion method19, which is tested in combination with SDS-containing- and SDS-free protein extractions (Supplementary protocol 1 and Supplementary protocol 2). These popular cost-effective procedures are based on the principle that a protein sample is denatured and separated on an SDS-PAGE gel prior to reduction, alkylation and protease digestion that are conducted within the gel matrix. The gel fulfils the function of sample clean up, as it removes positively charged contaminants as well as large macromolecules (i.e. nucleic acids) and small chemical compounds, and is very robustly applied to a large variety of sample types. Furthermore, the excision of individual bands or mass ranges make in gel digestions attractive wherever a simple sample pre-fractionation is required. Proteome pre-fractionation in gel (geLC-MS) has resulted in a significant proteome depth and dynamic range in studies were > 5000 distinct proteins were confidently identified and quantified52,53. Moreover, in gel digests have proven ideal when gel bands resulting from individual proteins are to be identified (i.e. for studying protein complexes). In the present study however, we did not make use of sample pre-fractionation. In order to achieve comparability with the other protocols, the full mass range was processed for the digest (see Methods section, and Supplementary protocol 1 and Supplementary protocol 2). This treatment led to a full representation of the proteomic mass distribution (Supplementary Figure S1). Under these circumstances however, the classic in gel protocol applied proved the least suitable method for label-free quantification. The protocol was the most time consuming, yet yielded a significant number of miscleaved peptides, and we detected the lowest number of proteins and peptides in total. Differences between SDS-free and SDS-containing sample extraction concerned the relative content of membrane proteins identified, which was higher in the latter, whereas the native extraction resulted in a higher number of proteins identified in total. This result should however not be interpreted as a general critique on in gel methods, as in combination with protein pre-fractionation (gel-slicing), they have proven for well-suitable sample preparation methods in ID experiments52,53.
Filter-aided sample preparation
The dependence on filter units in the two tested filter-aided sample preparation procedures, FASP32, and one of its recent extensions (here called eFASP21), increases the material costs, but has advantages for sample handling and throughput. Indeed, handling of the first protocol, FASP, was efficient and achieved a reasonable throughput with modest hands-on time (Supplementary protocol 1). In protein identification, FASP achieved the highest relative amount of detected membrane proteins. Hence, this protocol might be an ideal choice when membrane proteins are to be studied.
FASP was the only protocol in this study where digestion was carried out using a combination of proteases, LysC and trypsin. Similar to previous reports54, we observed that the addition of LysC increased the relative digestion efficiency. However, this resulted in an over-representation of lysine over arginine containing peptides, which may lead to bias in cases where this protocol is used in an absolute quantification experiment. In label-free quantification, FASP performance was average both in the number of precisely quantified peptides and in the CV values obtained for replicative sample preparations. It is important to mention in this context that the performance of FASP procedures is dependent on the filter units that are available from different manufacturers, but exactly the same filter unit which was used in the original FASP paper32 is no longer available. In this study we have chosen Amicon Ultra-0.5 3k for both FASP based protocols as used in eFASP by Shevchenko et al.21, as their cut-off rate of 3 kDa is the closest to the addressable mass range of the SWATH acquisition (400–1200 m/z). Further work from Wisniewski et al. demonstrated that also larger cut-off rates up to 50k are suitable in combination with the FASP protocol, and can improve the identification rate of larger proteins and peptides55. Moreover, in difference to the other protocols tested in this study, the tryptic digest in FASP is conducted in a very high concentration of urea. A simple protocol adaptation to influence the tryptic digest could thus be to change the buffer conditions, i.e. to a buffer as used in eFASP21 (Supplementary protocol 4).
The second filter-aided protocol, eFASP, represents a stepwise optimisation of FASP, and contains several alterations compared to its predecessor21 (Supplementary protocol 4). The protease digest is performed using trypsin only, and the protocol includes a lipid removal step and uses n-octyl-D-glucopyranoside (nOGP) as the detergent in sample preparation. The latter might be regarded as an undesirable addition to the sample, as nOGP can interfere with electrospray ionisation. Indeed, despite all washing steps, we could detect traces of nOGP in the MS/MS spectra, and the collection of MS data was reduced at the time a nOGP sodium adduct eluted (data not shown). Despite this, the modifications made for eFASP clearly improved the performance in protein and peptide identification. However, in our hands, they did not improve the precision in label-free quantification, the performance of FASP and eFASP in this measure was comparable (Figure 4). Hence, the main advantage of eFASP over FASP lies in improvements in protein identification and proteome coverage.
In solution digestion
The first method tested (Supplementary protocol 5) is based upon the commercial reagent RapiGest (3-[(2-methyl-2-undecyl-1,3-dioxolan-4-yl)methoxy]-1-propanesulfonate37 (Waters)), an anionic detergent which is depleted from the sample through acidic cleavage. The established protocol22 contains a step for lipid removal and a precipitation step that renders this procedure more laborious compared to the FASP and RapidACN protocols. However, as it does not involve any filter unit, it was most economic in terms of material costs per sample if one disregards the in gel protocols. Moreover, it yielded the highest number of protein and peptide IDs, and it detected the highest absolute number of membrane proteins. In label-free quantification, it scored third best in the average CV for DDA, and second best in combination with SWATH acquisition. Expressed in absolute quantities, this method yielded the second-highest numbers of precisely quantified peptides. Thus, the RapiGest protocol is a versatile and economic method that may represent the optimal choice in many applications. The only inexplicable issue with this protocol was related to the inefficiency of RapiGest degradation and precipitation in a small subset of samples. Thus care must be taken to avoid its injection in the LC-MS/MS setup.
The second in solution protocol (termed RapidACN9, Supplementary protocol 6) is detergent-free and based on acetonitrile in sample processing and proteolytic cleavage followed by clearing samples from high-molecular weight contaminants by a final filtration step. As this protocol is based on a native protein extraction, it identified - in relative terms - the lowest number of membrane proteins. Moreover, as it does not contain an intensive pre-digest sample treatment, functionality of this protocol may omit tissue were such a forefront clean up is mandatory. Despite these limitations, RapidACN performed best in the metric most crucial for robust label-free quantification, a low CV value in replicate sample digests and injections. Moreover, compared to the other tested methods, RapidACN was simplest in handling, required the least processing steps and only minimal hands-on time (~2 hrs), while yielding the second highest number of protein and peptide detection both in DDA and SWATH acquisition methods. Hence, RapidACN might be the most suitable solution for a label-free experiment when the focus is not to quantify membrane proteins, or to analyze tissue that requires extensive clean up.
Data-dependent versus data-independent acquisition
We chose to perform major parts of this study on a TripleTOF5600 instrument (AB/Sciex), in order to compare data-dependent acquisition (DDA) with data-independent acquisition (DIA). DIA is believed to be advantageous for label-free quantification, as it is not affected by run to run variation, and as MS2 data is reconstructed in chromatograms that resemble selective reaction monitoring (SRM)17. Therefore, this technique appears a desirable choice for the label-free analysis of biological time series, that require many samples (replicates over many time-points) to be compared15. The design of the TripleTOF5600 quadrupole allows precursor ion selection in a rectangular rather than a Gaussian mass selection window as in other instruments, reducing the co-selection of peptides falling in the adjacent mass windows23. In a workflow termed SWATH, the mass range from 400 to 1200 m/z is scanned in 25 Da windows, and the merged data used to reconstruct spectral (MS2) m/z chromatograms17. Processing SWATH data with Spectronaut (V. 3.0.337, Biognosys), we compared the performance of DDA with SWATH in protein detection and label-free quantification. In samples with low peptide content, the number of detected proteins with DDA and SWATH was comparable. However, in the in solution protocols that led to highest IDs, SWATH acquisition gave a slight but significant advantage in terms of peptides detected. This indicates that this approach is advantageous in protein detection when coupled with complex matrices. Significantly improvement of SWATH versus DDA was however observed in label-free quantification. The strongest effect of SWATH acquisition was observed when it was used in conjunction with the RapiGest protocol (Supplementary protocol 5), where the number of precisely quantified peptides increased by a factor of five, followed by the combination with RapidACN (Supplementary protocol 6), where this measure doubled (Figure 4c). Of note, SWATH employed in combination with the latter, resulted in an average CV below 5%, representing a superior value obtained in a label-free experiment. These improvements mainly resulted from a more reliable quantification of peptides in the mid to high abundance range, whereas there was no increased improvement quantification of low abundant spectra over DDA. We assume that this difference could be further optimized by improving the SWATH peak selection algorithms, as noise in the low abundance window results from occasional misassignment of fragment ions to precursors.
By facilitating label-free quantification, second-generation proteomics techniques enable flexible proteomic workflows. As the protocols cover different sets of proteins, the main determinant to select the best suitable method and workflow remains the biological question and the set of proteins to be addressed. Despite this, sample preparation methods differ in precision, sensitivity and throughput. Under the conditions of this benchmark, and under the conditions in our laboratory, a combination of in solution digestion protocols RapiGest or RapidACN with SWATH acquisition yielded optimal results for a label-free proteomics experiment. Achieving reliable quantification at reasonable numbers of detected proteins, label-free quantitative proteomics represents a suitable alternative to isotope labelling in addressing a series of biological problems.