Mass spectrometry data of diabetic rat sperm proteome treated with Gynura procumbens aqueous extract [version 1; peer review: 1 approved, 1 approved with reservations]

Diabetes mellitus has a deleterious effect on the male reproductive system, especially on sperm quality and spermatogenesis. Gynura procumbens (G. procumbens) is a traditional herb known for its ability to improve the fertility of diabetes-induced male rats. This study was designed to identify the differential expression of sperm proteins in diabetes-induced male rats after treatment with G. procumbens aqueous extract. The sperm proteome was profiled using label-free shotgun proteomics analysis. The Sprague Dawley rats used in this study were divided randomly into four groups. One group was a normal control group (healthy rats), while the three other groups were induced with 50 mg/kg bodyweight (BW) of streptozotocin (STZ) to emulate the diabetic condition. The diabetic rats were divided into negative control (non-treated diabetic), metformin-treated (positive control) and G. procumbens aqueous extract-treated (450 mg/kg BW) groups. Oral treatments were administered for 14 consecutive days before the rats were euthanized. Total sperm protein samples were extracted from the caudal epididymis and run through SDS-PAGE. Samples were then digested with trypsin before liquid chromatography-tandem mass spectrometry (Thermo Orbitrap Fusion) analysis. The acquired data were processed using MaxQuant and Perseus software. The mass spectrometry proteomics data are available through the ProteomeXchange Consortium via the PRIDE partner repository, with the dataset identifier PXD011373.


Introduction
Diabetes mellitus is a metabolic disorder that is often associated with male infertility, causing disruption to spermatogenesis, testicular impairment, and erectile and ejaculatory dysfunction 1. Disruption to spermatogenesis and testicular impairment in diabetic rats have been reported to cause low sperm quality 2. Current medicines, such as metformin and other oral antihyperglycaemic drugs, are reported to significantly reduce blood glucose levels; however, they do not improve the fertility problems associated with diabetes 3. Medicinal plants and herbs have long been used in traditional practice and can be considered as alternative treatments for some diseases. Gynura procumbens is one such herb that has shown potential for treating various conditions, including diabetes mellitus, cancer and, more recently, infertility. A previous study demonstrated that G. procumbens had potential as an anti-hyperglycaemic agent by lowering the blood glucose level of diabetes-induced male rats 4. G. procumbens is reported to imitate the mechanism of action of insulin, increasing glucose uptake into skeletal muscle 5. Additionally, G. procumbens extract has been shown to improve the fertility of diabetes-induced male rats 3,6. To further elucidate the effect of G. procumbens aqueous extract (GPAE) on sperm quality, a proteomic analysis was performed to identify and quantify the differential expression of total sperm proteins after treatment with GPAE.

Ethical statement
All efforts were made to ameliorate harm to animals. Diabetes was induced with streptozotocin (STZ) administered intravenously at a dose of 50 mg/kg bodyweight (BW) in all groups except the normal control group, which was left uninduced.

Source and husbandry of animals
A total of 28 male Sprague Dawley rats aged eight weeks (120-200 g) were used in this study. All rats were proven fertile and were provided by the Animal House of Universiti Kebangsaan Malaysia. All animals were acclimatized for seven days, during which food pellets (Rat Chow, Barastoc, Ridley, Australia) and drink were provided ad libitum. The rats were kept in PVC cages at controlled room temperature with a 12-hour light/12-hour dark cycle.

Experimental design
G. procumbens was harvested from the Universiti Kebangsaan Malaysia glass house (2.9300°N, 101.7774°E). The G. procumbens aqueous extract was prepared as described previously 3. Briefly, G. procumbens leaves were harvested and dried in an oven for 72 hours at 48°C. The dried leaves were then ground to a powder and extracted for three hours at 60°C. The resulting liquid extract was freeze-dried and stored at 4°C.
Experimental methods and procedures have been detailed previously 7. Briefly, male Sprague-Dawley rats were hand-picked at random and divided into four groups of seven rats each; randomization was used to avoid bias in the experiment. One group served as the normal untreated group (N), while the three other groups were induced with 50 mg/kg BW STZ intravenously to induce type 1 diabetes. The blood glucose level of each rat was measured 72 hours after STZ induction. Rats with a blood glucose level of 13 mmol/L or higher were considered diabetic and used in this study. These diabetic rats served as the (i) diabetic-untreated (negative control), (ii) diabetic-metformin treated (positive control; 500 mg/kg BW) and (iii) diabetic-GPAE treated (450 mg/kg BW) groups. Treatment was given via oral gavage for 14 consecutive days after seven days of STZ induction at the Animal House, Universiti Kebangsaan Malaysia. On the 15th day, all rats were euthanized by inhalation of diethyl ether for sperm protein analysis.
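The allocation, dosing and inclusion rules described above can be sketched in Python. This is an illustrative sketch only: the text states that rats were hand-picked at random, so the seeded shuffle, the function names (allocate, dose_mg, is_diabetic) and the rat identifiers are assumptions made for illustration. The group sizes, doses and the 13 mmol/L threshold are taken from the text.

```python
import random

# Group labels follow the text; the identifiers themselves are hypothetical.
GROUPS = ["normal", "diabetic_untreated", "diabetic_metformin", "diabetic_GPAE"]

# Doses in mg per kg bodyweight, as stated in the text.
DOSE_MG_PER_KG = {"STZ": 50, "metformin": 500, "GPAE": 450}

# Rats at or above this blood glucose level were considered diabetic.
DIABETIC_THRESHOLD_MMOL_L = 13.0


def allocate(rat_ids, seed=0):
    """Shuffle rat IDs and split them into equally sized groups.

    Assumption: a seeded software shuffle stands in for the manual
    random picking described in the text.
    """
    rng = random.Random(seed)
    shuffled = list(rat_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(GROUPS)
    return {
        group: sorted(shuffled[i * size:(i + 1) * size])
        for i, group in enumerate(GROUPS)
    }


def dose_mg(compound, body_weight_g):
    """Absolute dose in mg for one rat, given its bodyweight in grams."""
    return DOSE_MG_PER_KG[compound] * body_weight_g / 1000.0


def is_diabetic(glucose_mmol_l):
    """Apply the inclusion rule: 13 mmol/L or higher counts as diabetic."""
    return glucose_mmol_l >= DIABETIC_THRESHOLD_MMOL_L


if __name__ == "__main__":
    allocation = allocate(range(1, 29))  # 28 rats, four groups of seven
    for group, rats in allocation.items():
        print(group, rats)
    print("GPAE dose for a 200 g rat (mg):", dose_mg("GPAE", 200))
```

Under these assumptions, a 200 g rat in the GPAE group would receive 90 mg of extract per day, and a 200 g rat would receive 10 mg of STZ at induction.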

Sample collection and preparation
Sperm protein extraction was performed as described previously 8. Briefly, sperm samples from each rat were collected separately from the caudal epididymis and minced in Biggers-Whitten-Whittingham medium (91.06 mM sodium chloride, 4.78 mM potassium chloride, 2.0 mM calcium chloride, 1.17 mM potassium phosphate, 2.44 mM magnesium sulphate, sodium hydrogen carbonate, 0.25 mM sodium pyruvate, 21.55 mM sodium lactate, 5.55 mM glucose, and bovine serum albumin) 9. The samples were incubated in a 5% CO2 incubator for 30 minutes at 37°C to allow the sperm to swim up. Sperm samples were then centrifuged at 4000 rpm for 15 minutes and the supernatant was removed. The pellet was mixed with lysis buffer (7 M urea, 2 M thiourea, 4% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), 0.8% immobilized pH gradient (IPG) buffer, 1 mM phenylmethanesulfonyl fluoride (PMSF)). The samples were then centrifuged at 15,000 rpm for 20 minutes at 4°C. The supernatant was collected, mixed with 60 mM dithiothreitol (DTT) and kept at −80°C until used. The concentration of sperm proteins was determined using the Bradford assay 10.

SDS-PAGE
A total of 100 µg of sperm protein from each group was used for one-dimensional SDS-PAGE. Protein samples were loaded onto 12.5% SDS-PAGE gels and run for eight minutes or until all the proteins had stacked up in the resolving gel. The gel was then cut for trypsin digestion using the in-gel digestion method 11. For this purpose, excised protein bands were treated with DTT for one hour at 57°C to reduce the disulphide bridges, followed by iodoacetamide for another hour at room temperature in the dark for alkylation, as previously described 11. Gel pieces were rehydrated with 50 mM ammonium bicarbonate and dehydrated with acetonitrile, three times each. Enzymatic digestion was performed by incubating samples with trypsin for 30 minutes at 4°C at a 1:50 (w/w) ratio. The digestion products were then incubated with 50 mM ammonium bicarbonate overnight at 37°C. Digested peptides were submitted to liquid chromatography with tandem mass spectrometry (LC-MS/MS) for protein identification and quantification.

LC-MS/MS analysis
LC-MS/MS analysis was performed as described in Kamaruzaman et al. 7. Briefly, the peptide samples were analysed using a liquid chromatography (LC) system (UltiMate 3000 RSLCnano, Dionex) coupled to a Linear Trap Quadrupole (LTQ) Orbitrap Fusion mass spectrometer (Thermo Fisher, Bremen, Germany). A total of 1.0 µL of digested sample from each group was injected into a reverse phase column (15 cm × 75 µm internal diameter, particle size of 2 µm, 100 Å, C18 PepMap column) and eluted at a flow rate of 300 nL/min. The mobile phases were 0.1% formic acid in water (solvent A) and 0.1% formic acid in acetonitrile (solvent B). The peptides were separated in 5-40% solvent B for 91 minutes, 95% solvent B for 2 minutes, 95% solvent B for 6 minutes and back to 5% solvent B for 2 minutes. Data were acquired in data-dependent mode. Full scan spectra in the range of 310-1800 m/z were acquired. The automatic gain control (AGC) was targeted at 4.0 × 10^5 with a maximum injection time of 50 ms. The instrument was operated in top speed mode, in which precursors were selected with a maximum cycle time of 3 seconds. Precursors with an assigned monoisotopic m/z and a charge between 2 and 7 were further analysed. All precursors were filtered using a 20 second dynamic exclusion window with an intensity threshold of 5000. MS/MS spectra were acquired at the rapid scan rate using collision-induced dissociation (CID), with the normalized collision energy (NCE) set to 30%, and higher-energy collisional dissociation (HCD) at 28%, with a 1.6 m/z isolation window, an AGC target of 1.0 × 10^2 and a maximum injection time of 250 ms.
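The precursor selection rules described above (scan range, charge states, intensity threshold and dynamic exclusion) can be illustrated with a simplified Python sketch. It is not the instrument's actual acquisition logic. The class name PrecursorFilter and the 10 ppm matching tolerance for dynamic exclusion are assumptions, since the text does not state a mass tolerance. The numeric limits are taken from the text.

```python
from dataclasses import dataclass, field

MZ_RANGE = (310.0, 1800.0)    # full scan range, m/z
CHARGE_RANGE = (2, 7)         # accepted precursor charge states
INTENSITY_THRESHOLD = 5000.0  # minimum precursor intensity
EXCLUSION_WINDOW_S = 20.0     # dynamic exclusion duration, seconds
MZ_TOLERANCE_PPM = 10.0       # ASSUMPTION: not given in the text


@dataclass
class PrecursorFilter:
    """Simplified model of data-dependent precursor selection."""

    # (m/z, selection time) pairs that are currently excluded
    recent: list = field(default_factory=list)

    def accept(self, mz, charge, intensity, time_s):
        """Return True if the precursor would be selected for MS/MS."""
        # Forget entries whose exclusion window has expired.
        self.recent = [
            (m, t) for m, t in self.recent
            if time_s - t < EXCLUSION_WINDOW_S
        ]
        if not MZ_RANGE[0] <= mz <= MZ_RANGE[1]:
            return False
        if not CHARGE_RANGE[0] <= charge <= CHARGE_RANGE[1]:
            return False
        if intensity < INTENSITY_THRESHOLD:
            return False
        # Reject precursors matching a recently fragmented m/z.
        for m, _ in self.recent:
            if abs(mz - m) / m * 1e6 <= MZ_TOLERANCE_PPM:
                return False
        self.recent.append((mz, time_s))
        return True
```

In this sketch, a precursor at a given m/z that was fragmented at time 0 s is rejected again at 10 s and becomes eligible once 20 s have elapsed.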

Protein identification and quantification
Data processing was conducted according to previous studies 12,13. Raw files for each biological replicate were analysed together using the MaxQuant software (version 1.5.3.30) 14. The derived peak list was searched using the Andromeda search algorithm embedded in the MaxQuant workflow. The protein sequence database used for protein identification was the Rattus norvegicus database, obtained from the UniProt database (proteome ID: UP000002494, accessed in February 2016). Trypsin was set as the enzyme, with up to two missed cleavages allowed. The minimum peptide length was set to seven amino acids. The spectra search included carbamidomethylation of cysteine as a fixed modification and oxidation of methionine as a variable modification. Since no labelling was performed, multiplicity was set to one. Peptide spectrum matches (PSMs) and protein identifications were filtered using a target-decoy approach with a false discovery rate (FDR) of 1%. The second peptide feature was enabled. Label-free quantification (LFQ) of proteins was performed using the MaxLFQ algorithm integrated in MaxQuant. Other MaxQuant settings were left at their defaults. All resulting information was reported in the "proteinGroups" output file (proteinGroups.txt in the combined folder), containing the full list of identified and quantified proteins. This file has been deposited to the ProteomeXchange Consortium 15 via the PRIDE partner repository with the dataset identifier PXD011373.
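To make the enzyme settings concrete, the following Python sketch performs an in-silico tryptic digestion with up to two missed cleavages and a minimum peptide length of seven amino acids. It is a minimal illustration, not MaxQuant's implementation. The function names are hypothetical, and the rule used here (cleave after K or R unless the next residue is P) is an assumption, because MaxQuant's own enzyme definitions may differ in detail.

```python
def cleavage_fragments(sequence):
    """Split a protein sequence after every K or R not followed by P.

    ASSUMPTION: the classic trypsin rule; the exact rule applied by
    the search engine is not stated in the text.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def tryptic_peptides(sequence, max_missed_cleavages=2, min_length=7):
    """Return the set of peptides allowed by the search settings."""
    fragments = cleavage_fragments(sequence)
    peptides = set()
    for first in range(len(fragments)):
        # Joining n + 1 consecutive fragments means n missed cleavages.
        last_allowed = min(first + max_missed_cleavages, len(fragments) - 1)
        for last in range(first, last_allowed + 1):
            peptide = "".join(fragments[first:last + 1])
            if len(peptide) >= min_length:
                peptides.add(peptide)
    return peptides


if __name__ == "__main__":
    # Made-up sequence for illustration only, not a rat protein.
    for peptide in sorted(tryptic_peptides("GASPVLIKTWYRMNQDEHFK")):
        print(peptide)
```

For the made-up sequence in the example, the four-residue fragment TWYR is too short to be reported on its own but still appears inside longer peptides that contain missed cleavages.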
The data were further analysed for protein quantification using Perseus software (version 1.5.4.1) 16. Hits to the reverse database, contaminants and proteins identified with modified peptides were eliminated. The LFQ intensity ratios were then log2-transformed. Missing values were imputed by drawing random numbers from a normal distribution to simulate signals from low-abundance proteins, using the default parameters described previously 12. The log2(LFQ ratio) values for each sample were averaged and statistically analysed as per Kamaruzaman et al. 7. In short, data were displayed as mean ± standard error of the mean. Protein LFQ intensities were compared in Perseus using one-way ANOVA (p < 0.05).
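The transformation, imputation and group comparison steps can be approximated outside Perseus with a short Python sketch. It is an illustration under stated assumptions rather than a reproduction of the Perseus workflow. The function names are hypothetical. The imputation width (0.3) and down shift (1.8), in units of the observed standard deviation, are the values commonly given as Perseus defaults and are assumed here, because the text only refers to default parameters. Only the ANOVA F statistic is computed; a p-value would additionally require the F distribution, for example via scipy.stats.f_oneway.

```python
import math
import random
import statistics


def log2_transform(intensities):
    """log2-transform LFQ intensities; zero or None becomes missing (None)."""
    return [math.log2(x) if x else None for x in intensities]


def impute_missing(values, width=0.3, down_shift=1.8, seed=0):
    """Replace missing values with draws from a down-shifted normal.

    ASSUMPTION: width and down_shift mirror commonly cited Perseus
    defaults; 'values' is one sample's column of log2 intensities.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    rng = random.Random(seed)
    return [
        v if v is not None
        else rng.gauss(mean - down_shift * sd, width * sd)
        for v in values
    ]


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of values."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2
        for group in groups
    )
    ss_within = sum(
        (v - statistics.mean(group)) ** 2
        for group in groups for v in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

Imputing from a narrow distribution shifted below the observed mean reflects the idea stated above: proteins that go undetected in a sample are treated as low-abundance signals rather than as zero.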

Dataset
The dataset is composed of raw and processed LC-MS/MS data for the proteome profiling of rat sperm. Samples were collected from the caudal epididymis of four different groups of rats: normal-untreated, diabetic-untreated (negative control), diabetic-metformin treated (positive control) and diabetic-GPAE treated. LC-MS/MS, followed by MaxQuant and Perseus analyses, was carried out to identify and quantify the total sperm proteins (Table 1).
Protein identification revealed 473 proteins, and using a stringent search in MaxQuant software (including filtering contaminants and hits to decoy database), a total of 88 proteins were found in all groups. Identified proteins were later quantified using MaxQuant and Perseus software. This information revealed the effect of G. procumbens on the male diabetic rat reproductive system, specifically on the sperm proteins that are involved in fertilization.

Summary
The elucidation of sperm proteins using high-throughput proteomics techniques is still rather limited, especially in complementary medicine studies. Most previous studies 17,18, especially those involving herbal extract treatments on murine model organisms, mainly utilized a 2D gel-based approach, which is far less sensitive than shotgun proteomics 19. Most of these studies identified only around 200-300 proteins, which may not profile the entirety of the sperm proteome. This shortcoming may be due to the low throughput of the gel technique, as well as the inability to visualize low-abundance proteins 16.
On the other hand, shotgun proteomics mainly relies on the acquisition of peptide identities from digested total protein samples using an LC-MS/MS approach. In this study, 2411 peptides were found, which corresponds to 473 proteins. The high number of identified proteins suggests the applicability and sensitivity of shotgun proteomics, as compared to the more conventional 2D gel approach. Hence, shotgun proteomics, particularly using LC-MS/MS Orbitrap, can be considered as an effective strategy for producing a comprehensive proteome profiling of rat sperm, especially regarding GPAE treatment. Additionally, this dataset can be used as a primary guide for the characterisation of protein biomarkers for sperm quality and fertility in male rats.

Data availability
Underlying data
Dataset on ProteomeXchange, accession number PXD011373: https://identifiers.org/px/PXD011373
The Rattus norvegicus reference proteome was obtained from UniProt, proteome ID UP000002494: https://www.uniprot.org/proteomes/UP000002494

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Subject area
Reproductive Biology

More specific area
Proteomics of rat sperm

Type of data
Raw and processed data

How data was acquired
Experiments were performed using an LC system (UltiMate 3000 RSLCnano, Dionex) coupled to an LTQ Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific, Bremen, Germany)

Data format
Raw, processed

Experimental factors
Male Sprague-Dawley rats aged 8 weeks (120-200 g) were randomly divided into four groups. One group served as the normal untreated group, while the three other groups were induced with STZ (50 mg/kg). These three groups were divided into: (i) diabetic-untreated group (negative control), (ii) diabetic metformin-treated group (positive control) and (iii) diabetic-GPAE-treated group (450 mg/kg).

Experimental features
Rat sperm proteome description and relative quantification. Extracted proteins were run on SDS-PAGE before being reduced, alkylated and digested with trypsin. The peptide samples were then subjected to LC-MS/MS for protein profiling and analysed using MaxQuant and Perseus software.

Data accessibility
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium 15 via the PRIDE partner repository with the dataset identifier PXD011373.