Identification of pathogenic-specific open reading frames in staphylococci species

Fatima Naser Farhan; Andrzej Zielezinski; Wojciech M Karłowski

doi:10.12688/f1000research.142429.1

Research Article

[version 1; peer review: 2 not approved]

Fatima Naser Farhan¹,², Andrzej Zielezinski², Wojciech M Karłowski²

PUBLISHED 08 Jan 2024

¹ Department of Health Data Analytics, Electronic Health Solutions, Amman, 11953, Jordan
² Department of Computational Biology, Adam Mickiewicz University, Poznan, 61-614, Poland

Fatima Naser Farhan
Roles: Conceptualization, Formal Analysis, Methodology, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Andrzej Zielezinski
Roles: Investigation, Methodology, Supervision, Writing – Original Draft Preparation, Writing – Review & Editing

Wojciech M Karłowski
Roles: Conceptualization, Investigation, Methodology, Supervision, Writing – Original Draft Preparation, Writing – Review & Editing

This article is included in the Genomics and Genetics gateway.

Abstract

Background

Bacteria within the Staphylococcus genus are notorious for causing a wide range of infections, and they possess genes that play a pivotal role in determining their pathogenicity. In this study, we characterized open reading frames (ORFs), which represent potential functional gene sequences, from selected staphylococcal genomes.

Methods

Our study involved the extraction, categorization, and annotation of ORFs using diverse analytical methods. This approach unveiled distinct ORFs in both pathogenic and non-pathogenic species, with some commonalities. To assess the conservation of these ORFs and their relevance to pathogenicity, we employed tblastn and Clustal Omega-Multiple Sequence Alignment (MSA) methods.

Results

Remarkably, we identified 23 ORFs that displayed high conservation among pathogenic staphylococci, with five of them extending beyond the Staphylococcus genus. These particular ORFs may encode products associated with RNA catabolism and could potentially function as regulatory small open reading frames (smORFs). Of particular interest, we found a single smORF situated within a conserved locus of the 50S ribosomal protein L1, present in 200 genomes, including 102 pathogenic strains.

Conclusions

Our findings highlight the existence of ORFs with highly conserved elements, proposing the existence of 23 novel smORFs that may play a role in the pathogenicity of Staphylococcus species.

Keywords

pathogenic Staphylococcus, non-pathogenic Staphylococcus, open reading frames, comparative analysis, bacteria, pathogenicity

Corresponding author: Fatima Naser Farhan

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2024 Farhan FN et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Farhan FN, Zielezinski A and Karłowski WM. Identification of pathogenic-specific open reading frames in staphylococci species [version 1; peer review: 2 not approved]. F1000Research 2024, 13:27 (https://doi.org/10.12688/f1000research.142429.1) First published: 08 Jan 2024, 13:27 (https://doi.org/10.12688/f1000research.142429.1) Latest published: 08 Jan 2024, 13:27 (https://doi.org/10.12688/f1000research.142429.1)

Introduction

The Staphylococcus genus consists of Gram-positive cocci. It comprises more than 40 species, which are grouped as pathogenic or non-pathogenic. Members of the pathogenic group are responsible for various infections, including nosocomial infections, whereas non-pathogenic members are used in the food industry for the fermentation of cheese and meat. Species are thought to acquire their pathogenic capabilities through specific virulence factors gained by horizontal gene transfer or mutation (Rosenstein & Götz, 2012).

Virulence factors encompass adhesins, exoenzymes, toxins, and a heterogeneous group of other factors. Adhesins mediate attachment to host cells, exoenzymes destroy host tissue, the heterogeneous group comprises iron uptake systems, and toxins directly damage the host. However, the presence of any or all of these factors in a staphylococcal genome does not by itself make the organism pathogenic. For example, the non-pathogenic S. carnosus TM300 carries the virulence factor sortase A (srtA) in its genome. The srtA gene is essential for mediating attachment to host tissue, which indicates that its role is not exclusive to pathogens and depends on the contribution of the cognate substrate proteins to the infectious pathway (Götz, Bannerman & Schleifer, 2006). Physiological properties also significantly influence the pathogenicity of staphylococci. These properties affect their interactions with other pathogens, their ability to persist within the infected host, their resistance to antibiotics and antimicrobial compounds, and their capacity to evade neutrophil-mediated killing (Rosenstein & Götz, 2012).

The ability of pathogenic staphylococci to adapt quickly to antibiotic treatment is considered a key feature. Antibiotic resistance genes carried on mobile genetic elements (transposons or plasmids) mediate this resistance and spread rapidly through lateral gene transfer; resistance can also arise through spontaneous mutation. The increasing resistance of staphylococci has left only a few antibiotics effective in treating infections and has increased the virulence of these species (Ventola, 2015). For example, S. aureus has developed different strategies to counteract antibiotics, resulting in the emergence of strains known as methicillin-resistant Staphylococcus aureus (MRSA). MRSA alone is responsible for 11,285 deaths per year in the US, killing more Americans yearly than HIV, Parkinson’s disease, emphysema, and homicide combined (Guo et al., 2020).

A few studies have compared pathogenic and non-pathogenic staphylococci species. Such studies were usually limited either to the genomic aspect or to a small number of species. For instance, Rosenstein et al. (2009) analysed the genome of S. carnosus and compared a few of its features with those of S. aureus. Heo and colleagues studied the genomes of a few strains of S. epidermidis, S. haemolyticus, and S. saprophyticus (Heo, Lee & Jeong, 2020). Although these species are opportunistic bacteria involved in various infections, the study concluded that their genomes did not encode any of the virulence factors found in S. aureus. Mannala et al. (2018) compared the genomes of a highly virulent and a low-virulence Staphylococcus aureus strain. Another study, by Rosenstein & Götz (2012), described the genomic information of pathogenic staphylococci species, focusing on S. aureus strains. All of these studies highlighted similarities and dissimilarities between the genomes of pathogenic and non-pathogenic staphylococci. Closer scrutiny of these similarities and differences should therefore help us to understand the roots of their virulence and the infectious pathway they follow.

Open reading frames are regions that either contain no stop codons or begin with a start codon and end with a termination codon. Each strand of a DNA sequence has three possible reading frames. Exploring bacterial ORFs provides an opportunity to discover novel functional genes (Cerqueira & Vasconcelos, 2020). Recently, biologists have paid increasing attention to small ORFs (smORFs; <50 amino acids), which play vital roles in several cellular regulatory activities, and more studies have focused on developing new approaches to annotate them (Mir et al., 2012). In-silico methods for detecting ORFs are either intrinsic or extrinsic. Intrinsic methods assess the coding potential of an ORF, for example the presence of an obvious ribosome binding site (RBS), whereas extrinsic methods search for sequences conserved among different species (Cerqueira & Vasconcelos, 2020). The latter is a potent approach for detecting smORFs (Warren et al., 2010; Wood et al., 2012; Cerqueira & Vasconcelos, 2020), given that confirmed short protein-coding genes can lack any marked RBS (Hemm et al., 2008). However, the number of annotated smORFs remains low.

In this study, we comparatively analysed ORFs extracted from ten selected staphylococci species, including five pathogenic and five non-pathogenic strains. Our objectives were to characterize the features of ORFs in pathogenic genomes, identify conserved ORFs specific to pathogenic staphylococci, and propose a novel approach for smORFs’ prediction and annotation. This study holds significance in addressing the need for comparative investigations of ORFs in pathogenic and non-pathogenic staphylococci genomes and contributes to the growing attention towards smORFs.

Methods

Sequence data

The GenBank (Benson et al., 2010) and RefSeq (O’Leary et al., 2016) databases contain genome sequences of 98,079 staphylococci strains. For the comparative analysis, we selected species whose pathogenicity is defined and confirmed. S. aureus Mu3 (assembly accession ID: GCA_000010445.1), S. lugdunensis HKU09-01 (GCA_000025085.1), S. haemolyticus JCSC1435 (GCA_000009865.1), S. saprophyticus ATCC 15305 (GCA_000010125.1), and S. schleiferi strain 1360-13 (GCA_001188855.1) represented the pathogenic species, while the non-pathogenic species were S. carnosus TM300 (GCA_000009405.1), S. cohnii SNUDS-2 (GCA_001990205.1), S. warneri SG1 (GCA_000332735.1), S. nepalensis DSM 15150 (GCA_002902745.1), and S. pasteuri JS7 (GCA_002442915.1). We downloaded the genome sequences of the selected strains from the NCBI FTP site (https://ftp.ncbi.nlm.nih.gov).

ORFs extraction

We employed version 6.6.0.0 of the EMBOSS getorf algorithm to extract open reading frames (ORFs) from the genomes specified in the previous section (Rice, Longden & Bleasby, 2000). The process involved running a Linux script that called getorf with the following parameters: ‘getorf -sequence genome.fa -find 1 -outseq genome_orf.txt’. The qualifier -find 1 makes getorf report the translations of regions between start and stop codons.
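To make the start-to-stop definition concrete, the following is a minimal Python sketch of scanning all six reading frames for ORFs. It is an illustration only and not the EMBOSS getorf implementation: it assumes ATG is the only start codon, ignores any minimum-size setting, and returns nucleotide sequences rather than translations. The function names are hypothetical.

```python
# Illustrative sketch only; this is NOT the EMBOSS getorf implementation.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: ATG is the only start codon considered

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame):
    """Yield (start, end) offsets of start-to-stop ORFs in one frame.

    `end` is the offset of the stop codon, so seq[start:end] excludes it.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START_CODON:
            start = i
        elif start is not None and codon in STOP_CODONS:
            yield start, i
            start = None


def find_orfs(genome):
    """Return (strand, frame, nucleotide ORF) for all six reading frames."""
    genome = genome.upper()
    results = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            for start, end in orfs_in_frame(seq, frame):
                results.append((strand, frame, seq[start:end]))
    return results
```

Each strand contributes three frames, which is why the outer loops visit six frame and strand combinations in total.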

The identified ORFs were sorted by size, and any ORF shorter than 10 amino acids (aa) was discarded. Next, we identified ORFs shared within and between the two species groups. Finally, we categorized these shared ORFs into five groups based on their presence in the tested genomes: ORFs present in all tested genomes, ORFs present in all pathogenic genomes, ORFs present in all non-pathogenic genomes, ORFs present in some pathogenic genomes, and ORFs present in some non-pathogenic genomes (Underlying data: Appendix A–E) (Farhan et al., 2023). Figure 1 visualizes the ORF filtration process (Farhan, 2023).
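The size filter and five-way grouping described above can be sketched with set operations. The exact membership rules are not spelled out in the text, so this sketch makes labelled assumptions: ORFs are "shared" when their amino-acid sequences are identical, "some" means at least two genomes of one group and none of the other, and ORFs that fit none of the five patterns are left uncategorized. The function name and group keys are hypothetical.

```python
from collections import defaultdict

MIN_LEN_AA = 10  # ORFs shorter than 10 aa were discarded in the text


def categorize_orfs(orfs_by_genome, pathogenic, non_pathogenic):
    """Group shared ORFs by their presence pattern across genomes.

    orfs_by_genome maps a genome name to a set of ORF amino-acid sequences.
    Assumption: two ORFs are "shared" when their sequences are identical.
    """
    carriers = defaultdict(set)  # ORF sequence -> genomes that carry it
    for genome, orfs in orfs_by_genome.items():
        for orf in orfs:
            if len(orf) >= MIN_LEN_AA:
                carriers[orf].add(genome)

    pathogenic, non_pathogenic = set(pathogenic), set(non_pathogenic)
    groups = {name: set() for name in (
        "all_genomes", "all_pathogenic", "all_non_pathogenic",
        "some_pathogenic", "some_non_pathogenic")}
    for orf, genomes in carriers.items():
        n_path = len(genomes & pathogenic)
        n_non = len(genomes & non_pathogenic)
        if n_path == len(pathogenic) and n_non == len(non_pathogenic):
            groups["all_genomes"].add(orf)
        elif n_non == 0 and n_path == len(pathogenic):
            groups["all_pathogenic"].add(orf)
        elif n_path == 0 and n_non == len(non_pathogenic):
            groups["all_non_pathogenic"].add(orf)
        elif n_non == 0 and n_path >= 2:  # assumption: "some" = at least two
            groups["some_pathogenic"].add(orf)
        elif n_path == 0 and n_non >= 2:
            groups["some_non_pathogenic"].add(orf)
    return groups
```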

Figure 1. Schematic diagram of the open reading frames (ORFs) filtration process.

ORFs functional annotation and prediction

The functional annotation of ORFs followed two approaches: (i) a direct approach that used the annotated protein files available in the GenBank and RefSeq databases and (ii) an indirect approach based on the BLASTp tool version 2.11.0 (Altschul et al., 1997). In the direct approach, the sequence and coordinates of each ORF were matched against the annotated proteins of the tested genomes. ORFs that could not be annotated directly were passed to the indirect approach. The BLASTp parameters were set to search the non-redundant protein sequence database for sequences homologous to the ORFs within the Staphylococcus genus only (taxid 1279), targeting a maximum of 100 species. A hit was accepted only if both identity and query coverage were higher than 85%.
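The acceptance rule for the indirect approach (identity and query coverage both above 85%) can be illustrated with a short filter over tabular hits. This sketch assumes a four-column, tab-separated layout (query ID, subject ID, percent identity, percent query coverage); that layout and the function name are assumptions for illustration and not the output format used in the study.

```python
import csv
import io

THRESHOLD = 85.0  # both identity and query coverage must exceed this value


def accepted_hits(tsv_text, threshold=THRESHOLD):
    """Return {query_id: [subject_id, ...]} for hits passing both cut-offs.

    Assumed (hypothetical) columns: query ID, subject ID,
    percent identity, percent query coverage.
    """
    hits = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 4:  # skip blank or malformed lines
            continue
        query_id, subject_id, identity, coverage = row
        if float(identity) > threshold and float(coverage) > threshold:
            hits.setdefault(query_id, []).append(subject_id)
    return hits
```

ORFs that do not appear as keys in the returned dictionary would remain unannotated under this rule.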

The Blast2GO tool version 5.2.5 (Conesa et al., 2005) enables efficient automatic functional annotation of protein sequences according to the gene ontology vocabulary. Gene ontology (GO) describes the biological framework of genes in three aspects: biological process, molecular function, and cellular component. When GO terms are represented as a graph, parent terms are the nodes closer to the root (level 2), and child terms are the nodes closer to the leaves (level ≥ 5). We also used the tool to perform a gene enrichment analysis with a two-tailed Fisher exact test to identify biological processes enriched in the pathogenic tested genomes.
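For readers unfamiliar with the test behind the enrichment analysis, the following is a minimal pure-Python sketch of a two-tailed Fisher exact test on a 2×2 table (for example, ORFs with and without a given GO term in the two groups). It is not the Blast2GO implementation; the function name is hypothetical, and the two-tailed p-value is taken as the sum of the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Example reading (assumption): a/b = ORFs with/without a GO term in the
    pathogenic group, c/d = the same counts in the non-pathogenic group.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed table;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)
```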

The DeepGOPlus algorithm (version 1.0.2) was used to predict the function of ORFs whose function was previously unknown (Kulmanov, Khan & Hoehndorf, 2018). The algorithm uses deep learning to learn features from query protein sequences in addition to cross-species protein-protein interaction networks. The output consisted of GO terms, which were displayed as a graph chart using the QuickGO tool version 1.15 (Binns et al., 2009).

Conservation

A conserved sequence is an amino acid sequence in a protein (or a nucleotide sequence in DNA) that has remained unchanged throughout evolution to maintain a protein’s structure and function (Alberts et al., 2002). Testing the conservation level of unknown ORFs (unORFs) involved three stages: conservation within the Staphylococcus genus, conservation within pathogenic staphylococci species, and finally conservation beyond the Staphylococcus genus. We employed the tblastn algorithm version 2.11.0+ (Altschul et al., 1997) to conduct the conservation test, retaining hits with identity and query coverage of at least 85%. Student’s t-test, as provided by Python’s SciPy library, was used to assess the significance of the results.

Clustal Omega version 1.2.4, a multiple sequence alignment (MSA) algorithm, was used to align the sequences and assess locus conservation (Sievers et al., 2011).

Pathogenicity assessment

PathogenFinder 1.1 and the NCBI Pathogen Detection datasets (Cosentino et al., 2013; NCBI, 1988) were used to label each species as either known to be pathogenic (1) or of unknown pathogenicity (0). The pathogenicity of an ORF refers to the number of pathogenic genomes that carry a homologous sequence. The pathogen frequency of an ORF was calculated by dividing its pathogenicity by the total number of genomes (Figure 2).

Figure 2. Outline of estimating an open reading frame’s (ORF’s) pathogen frequency.
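The pathogen frequency calculation can be written as a short function. This is a sketch under one stated assumption: the "total number of genomes" in the denominator is taken to be the genomes in which a homologous sequence of the ORF was found. The function name is hypothetical.

```python
def pathogen_frequency(labels):
    """Fraction of genomes carrying the ORF that are labelled pathogenic.

    `labels` holds one value per genome with a homologous hit:
    1 = known to be pathogenic, 0 = unknown pathogenicity.
    Assumption: the denominator counts only genomes with a hit.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("at least one genome is required")
    pathogenicity = sum(labels)  # number of pathogenic genomes with a hit
    return pathogenicity / len(labels)
```

For instance, an ORF with homologs in four genomes, three of them labelled pathogenic, would have a pathogen frequency of 0.75 under this definition.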

Genetic context and neighbouring genes

We used the Prodigal (Prokaryotic gene recognition and translation initiation site identification) algorithm (version 2.6.3) to locate ribosome binding site (RBS) motifs in the S. aureus Mu3 genome (Hyatt et al., 2010) and to verify whether an RBS motif preceded each ORF.

To explore the neighbouring genes of each ORF and outline its precise locus, we measured the interval between each selected ORF (selORF) and the genes of the S. aureus Mu3 genome based on their coordinates. We downloaded the annotated protein file for S. aureus Mu3 from the GenBank FTP site.
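The coordinate-based interval can be sketched as a gap between two intervals, with a distance of zero indicating overlap. The convention below (closed intervals with start ≤ end, gap measured between the nearest ends) and the function names are assumptions for illustration; the study does not specify its exact convention.

```python
def interval_distance(a_start, a_end, b_start, b_end):
    """Gap in nucleotides between two intervals; 0 means they overlap.

    Assumption: coordinates are on the same axis with start <= end.
    """
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def nearest_gene(orf, genes):
    """Return (distance, gene_id) for the gene closest to the ORF.

    `orf` is a (start, end) tuple; `genes` maps a gene ID to (start, end).
    """
    return min(
        (interval_distance(orf[0], orf[1], start, end), gene_id)
        for gene_id, (start, end) in genes.items()
    )
```

Strand information is left out here; in practice the same calculation would be run separately for genes on the same and on the opposite strand.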

Results

ORFs’ categorization

The analysis started with extracting ORFs from the 10 selected genomes (five pathogenic and five non-pathogenic); any ORF shorter than 10 amino acids was excluded. Subsequently, we categorized the ORFs into five groups based on their presence in the tested genomes (Table 1). Six ORFs were common to all tested genomes across both groups. In the pathogenic group, 1572 ORFs were found in some genomes, and 15 ORFs were present in all pathogenic genomes and specific to them. Likewise, 1567 ORFs were found in some non-pathogenic genomes, and 13 ORFs were present in all non-pathogenic genomes and specific to them.

Table 1. Distribution of open reading frames (ORFs) across five categories.

Category	Number of ORFs	Min size	Avg size	Max size
ORFs found in all tested genomes	6 ORFs	10 aa	17 aa	29 aa
ORFs found in some pathogenic tested genomes	1572 ORFs	10 aa	34 aa	690 aa
ORFs found in all pathogenic tested genomes	15 ORFs	10 aa	17 aa	46 aa
ORFs found in some non-pathogenic tested genomes	1567 ORFs	10 aa	24 aa	585 aa
ORFs found in all non-pathogenic tested genomes	13 ORFs	10 aa	18 aa	33 aa

ORFs’ annotation

Table 2 summarizes the results obtained from ORFs annotation via direct and indirect approaches. Interestingly, one ORF was identical to a part of the 30S ribosomal protein S9 sequence of all non-pathogenic tested genomes.

Table 2. Summary of results obtained from annotating the open reading frames (ORFs).

Category	Direct annotation	Indirect annotation	Not annotated
ORFs found in all tested genomes	0	0	6
ORFs found in some pathogenic tested genomes	92	3	1477
ORFs found in all pathogenic tested genomes	0	0	15
ORFs found in some non-pathogenic tested genomes	99	3	1465
ORFs found in all non-pathogenic tested genomes	1	0	12

Our methodology provided information about several hypothetical proteins whose functions were unknown. For example, the hypothetical protein in S. lugdunensis HKU09-01 overlapped with the cell surface protein IsdA, indicating a role in transferring heme from haemoglobin to apo-IsdC. In S. aureus Mu3, the glycerophosphoryl diester phosphodiesterase homolog protein was identical to the unnamed protein product in S. haemolyticus JCSC1435. Similarly, among the non-pathogenic tested genomes, the hypothetical protein of S. cohnii SNUDS-2 was identical to the YlbF/YmcA family competence regulator protein of S. nepalensis DSM 15150. In total, we detected 15 hypothetical proteins identical to known functional proteins shared among either some non-pathogenic or some pathogenic species.

Functional diversity in pathogenic and non-pathogenic genomes

Both pathogenic and non-pathogenic groups have similar functional proteins, such as the 50S and 30S ribosomal proteins, translation initiation factors, acyl carrier proteins, ATP-binding proteins, transposases, and transcriptional regulators. Although both groups have 50S and 30S ribosomal proteins and transposases, they differ in counts and types. Eleven ORFs of pathogenic genomes overlapped with 28 proteins of either the 30S or 50S ribosomal subunit, whereas 31 ORFs of non-pathogenic genomes overlapped with 65 ribosomal proteins. Sixty-one transposases were detected in pathogenic genomes, but only eight in non-pathogenic genomes. The only transposase family common to both groups was the IS256 family. According to the annotation results, 35 functions were unique to pathogenic genomes, whereas 38 were exclusive to non-pathogenic staphylococci.

Within the biological process GO terms (GO:0008150), multiple ORFs from both pathogenic and non-pathogenic tested genomes were associated with the cellular process (GO:0009987) and metabolic process (GO:0008152) terms; however, the two groups differed in several respects. The biological regulation (GO:0065007) GO term was more prominent in the non-pathogenic than in the pathogenic group (Figure 3A). The localization (GO:0051179), DNA integration (GO:0015074), DNA recombination (GO:0006310), cation transport (GO:0006812), and metal ion transport (GO:0030001) GO terms mapped solely to ORFs from pathogenic tested genomes. In contrast, eight GO terms were exclusive to ORFs of non-pathogenic genomes (Figure 3B).

Figure 3. Comparison between the gene ontology (GO) annotation of open reading frames (ORFs) from pathogenic and non-pathogenic Staphylococcus genomes.

(A) Biological process parent GO terms. (B) Biological process child GO terms. (C) Molecular function parent GO terms. (D) Molecular function child GO terms. (E) Cellular component parent GO terms. (F) Cellular component child GO terms.

Concerning the molecular function (GO:0003674) GO term’s map, the transporter activity (GO:0005215) and sequence-specific DNA binding (GO:0043565) were exclusively associated with several ORFs of pathogenic tested genomes (Figure 3C). On the other hand, nucleotidyltransferase activity (GO:0016779) and RNA binding (GO:0003723) GO terms were unique to ORFs of non-pathogenic genomes (Figure 3D).

Among the cellular component (GO:0005575) GO terms (Figure 3E), ORFs of pathogenic tested genomes were exclusively associated with the child GO term large ribosomal subunit (GO:0015934) (Figure 3F). In contrast, ORFs of non-pathogenic tested genomes were associated solely with the child GO term small ribosomal subunit (GO:0015935).

In the enrichment analysis, the biosynthetic (GO:0009058), cellular biosynthetic (GO:0044249), and organic substance biosynthetic (GO:1901576) processes were under-represented in the pathogenic compared to the non-pathogenic group, each with the same p-value (3.12E-05) and false discovery rate (FDR) value (0.0046016).

Ultimately, several functional characteristics distinguished ORFs of pathogenic tested genomes compared to ORFs of non-pathogenic genomes, and vice versa. We were left with many ORFs from both pathogenic and non-pathogenic species whose functions were unknown. These ORFs did not overlap with any annotated proteins, hits from the BLAST search were below the specified threshold, and they were not annotated to any GO term by the Blast2GO tool. We referred to these as unknown ORFs (unORFs).

Unknown ORFs’ conservation

The analysis revealed significant similarity between 816 unORFs from pathogenic tested genomes and over 49,000 sequences in various staphylococci species. This similarity was notably higher than that observed for the 810 unORFs from non-pathogenic tested genomes (p-value: 5.59e-16, Mann-Whitney U test), indicating substantial conservation (Figure 4A). Moreover, the conservation of unORFs from pathogenic genomes differed significantly between pathogenic staphylococci and those of unknown pathogenicity (p-value: 2.99e-43) (Figure 4B). Among these, 23 unORFs demonstrated exceptionally high conservation within pathogenic staphylococci genomes, displaying a pathogen frequency ≥ 0.98. We designated these as selected ORFs (selORFs). These selORFs, with an average size of 21 amino acids, exhibited specificity toward pathogenic staphylococci species (Underlying data: Appendix F) (Farhan et al., 2023).

Figure 4. Conservation analysis of unknown ORFs (unORFs) in the Staphylococcus genus.

(A) Conservation of unORFs from pathogenic and non-pathogenic genomes within staphylococci species. (B) Conservation of unORFs from pathogenic tested genomes within pathogenic and unknown-pathogenicity staphylococci species. (C) Conservation of selected ORFs (selORFs) outside the Staphylococcus genus.

Subsequently, we explored the conservation level of the 23 selORFs beyond the Staphylococcus genus. Five of them, the selORF with ID AP009324.1_34709 and four others, were found in 293 genomes outside the Staphylococcus genus (Figure 4C). Notably, most of these genomes (208 out of 293) belonged to the Bacillus genus.

Selected ORFs’ function prediction

Gene ontology and gene product

DeepGOPlus predicted that our selORFs play a role in the mRNA catabolic process (GO:0006402) and share functional similarities with the Pelota gene. However, the confidence level of the predicted GO terms was only between 0.3 and 0.4, probably because algorithms find it challenging to detect patterns in short sequences.

Further, the Prodigal algorithm indicated that none of the selORFs was downstream of an RBS motif. Hence, we explored the neighbouring genes to test the hypothesis that our selORFs were likely non-coding RNA, translated on different frames, and probably engaged in regulatory functions.

Neighbouring genes and anti-sense sequence

Regulatory small proteins regulate their neighbouring genes or genes on the opposite strand. We measured the interval between selORFs and genes within the model genome (S. aureus Mu3). The mean distance between selORFs and genes on the forward strand was 19.203 (log2), a value comparable to the mean distance of selORFs on the reverse strand 19.0951 (log2). Based on these distances, we categorized the selORFs into two groups: (i) those with a zero distance and (ii) those with a non-zero distance. In the first category, selORFs exhibited overlaps with other genes on the same or opposite strands, occurring in various reading frames.

Nine selORFs demonstrated overlap with genes positioned on the same strand. Among these, five selORFs exhibited overlap with coding genes, including transposase, hypothetical protein, and serine protease genes. Conversely, four selORFs displayed overlap with non-coding rRNA genes, as indicated in Figure 5A. Interestingly, two distinct selORFs showed overlap with the same rRNA gene (SAHV_r0002), mirroring a similar occurrence with the transposase gene (SAHV_2363). Notably, the extent of overlap remained constrained, with the selORFs covering at most 13% of the gene size.

Figure 5. Exploring selected ORFs’ (selORFs) neighbouring genes.

(A) Distribution of selORFs overlapping with genes within the model genome (S. aureus Mu3). (B) Distribution of the selORF-to-gene size ratio.

A total of 10 selORFs displayed overlap with genes situated on the opposite strand, with seven of these being coding genes and the remaining three being non-coding genes (Figure 5B). Within this set of 10 selORFs, eight exhibited overlaps with segments of individual genes on the opposite strand. Among these, three selORFs overlapped with tRNA-Val and rRNA non-coding genes. The remaining five selORFs demonstrated overlap with specific genes, namely hsdM (BAF77312.1), 50S ribosomal protein L1 (BAF77419.1), hypothetical protein (BAF77875.1), graD (BAF78358.1), and type I restriction enzyme EcoR124II M protein homolog (BAF78676.1). Furthermore, two selORFs displayed overlap with the 5′ and 3′ ends of distinct genes located on the opposite strand.

In the second group, where the distance between a selORF and a gene was non-zero, the selORFs did not overlap with any genes within the tested genome. In total, seven selORFs fell into this group. Among these seven, only three were notably close to genes, with distances of less than 5 log2 units. Specifically, the selORFs with the IDs AP009324.1_3643 and AP009324.1_3911 were close to rRNA genes on the same strand (SAHV_r0003 and SAHV_r0007, respectively). In contrast, the selORF AP009324.1_34650 was near the 30S ribosomal protein S12 gene (SAHV_0543) on the opposite strand.

Insights of selORF (ID: AP009324.1_34709)

Of particular interest is the selORF mentioned previously (ID: AP009324.1_34709), which was remarkably conserved across 200 genomes, 102 of which belonged to pathogenic strains. Most notably, this selORF was the only one present in 100 non-Staphylococcus genomes, and it overlapped with the 50S ribosomal protein L1 gene on the opposite strand across these 200 genomes.

Notably, the termination codon of this selORF was positioned 305 nucleotides away from the start codon of the aforementioned ribosomal protein gene in 10 genomes, signifying a conserved genetic location for the selORF. According to the MSA, the amino acid sequence shared with our selORF within the 50S ribosomal protein L1 exhibited 100% identity across the adopted genomes. However, variations were observed in the nucleotide sequences, attributed to differing frame translations. Aligning the investigated protein from each adopted genome yielded varying similarity scores compared to the S. aureus Mu3 50S ribosomal protein L1 (Figure 6A).

Figure 6. Similarity score of the 50S ribosomal protein L1 amino acid and nucleotide sequences in related and adopted genomes.

Furthermore, we examined this selORF in genomes closely related to the Staphylococcus genus. According to the NCBI taxonomy (Schoch, 2011), Salinicoccus alkaliphilus DSM 16010 (NZ_FRCF01000009.1), Salinicoccus albus DSM 19776 strain YIM-Y21 (NZ_ARQJ01000028.1), Salinicoccus carnicancri Crm 50.SCCRM.1_10 (NZ_ANAM01000010.1), Nosocomiicoccus ampullae strain DSM 19163 (NZ_JACHHF010000004.1), and Nosocomiicoccus massiliensis isolate MGYG-HGUT-01449 (NZ_CABKSY010000018.1) are closely related to the Staphylococcus genus. Interestingly, none of the 50S ribosomal protein L1 sequences in these related genomes matched the corresponding protein sequence in S. aureus Mu3 (Figure 6B). However, the MSA showed that the region encompassing the selORF was similar in both amino acid and nucleotide sequence across all related genomes.

This selORF displayed significant conservation, particularly concerning the well-preserved 50S ribosomal protein L1 across multiple species. Despite its relatively small size (18 amino acids), the possibility of obtaining a functional protein remained notable within the Staphylococcus genus (0.0002964), as well as in Bacillus (0.0023744), all bacteria (0.00083889), and all organisms (0.0056284), based on data from the UniProt database.

Discussion

Comparative analyses

Both pathogenic and non-pathogenic staphylococci species possess conserved proteins responsible for translation, replication, and survival (Rosenstein et al., 2009). Although our results showed that both groups share the same fundamental functional proteins, each group has developed genes that facilitate specific functions suited to its lifestyle. The comparative analysis proved useful in several ways: it identified the functions of 15 hypothetical proteins, provided hints about the functional characteristics of each group, and highlighted a new methodology for spotting smORFs.

At this point, what distinguishes one group from the other is still obscure, as few studies have compared pathogenic with non-pathogenic species. Staphylococci species that are generally recognized as safe are known to be associated with food fermentation. Previous studies observed increased antioxidant activities during fermentation (Barrière, Leroy-Sétrin & Talon, 2001; Abubakar et al., 2012). Thiol reductase thioredoxin, oxidoreductase, and cytochrome aa3 quinol oxidase (restricted to non-pathogenic tested genomes) are enzymes required for the antioxidant pathway in bacteria. This feature of non-pathogenic species was probably acquired as an adaptation to the environmental conditions of fermentation (Rosenstein et al., 2009).

ABC transporter and heme IsdEF (iron-regulated surface determinant) transporter proteins, which were exclusive to pathogenic tested genomes, are required for heme acquisition in S. aureus (Nygaard et al., 2006; Zhu et al., 2008). Iron is crucial for the survival of pathogenic bacteria and vital for launching the infection process (Mazmanian et al., 2003; Kuroda et al., 2005). The IsdEF transporter is a surface lipoprotein that binds heme and works alongside the ABC transporter to transport heme into the bacterial cytoplasm (Zhu et al., 2008). Another comparative study supports our findings: Rosenstein et al. (2010) also found iron uptake systems specific to pathogenic species.

Transposons are mobile genetic elements of bacteria, and our analysis found their transposases to be far more abundant in pathogenic than in non-pathogenic Staphylococcus genomes. Transposons encode transposase enzymes, which act on specific DNA sequences and insert them into a new target DNA site. Moreover, transposons enhance the genomic diversification of staphylococci species: the more transposons a genome carries, the higher its plasticity (Baba et al., 2002; Loessner et al., 2002). Non-pathogenic staphylococci are considered relatively more stable owing to the scarcity of transposons in their genomes; such findings emphasize the role of mobile elements in the pathogenicity of Staphylococcus (Rosenstein et al., 2009). Our transposon-related results are consistent with several previous studies that suggested a role for transposases in spreading antibiotic resistance genes among different species (Rowland & Dyke, 1989; Ito et al., 2003; Schwendener & Perreten, 2011; Zong, 2013; Harmer & Hall, 2015; Partridge et al., 2018; Guo et al., 2020). Such features have likely contributed to the increasing pathogenicity of staphylococci species.

The GO terms biosynthetic process, cellular biosynthetic process, and organic substance biosynthetic process were significantly under-represented in the pathogenic genomes tested. These terms are associated with the formation of substances required for metabolism. The cellular biosynthetic process covers the synthesis of substances carried out by individual cells, whereas the organic substance biosynthetic process covers the synthesis of any molecular entity containing carbon (Binns et al., 2009). Because, to our knowledge, no study has examined the biosynthetic process in either of the tested groups or elaborated on its importance, the reason for this under-representation remains to be clarified. By contrast, localization, biological regulation, metal ion transport, and DNA recombination are typically expressed in pathogenic Staphylococcus genomes, as reported by several studies (Jin et al., 2014; Liu et al., 2018, 2020), which explains the annotation of these terms to ORFs of the pathogenic genomes tested.
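As an illustration of how under-representation of a GO term can be quantified, the following minimal Python sketch computes a one-sided (lower-tail) hypergeometric probability, which is equivalent to a one-sided Fisher's exact test. It is not necessarily the procedure used in this study; the function name and all counts are hypothetical and serve only to show the calculation.

```python
from math import comb


def under_representation_p(k: int, n: int, K: int, N: int) -> float:
    """Lower-tail hypergeometric probability P(X <= k).

    N: total number of annotated ORFs (background set)
    K: ORFs in the background carrying the GO term
    n: ORFs in the test set (e.g. pathogenic genomes)
    k: ORFs in the test set carrying the GO term
    """
    lowest = max(0, n - (N - K))  # smallest count that is possible at all
    total = comb(N, n)            # number of ways to draw the test set
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(lowest, k + 1)
    )
    return favourable / total


# Hypothetical counts (NOT taken from the study): 200 of 1000 background ORFs
# carry the term, but only 40 of the 300 ORFs in the test set do.
p_example = under_representation_p(k=40, n=300, K=200, N=1000)
```

A small value of `p_example` would indicate that the term occurs less often in the test set than expected by chance; in practice, such p-values would also need correction for multiple testing across GO terms.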

The infectious pathway of Staphylococcus is complex and cannot be elucidated in a single study. Future studies with a larger sample size are therefore recommended to test the expression level of each group’s uniquely annotated GO terms and to investigate their roles in depth.

UnORFs exploration

The importance of our findings stems from the fact that smORFs have long been overlooked, although several recent studies have shown their enormous potential (Hobbs, Astarita & Storz, 2010; Khitun, Ness & Slavoff, 2019; Cerqueira & Vasconcelos, 2020). The features of the 23 selORFs correspond to all smORF properties: they are small (<50 amino acids) and highly conserved. Some are nested within genes and are predicted to have a regulatory function, mostly in the mRNA catabolic process. The mRNA catabolic process occurs at the ribosome during translation elongation and induces mRNA decay pathways (Hayamizu et al., 2005). Translation stalling occurs when (i) the mRNA is damaged or truncated, (ii) the mRNA has excessive secondary structure, or (iii) amino acids or tRNAs are in insufficient supply (Nielsen et al., 2011; Wencker et al., 2021); stalling in turn regulates the translation of downstream genes (Nakatogawa & Ito, 2002).
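The size criterion described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the ORFs have already been translated into amino-acid strings; the identifiers and sequences are hypothetical, the upper bound of 50 residues follows the text, and the lower bound of 10 residues is assumed from the ORF-extraction cut-off rather than from a formal smORF definition.

```python
def select_smorfs(proteins, min_len=10, max_len=50):
    """Return the ORFs whose protein length satisfies min_len <= length < max_len.

    A trailing '*' (stop symbol) is ignored when measuring length.
    """
    return {
        orf_id: seq
        for orf_id, seq in proteins.items()
        if min_len <= len(seq.rstrip("*")) < max_len
    }


# Hypothetical translated ORFs (not taken from the study's data).
example = {
    "orf_a": "MKKLLIAVGLS",   # 11 residues: kept
    "orf_b": "MK",            # 2 residues: below the lower cut-off
    "orf_c": "M" + "A" * 60,  # 61 residues: above the upper cut-off
}
smorfs = select_smorfs(example)
```

Length alone does not establish that a candidate is a genuine smORF; conservation and experimental evidence would still be needed to exclude false positives.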

Bacteria have a wide range of regulatory mechanisms in their cellular stress response. Several lines of evidence suggest that smORFs have a role in cellular responses to stresses such as antibiotics, host infection, and nutrient homeostasis (Kültz, 2005; Hobbs, Astarita & Storz, 2010; Hobbs et al., 2012). smORFs are proposed to affect the cellular stress response through both transcriptional and post-transcriptional regulatory pathways. Sigma factor B (SigB) mediates the transcriptional response, while sRNA mediates the post-transcriptional regulatory mechanism (Novick, 2003).

SigB contributes to the overall stress response in both staphylococci and bacilli. It regulates the transcription of several genes, including those encoding virulence factors and those involved in biofilm formation in S. aureus (Wu, de Lencastre & Tomasz, 1996). SigB also regulates the transcription of alternative frames and intergenic regions (IGRs) in bacteria, which can collaterally affect the control of other genes (Wu, de Lencastre & Tomasz, 1996; Bischoff et al., 2004; Miller et al., 2011). In a recent study, researchers identified three SigB-regulated genes within IGRs of S. aureus. Two of these genes contained smORFs encoding putative small proteins, whereas the third transcript was not preceded by a likely ribosomal binding site, suggesting that it was a non-coding RNA. However, this transcript overlapped with the mntC gene by approximately 180 nucleotides, indicating a possible cis-acting antisense regulatory mechanism (Nielsen et al., 2011). Moreover, an unannotated smORF named gndA was found within the gnd gene. It is thought to be expressed from an alternative reading frame during heat shock in E. coli under the control of the SigB factor (Khitun, Ness & Slavoff, 2019).
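The kind of antisense overlap described above can be checked from genomic coordinates alone. The following minimal Python sketch assumes 1-based, inclusive coordinates; the `Feature` type, the function names, and the coordinates are hypothetical and were chosen only so that the two features overlap by 180 nucleotides on opposite strands.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlap_length(a: Feature, b: Feature) -> int:
    """Number of positions shared by two features (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def is_antisense_overlap(a: Feature, b: Feature) -> bool:
    """True if the features overlap and lie on opposite strands."""
    return a.strand != b.strand and overlap_length(a, b) > 0


# Hypothetical coordinates (not the real positions of any transcript or gene).
transcript = Feature(start=1000, end=1400, strand="-")
gene = Feature(start=1221, end=2150, strand="+")
shared = overlap_length(transcript, gene)
```

An overlap on opposite strands is only a prerequisite for cis-acting antisense regulation; it does not by itself demonstrate that such regulation takes place.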

RNAIII belongs to the trans-encoded base-pairing small RNAs (sRNAs) (Geisinger et al., 2006). It controls the expression of several genes by forming an imperfect duplex with specific target mRNAs and repressing their translation (Wadler & Vanderpool, 2007; Nielsen et al., 2011). So far, most studied sRNAs in E. coli, B. subtilis, and S. aureus appear to be non-coding.

In line with these observations, we hypothesise that the selORFs are transcribed under the regulation of SigB under certain conditions, thereby producing non-coding sRNAs that negatively regulate the transcription of the adjacent gene (Pförtner et al., 2014; Rodriguez Ayala, Bartolini & Grau, 2020). Testing this hypothesis first requires the exclusion of false-positive smORFs (Fuchs et al., 2021), followed by transcriptomic analysis of the verified selORFs in different environments and experimental investigation of SigB’s role in controlling their transcripts.

Accession numbers

Genome assembly database - RefSeq accessions: Staphylococcus aureus subsp. aureus Mu3 genome assembly ASM1044v1. https://identifiers.org/refseq.gcf:GCF_000010445.1

Genome assembly database - RefSeq accessions: Staphylococcus lugdunensis HKU09-01 genome assembly ASM2508v1 https://identifiers.org/refseq.gcf:GCF_000025085.1

Genome assembly database - RefSeq accessions: Staphylococcus haemolyticus JCSC1435 genome assembly ASM986v1. https://identifiers.org/refseq.gcf:GCF_000009865.1

Genome assembly database - RefSeq accessions: Staphylococcus saprophyticus subsp. saprophyticus ATCC 15305 = NCTC 7292 genome assembly ASM1012v1. https://identifiers.org/refseq.gcf:GCF_000010125.1

Genome assembly database - RefSeq accessions: Staphylococcus schleiferi genome assembly ASM118885v1 https://identifiers.org/refseq.gcf:GCF_001188855.1

Genome assembly database - RefSeq accessions: Staphylococcus carnosus subsp. carnosus TM300 genome assembly ASM940v1. https://identifiers.org/refseq.gcf:GCF_000009405.1

Genome assembly database - RefSeq accessions: Staphylococcus cohnii genome assembly ASM199020v1. https://identifiers.org/refseq.gcf:GCF_001990205.1

Genome assembly database - RefSeq accessions: Staphylococcus warneri SG1 genome assembly ASM33273v1. https://identifiers.org/refseq.gcf:GCF_000332735.1

Genome assembly database - RefSeq accessions: Staphylococcus nepalensis genome assembly ASM290274v1. https://identifiers.org/refseq.gcf:GCF_002902745.1

Genome assembly database - RefSeq accessions: Staphylococcus pasteuri genome assembly ASM244291v1. https://identifiers.org/refseq.gcf:GCF_002442915.1

NCBI Reference Sequence: Salinicoccus alkaliphilus DSM 16010, whole genome shotgun sequence. Accession number NZ_FRCF01000009.1; https://identifiers.org/refseq:NZ_FRCF01000009.1

NCBI Reference Sequence: Salinicoccus albus DSM 19776 strain YIM-Y21 G343DRAFT_scaffold00006.6_C, whole genome shotgun sequence. Accession number NZ_ARQJ01000028.1; https://identifiers.org/refseq:NZ_ARQJ01000028.1

NCBI Reference Sequence: Salinicoccus carnicancri Crm 50.SCCRM.1_10, whole genome shotgun sequence. Accession number NZ_ANAM01000010.1; https://identifiers.org/refseq:NZ_ANAM01000010.1

NCBI Reference Sequence: Nosocomiicoccus ampullae strain DSM 19163 Ga0415238_04, whole genome shotgun sequence. Accession number NZ_JACHHF010000004.1; https://www.ncbi.nlm.nih.gov/nuccore/NZ_JACHHF010000004.1

NCBI Reference Sequence: Nosocomiicoccus massiliensis isolate MGYG-HGUT-01449, whole genome shotgun sequence. Accession number NZ_CABKSY010000018.1; https://www.ncbi.nlm.nih.gov/nuccore/NZ_CABKSY010000018.1

Data availability

Underlying data

Figshare: Underlying data for ‘Identification of pathogenic-specific open reading frames in staphylococci species’, https://doi.org/10.6084/m9.figshare.24588306.v1 (Farhan et al., 2023).

This project contains the following underlying data:

• Appendix A: ORFs found in some pathogenic species dataset.
• Appendix B: ORFs found in some nonpathogenic species dataset.
• Appendix C: ORFs found in all tested genomes dataset.
• Appendix D: ORFs found in all nonpathogenic tested genomes dataset.
• Appendix E: ORFs found in all pathogenic tested genomes dataset.
• Appendix F: Selected ORFs dataset

Figshare: Analysis methodology - Open reading frames comparative analysis, https://doi.org/10.6084/m9.figshare.24588696.v1 (Farhan, 2023).

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

References

Abubakar I, Gautret P, Brunette GW, et al.: Global perspectives for prevention of infectious diseases associated with mass gatherings. Lancet Infect. Dis. 2012; 12: 66–74.
Alberts B, Johnson A, Lewis J, et al.: Molecular Biology of the Cell. 4th ed. New York: Garland Science. How Genomes Evolve. 2002. https://www.ncbi.nlm.nih.gov/books/NBK26836/
Altschul SF, Madden TL, Schäffer AA, et al.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997; 25: 3389–3402.
Baba T, Takeuchi F, Kuroda M, et al.: Genome and virulence determinants of high virulence community-acquired MRSA. Lancet. 2002; 359: 1819–1827.
Barrière C, Leroy-Sétrin S, Talon R: Characterization of catalase and superoxide dismutase in Staphylococcus carnosus 833 strain. J. Appl. Microbiol. 2001; 91: 514–519.
Benson DA, Karsch-Mizrachi I, Lipman DJ, et al.: GenBank. Nucleic Acids Res. 2010; 38: D46–D51.
Binns D, Dimmer E, Huntley R, et al.: QuickGO: a web-based tool for Gene Ontology searching. Bioinformatics. 2009; 25: 3045–3046.
Bischoff M, Dunman P, Kormanec J, et al.: Microarray-based analysis of the Staphylococcus aureus sigmaB regulon. J. Bacteriol. 2004; 186: 4085–4099.
Cerqueira FR, Vasconcelos ATR: OCCAM: prediction of small ORFs in bacterial genomes by means of a target-decoy database approach and machine learning techniques. Database (Oxford). 2020; 2020: baaa067.
Conesa A, Götz S, García-Gómez JM, et al.: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005; 21(18): 3674–3676.
Cosentino S, Voldby Larsen M, Møller Aarestrup F, et al.: PathogenFinder - Distinguishing Friend from Foe Using Bacterial Whole Genome Sequence Data. PLoS One. 2013; 8: e77302.
Farhan F: Analysis methodology - Open reading frames comparative analysis for ‘Identification of pathogenic-specific open reading frames in staphylococci species’. [Dataset]. figshare. 2023.
Farhan F, Karlowski WM, Zielezinski A: Underlying data for ‘Identification of pathogenic-specific open reading frames in staphylococci species’. [Dataset]. figshare. 2023.
Fuchs S, Kucklick M, Lehmann E, et al.: Towards the characterization of the hidden world of small proteins in Staphylococcus aureus, a proteogenomics approach. PLoS Genet. 2021; 17: e1009585.
Geisinger E, Adhikari RP, Jin R, et al.: Inhibition of rot translation by RNAIII, a key feature of agr function. Mol. Microbiol. 2006; 61: 1038–1048.
Götz F, Bannerman T, Schleifer KH: The Genera Staphylococcus and Macrococcus. The Prokaryotes. 2006: 5–75.
Guo Y, Song G, Sun M, et al.: Prevalence and Therapies of Antibiotic-Resistance in Staphylococcus aureus. Front. Cell. Infect. Microbiol. 2020; 10: 107.
Harmer CJ, Hall RM: IS26-Mediated Precise Excision of the IS26-aphA1a Translocatable Unit. MBio. 2015; 6: e01866-01815.
Hayamizu TF, Mangan M, Corradi JP, et al.: The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data. Genome Biol. 2005; 6: R29.
Hemm MR, Paul BJ, Schneider TD, et al.: Small membrane proteins found by comparative genomics and ribosome binding site models. Mol. Microbiol. 2008; 70: 1487–1501.
Heo S, Lee J-H, Jeong D-W: Food-derived coagulase-negative Staphylococcus as starter cultures for fermented foods. Food Sci. Biotechnol. 2020; 29: 1023–1035.
Hobbs EC, Astarita JL, Storz G: Small RNAs and Small Proteins Involved in Resistance to Cell Envelope Stress and Acid Shock in Escherichia coli: Analysis of a Bar-Coded Mutant Collection. J. Bacteriol. 2010; 192: 59–67.
Hobbs EC, Yin X, Paul BJ, et al.: Conserved small protein associates with the multidrug efflux pump AcrB and differentially affects antibiotic resistance. Proc. Natl. Acad. Sci. U. S. A. 2012; 109: 16696–16701.
Hyatt D, Chen G-L, LoCascio PF, et al.: Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010; 11: 119.
Ito T, Okuma K, Ma XX, et al.: Insights on antibiotic resistance of Staphylococcus aureus from its whole genome: genomic island SCC. Drug Resist. Updat. 2003; 6: 41–52.
Jin W, Ibeagha-Awemu EM, Liang G, et al.: Transcriptome microRNA profiling of bovine mammary epithelial cells challenged with Escherichia coli or Staphylococcus aureus bacteria reveals pathogen directed microRNA expression profiles. BMC Genomics. 2014; 15: 181.
Khitun A, Ness TJ, Slavoff SA: Small open reading frames and cellular stress responses. Mol. Omics. 2019; 15: 108–116.
Kulmanov M, Khan MA, Hoehndorf R: DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics. 2018; 34: 660–668.
Kültz D: Molecular and evolutionary basis of the cellular stress response. Annu. Rev. Physiol. 2005; 67: 225–257.
Kuroda M, Yamashita A, Hirakawa H, et al.: Whole genome sequence of Staphylococcus saprophyticus reveals the pathogenesis of uncomplicated urinary tract infection. Proc. Natl. Acad. Sci. U. S. A. 2005; 102: 13272–13277.
Liu Q, Chen N, Chen H, et al.: RNA-Seq analysis of differentially expressed genes of Staphylococcus epidermidis isolated from postoperative endophthalmitis and the healthy conjunctiva. Sci. Rep. 2020; 10: 14234.
Liu J, Yang L, Hou Y, et al.: Transcriptomics Study on Staphylococcus aureus Biofilm Under Low Concentration of Ampicillin. Front. Microbiol. 2018; 9: 2413.
Loessner I, Dietrich K, Dittrich D, et al.: Transposase-Dependent Formation of Circular IS256 Derivatives in Staphylococcus epidermidis and Staphylococcus aureus. J. Bacteriol. 2002; 184: 4709–4714.
Mannala GK, Koettnitz J, Mohamed W, et al.: Whole-genome comparison of high and low virulent Staphylococcus aureus isolates inducing implant-associated bone infections. Int. J. Med. Microbiol. 2018; 308: 505–513.
Mazmanian SK, Skaar EP, Gaspar AH, et al.: Passage of heme-iron across the envelope of Staphylococcus aureus. Science. 2003; 299: 906–909.
Miller M, Dreisbach A, Otto A, et al.: Mapping of interactions between human macrophages and Staphylococcus aureus reveals an involvement of MAP kinase signaling in the host defense. J. Proteome Res. 2011; 10: 4018–4032.
Mir K, Neuhaus K, Scherer S, et al.: Predicting Statistical Properties of Open Reading Frames in Bacterial Genomes. PLoS One. 2012; 7: e45103.
Nakatogawa H, Ito K: The Ribosomal Exit Tunnel Functions as a Discriminating Gate. Cell. 2002; 108: 629–636.
National Center for Biotechnology Information (NCBI) [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [1988] – [cited 2017 Apr 06]. https://www.ncbi.nlm.nih.gov/
Nielsen JS, Christiansen MHG, Bonde M, et al.: Searching for small σB-regulated genes in Staphylococcus aureus. Arch. Microbiol. 2011; 193: 23–34.
Novick RP: Autoinduction and signal transduction in the regulation of staphylococcal virulence: Regulation of staphylococcus virulence. Mol. Microbiol. 2003; 48: 1429–1449.
Nygaard TK, Liu M, McClure MJ, et al.: Identification and characterization of the heme-binding proteins SeShp and SeHtsA of Streptococcus equi subspecies equi. BMC Microbiol. 2006; 6: 82.
O’Leary NA, Wright MW, Brister JR, et al.: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2016; 44: D733–D745.
Partridge SR, Kwong SM, Firth N, et al.: Mobile Genetic Elements Associated with Antimicrobial Resistance. Clin. Microbiol. Rev. 2018; 31: e00088–e00017.
Pförtner H, Burian MS, Michalik S, et al.: Activation of the alternative sigma factor SigB of Staphylococcus aureus following internalization by epithelial cells - an in vivo proteomics perspective. Int. J. Med. Microbiol. 2014; 304: 177–187.
Rodriguez Ayala F, Bartolini M, Grau R: The Stress-Responsive Alternative Sigma Factor SigB of Bacillus subtilis and Its Relatives: An Old Friend With New Functions. Front. Microbiol. 2020; 11: 1761.
Rosenstein R, Götz F: What Distinguishes Highly Pathogenic Staphylococci from Medium- and Non-pathogenic? In: Dobrindt U, Hacker JH, Svanborg C, editors. Between Pathogenicity and Commensalism. Current Topics in Microbiology and Immunology. Berlin, Heidelberg: Springer Berlin Heidelberg; 2012; pp. 33–89.
Rosenstein R, Nerz C, Biswas L, et al.: Genome Analysis of the Meat Starter Culture Bacterium Staphylococcus carnosus TM300. Appl. Environ. Microbiol. 2009; 75: 811–822.
Rosenstein R, Götz F: Genomic differences between the food-grade Staphylococcus carnosus and pathogenic staphylococcal species. Int. J. Med. Microbiol. 2010; 300(2-3): 104–108.
Rowland SJ, Dyke KG: Characterization of the staphylococcal beta-lactamase transposon Tn552. EMBO J. 1989; 8: 2761–2773.
Schoch C: NCBI Taxonomy. 2011 Apr 7 [Updated 2020 Feb 11]. Taxonomy Help. Bethesda (MD): National Center for Biotechnology Information (US); 2011.
Schwendener S, Perreten V: New Transposon Tn6133 in Methicillin-Resistant Staphylococcus aureus ST398 Contains vga(E), a Novel Streptogramin A, Pleuromutilin, and Lincosamide Resistance Gene. Antimicrob. Agents Chemother. 2011; 55: 4900–4904.
Sievers F, Wilm A, Dineen D, et al.: Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 2011; 7: 539.
Ventola CL: The antibiotic resistance crisis: part 1: causes and threats. P T. 2015; 40: 277–283.
Wadler CS, Vanderpool CK: A dual function for a bacterial small RNA: SgrS performs base pairing-dependent regulation and encodes a functional polypeptide. Proc. Natl. Acad. Sci. U. S. A. 2007; 104: 20454–20459.
Warren AS, Archuleta J, Feng W-C, et al.: Missing genes in the annotation of prokaryotic genomes. BMC Bioinformatics. 2010; 11: 131.
Wencker FDR, Marincola G, Schoenfelder SMK, et al.: Another layer of complexity in Staphylococcus aureus methionine biosynthesis control: unusual RNase III-driven T-box riboswitch cleavage determines met operon mRNA stability and decay. Nucleic Acids Res. 2021; 49: 2192–2212.
Wood DE, Lin H, Levy-Moonshine A, et al.: Thousands of missed genes found in bacterial genomes and their analysis with COMBREX. Biol. Direct. 2012; 7: 37.
Wu S, de Lencastre H, Tomasz A: Sigma-B, a putative operon encoding alternate sigma factor of Staphylococcus aureus RNA polymerase: molecular cloning and DNA sequencing. J. Bacteriol. 1996; 178: 6036–6042.
Zhu H, Xie G, Liu M, et al.: Pathway for Heme Uptake from Human Methemoglobin by the Iron-regulated Surface Determinants System of Staphylococcus aureus. J. Biol. Chem. 2008; 283: 18450–18460.
Zong Z: Characterization of a complex context containing mecA but lacking genes encoding cassette chromosome recombinases in Staphylococcus haemolyticus. BMC Microbiol. 2013; 13: 64.

Competing interests

No competing interests were disclosed.

Grant information

The author(s) declared that no grants were involved in supporting this work.

Copyright

© 2024 Farhan FN et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

how to cite this article

Farhan FN, Zielezinski A and Karłowski WM. Identification of pathogenic-specific open reading frames in staphylococci species [version 1; peer review: 2 not approved]. F1000Research 2024, 13:27 (https://doi.org/10.12688/f1000research.142429.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 19 Sep 2024

Vishnu Raghuram, Umeå University, Umeå, Sweden; Emory University, Atlanta, Georgia, USA

Not Approved

https://doi.org/10.5256/f1000research.155983.r321266

In this study, the authors begin with 10 Staphylococcus genomes each from a different species, grouped them into pathogenic and non-pathogenic, and then identified ORFs of unknown function. The authors then attempt to provide insights into the distribution of these ORFs between the pathogenic and nonpathogenic groups as well as their potential pathogenic functions. This is an important line of research as a vast number of bacterial ORFs remain unannotated. However, selecting only one genome to represent an entire species significantly limits the scope of the study. Moreover, the study lacks sufficient detail in the methods which in turn makes it difficult to interpret the results. I have highlighted some examples below:

Methods -
Sequence data : How many genomes were available for each species and why were those specific accessions selected? Was it based on assembly quality? Or were those the only genome available for a given species?
ORFs extraction: The criteria for ORF selection is not clear, in Figure 1: “ORF sequence → has another copy? → No → Discard” - the reason behind discarding ORFs is not mentioned. Is an ORF that is present in only one genome within a species group discarded? (and if so, why?). What is the AA identity cutoff for considering an ORF unique vs shared?

Conservation: “to conduct the conservation test, gathering data based on identity level and query coverage.....Clustal Omega…was used to align the sequences to assess the locus conservation assay” - I understand the authors used a t-test but it is not clear exactly what was compared to assess significance. It is also not explained what the ‘Locus conservation assay’ is.

Results -
Unknown ORFs’ conservation: “The analysis revealed significant similarity between 816 unORFs from pathogenic tested genomes and over 49,000 sequences in various staphylococci species” - are these referring to BLAST results? How are the authors handling redundant matches (matches to multiple highly similar/identical genomes) ?

Figure 4: The source of the data that are plotted is not clear. The ‘Number of Genomes’ axis in the boxplots maxes out at 100, does that mean a single ORF can only be present in a maximum of 100 genomes? Or is this the number of species, as the authors mentioned earlier they are targeting a maximum of 100 species? In which case the axes labels must be changed.

In general, the figure legends do not have adequate descriptions of the plotted information, making them hard to interpret.

While the discussion section re-summarizes the results and related studies, it lacks any synthesis of new ideas/insights from the results presented in this study. This section also makes certain assumptions that are not convincing due to the small starting dataset of 10 genomes, eg: “Transposons detected as exclusive for pathogenic Staphylococcus.”.

Is the work clearly and accurately presented and does it cite the current literature? No
Is the study design appropriate and is the work technically sound? No
Are sufficient details of methods and analysis provided to allow replication by others? No
If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions drawn adequately supported by the results? No

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Bacterial genomics, phylogenetics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report 05 Aug 2024

Gisela Storz, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, USA

Not Approved

https://doi.org/10.5256/f1000research.155983.r288005

This study presents a cataloging of open reading frames (ORFs) in Staphylococcus aureus species, focusing on five pathogenic and five non-pathogenic strains. Information about what genes contribute to pathogenesis is of interest, and improved annotation is needed in all organisms. However, I found both the bioinformatic analysis and the results presented limited.

The study presents too many lists and too much speculation in the absence of substantive insights.

The authors put too much emphasis on possible functions. Programs like Blast2GO and DeepGOPlus just provide predictions in the absence of other data. Additionally, gene overlap does not always predict function, nor do regulatory small proteins necessarily “regulate their neighbouring genes or genes on the opposite strand”. Similarly, the presence of a gene in the genome of a pathogenic organism does not guarantee a role in pathogenesis.

Insufficient detail is provided for some of the analysis (thresholds, estimation of false positives and negatives, etc.). The lower size cut-off was 10 amino acids. Was there a top size cut-off?

Some statements are unusually phrased. For example,
Page 3: “toxins, and a heterogeneous assortment” (of what?)
Page 3: “adhesins interpose the attachment”
Page 3: “limited to either the genomic aspect”
Page 3: “The open reading frames are regions that either contain no stop codons”
Page 6: “Among the found”
Page 11: “species occupy conserved proteins” (encode?)

Is the work clearly and accurately presented and does it cite the current literature? No
Is the study design appropriate and is the work technically sound? No
Are sufficient details of methods and analysis provided to allow replication by others? No
If applicable, is the statistical analysis and its interpretation appropriate? No
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? No

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Small protein discovery and characterization.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

CITE

Report a concern

Respond or Comment

Comments on this article Comments (0)

Version 1

VERSION 1 PUBLISHED 08 Jan 2024

Open Peer Review

Reviewer Status

Reviewer Reports

	Invited Reviewers
	1	2
Version 1 08 Jan 24	read	read

Gisela Storz, Eunice Kennedy Shriver National Institute of Child Health and Human Development,, Bethesda, USA
Vishnu Raghuram, umea university, Umeå, Sweden; Emory University, Atlanta, USA

Comments on this article

All Comments(0)

Add a comment

Sign up for content alerts

Browse by related subjects

Back to all reports

Reviewer Report

4 Views

19 Sep 2024 | for Version 1

Vishnu Raghuram, umea university, Umeå, Sweden; Emory University, Atlanta, Georgia, USA

4 Views Cite this report Responses(0)

Not Approved

In this study, the authors begin with 10 Staphylococcus genomes each from a different species, grouped them into pathogenic and non-pathogenic, and then identified ORFs of unknown function. The authors then attempt to provide insights into the distribution of these ORFs between the pathogenic and nonpathogenic groups as well as their potential pathogenic functions. This is an important line of research as a vast number of bacterial ORFs remain unannotated. However, selecting only one genome to represent an entire species significantly limits the scope of the study. Moreover, the study lacks sufficient detail in the methods which in turn makes it difficult to interpret the results. I have highlighted some examples below:

Methods -
Sequence data: How many genomes were available for each species, and why were those specific accessions selected? Was it based on assembly quality? Or were those the only genomes available for a given species?
ORFs extraction: The criteria for ORF selection are not clear. In Figure 1, “ORF sequence → has another copy? → No → Discard”: the reason for discarding ORFs is not given. Is an ORF that is present in only one genome within a species group discarded (and if so, why)? What is the AA identity cutoff for considering an ORF unique vs shared?

Conservation: “to conduct the conservation test, gathering data based on identity level and query coverage.....Clustal Omega…was used to align the sequences to assess the locus conservation assay” - I understand the authors used a t-test but it is not clear exactly what was compared to assess significance. It is also not explained what the ‘Locus conservation assay’ is.

Results -
Unknown ORFs’ conservation: “The analysis revealed significant similarity between 816 unORFs from pathogenic tested genomes and over 49,000 sequences in various staphylococci species” - does this refer to BLAST results? How are the authors handling redundant matches (matches to multiple highly similar or identical genomes)?

Figure 4: The source of the plotted data is not clear. The ‘Number of Genomes’ axis in the boxplots maxes out at 100. Does that mean a single ORF can be present in a maximum of 100 genomes? Or is this the number of species, since the authors mentioned earlier that they are targeting a maximum of 100 species? If so, the axis labels must be changed.
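
To make the distinction between counting genomes and counting species concrete, the following is a minimal sketch of one way redundant matches could be collapsed. It is not the authors' method. The tuple layout, the function name `summarize_hits`, and both thresholds are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict

def summarize_hits(hits, min_identity=70.0, min_coverage=80.0):
    """Count distinct genomes and distinct species matched by each query ORF.

    `hits` is an iterable of (orf_id, genome_id, species, identity, coverage)
    tuples, for example parsed from tabular BLAST output. The thresholds are
    illustrative assumptions, not values taken from the article.
    """
    genomes = defaultdict(set)
    species = defaultdict(set)
    for orf_id, genome_id, sp, identity, coverage in hits:
        # Skip weak matches before counting anything.
        if identity < min_identity or coverage < min_coverage:
            continue
        # Sets collapse repeated hits to the same genome or species.
        genomes[orf_id].add(genome_id)
        species[orf_id].add(sp)
    return {orf: (len(genomes[orf]), len(species[orf])) for orf in genomes}

# Toy data: orf1 hits two S. aureus genomes, one S. epidermidis genome,
# and one low-identity match that is filtered out.
hits = [
    ("orf1", "genomeA", "S. aureus", 99.0, 100.0),
    ("orf1", "genomeB", "S. aureus", 98.5, 100.0),
    ("orf1", "genomeC", "S. epidermidis", 85.0, 95.0),
    ("orf1", "genomeD", "S. carnosus", 40.0, 95.0),
    ("orf2", "genomeA", "S. aureus", 100.0, 100.0),
    ("orf2", "genomeA", "S. aureus", 100.0, 100.0),
]
summary = summarize_hits(hits)
print(summary)
```

In this toy input, `orf1` is counted in three genomes but only two species, which is the kind of difference the axis label in Figure 4 would need to make explicit.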

In general, the figure legends do not have adequate descriptions of the plotted information, making them hard to interpret.

While the discussion section re-summarizes the results and related studies, it lacks any synthesis of new ideas or insights from the results presented in this study. This section also makes certain assumptions that are not convincing given the small starting dataset of 10 genomes, e.g., “Transposons detected as exclusive for pathogenic Staphylococcus.”.

Is the work clearly and accurately presented and does it cite the current literature?

No
Is the study design appropriate and is the work technically sound?

No
Are sufficient details of methods and analysis provided to allow replication by others?

No
If applicable, is the statistical analysis and its interpretation appropriate?

Partly
Are all the source data underlying the results available to ensure full reproducibility?

Partly
Are the conclusions drawn adequately supported by the results?

No

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Bacterial genomics, phylogenetics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report

05 Aug 2024 | for Version 1

Gisela Storz, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, USA

Not Approved

This study presents a cataloging of open reading frames (ORFs) in Staphylococcus aureus species, focusing on five pathogenic and five non-pathogenic strains. Information about what genes contribute to pathogenesis is of interest, and improved annotation is needed in all organisms. However, I found both the bioinformatic analysis and the results presented limited.

The study presents too many lists and too much speculation in the absence of substantive insights.

The authors put too much emphasis on possible functions. Programs like Blast2GO and DeepGOPlus just provide predictions in the absence of other data. Additionally, gene overlap does not always predict function, nor do regulatory small proteins necessarily “regulate their neighbouring genes or genes on the opposite strand”. Similarly, the presence of a gene in the genome of a pathogenic organism does not guarantee a role in pathogenesis.

Insufficient detail is provided for some of the analysis (thresholds, estimation of false positives and negatives, etc.). The lower size cut-off was 10 amino acids. Was there a top size cut-off?
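
As an illustration of why the size thresholds matter, here is a minimal sketch of an ORF scan with an explicit lower and optional upper length cut-off. It is not the authors' pipeline. It assumes ATG-only starts, the three standard stop codons, and the forward strand only; the function name `find_orfs` and the `max_aa` parameter are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_aa=10, max_aa=None):
    """Return (start, end, aa_length) for ATG-to-stop ORFs on the forward strand.

    Simplifying assumptions: only ATG is treated as a start codon, the reverse
    strand is ignored, and `max_aa=None` means no upper size cut-off.
    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOPS:
                aa_len = (i - start) // 3  # codons before the stop, ATG included
                if aa_len >= min_aa and (max_aa is None or aa_len <= max_aa):
                    orfs.append((start, i + 3, aa_len))
                start = None  # look for the next ORF in this frame
    return orfs

# A 12-codon ORF (ATG + 11 GCT codons) followed by a TAA stop.
seq = "ATG" + "GCT" * 11 + "TAA"
print(find_orfs(seq))             # passes the 10 aa lower cut-off
print(find_orfs(seq, max_aa=11))  # rejected once an upper cut-off of 11 aa is set
```

Changing either threshold changes which candidates survive, so reporting both values is needed to reproduce an ORF set.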

Some statements are unusually phrased. For example,
Page 3: “toxins, and a heterogeneous assortment” (of what?)
Page 3: “adhesins interpose the attachment”
Page 3: “limited to either the genomic aspect”
Page 3: “The open reading frames are regions that either contain no stop codons”
Page 6: “Among the found”
Page 11: “species occupy conserved proteins” (encode?)

Is the work clearly and accurately presented and does it cite the current literature?

No
Is the study design appropriate and is the work technically sound?

No
Are sufficient details of methods and analysis provided to allow replication by others?

No
If applicable, is the statistical analysis and its interpretation appropriate?

No
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

No

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Small protein discovery and characterization.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.
