Software Tool Article

vhcub: Virus-host codon usage co-adaptation analysis

[version 1; peer review: 2 approved]
PUBLISHED 23 Dec 2019

This article is included in the Pathogens gateway.

This article is included in the RPackage gateway.

Abstract

Viruses evolve to adapt to, and reproduce within, their hosts. In principle, the patterns and factors that shape viral codon usage should reflect evolutionary changes that optimize it toward the codon usage of the host. Several software tools can analyze the codon usage of organisms; however, none of them focuses on the co-adaptation of codon usage between viruses and their hosts. This paper describes the vhcub R package, a tool for analyzing the co-adaptation of codon usage between a virus and its host, which implements several indices and plots. The tool is available from: https://cran.r-project.org/web/packages/vhcub/.

Keywords

Evolution, Natural selection, Adaptation, Viruses, Codon Usage Bias, R, RStudio

Introduction

During the translation of mRNAs into proteins, information is read in nucleotide triplets, called codons, which encode amino acids. Codons that encode the same amino acid are known as synonymous codons. Studies of different organisms report that synonymous codons are not used uniformly within and between the genes of a genome, a phenomenon known as codon usage bias (CUB)1,2. Because viruses rely on the tRNA pool of their hosts for translation, previous studies suggest that translational selection and/or directional mutational pressure act on the codon usage of the viral genome to optimize or deoptimize it toward the codon usage of the host3,4.

Tools and packages for analyzing codon usage are available, e.g. coRdon5, but none focuses on the co-adaptation of codon usage between viruses and their hosts. vhcub is an R package designed to make this analysis straightforward. It computes several codon usage bias measures, such as the effective number of codons (ENc)6, codon adaptation index (CAI)7, relative codon deoptimization index (RCDI)8, similarity index (SiD)9, synonymous codon usage orderliness (SCUO)10, and relative synonymous codon usage (RSCU)10. It also tests for statistical dinucleotide over- and under-representation under three different randomization models.
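As an illustration of what RSCU measures, the sketch below computes it in base R from the standard definition: the observed count of a codon divided by the count expected if all synonymous codons of that amino acid were used equally. The helper names `codon_counts` and `rscu` are hypothetical and are not vhcub's `RSCU.values`, which works on the data.frames returned by `fasta.read`; the standard genetic code is assumed.

```r
# Standard genetic code; codons are ordered T, C, A, G at each position
bases <- c("T", "C", "A", "G")
codons <- paste0(rep(bases, each = 16), rep(rep(bases, each = 4), 4), rep(bases, 16))
aa <- setNames(strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]], codons)

# Count each of the 64 codons in an in-frame coding sequence; a trailing
# partial codon and codons containing non-ACGT characters are ignored
codon_counts <- function(dna) {
  dna <- toupper(dna)
  starts <- 3 * seq_len(nchar(dna) %/% 3) - 2
  cods <- if (length(starts) > 0) substring(dna, starts, starts + 2) else character(0)
  setNames(as.integer(table(factor(cods, levels = codons))), codons)
}

# RSCU = observed count / count expected under equal synonymous usage;
# NA for stop codons and for amino acids that do not occur in the sequence
rscu <- function(counts) {
  out <- setNames(rep(NA_real_, length(counts)), names(counts))
  for (a in setdiff(unique(aa), "*")) {
    idx <- names(aa)[aa == a]
    total <- sum(counts[idx])
    if (total > 0) out[idx] <- counts[idx] / (total / length(idx))
  }
  out
}

r <- rscu(codon_counts("ATGGCTGCTGCCTAA"))
r[c("GCT", "GCC", "GCA", "GCG")]  # Ala codons; their RSCU values sum to 4
```

RSCU values of 1 indicate no bias; values above 1 mark codons used more often than expected under uniform synonymous usage.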

Methods

Implementation

vhcub imports Biostrings11, seqinr12 and stringr13 to read FASTA files and manipulate DNA sequences. It also imports coRdon5 to estimate several CUB measures.

vhcub first converts the FASTA input into data.frame objects, which store the sequences and from which the indices implemented in the package are calculated. Table 1 lists all the functions available in vhcub, the result returned by each, and references to the equations used to estimate each measure. In addition, vhcub uses ggplot214 to draw two important plots, the ENc-GC3 plot (Figure 2) and the PR2-plot (Figure 3), which help explain the factors, such as mutational pressure and natural selection, that shape a virus's CUB.
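The FASTA-to-data.frame step can be sketched in base R as follows. This is a hypothetical helper (`read_fasta_df`), not vhcub's `fasta.read`, which according to the text relies on Biostrings and seqinr; the column names `seq_name` and `sequence` are assumptions made for illustration.

```r
# Parse a FASTA file into a data.frame with one row per record
read_fasta_df <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))  # also strips Windows "\r"
  lines <- lines[nzchar(lines)]                    # drop blank lines
  is_header <- startsWith(lines, ">")
  if (length(lines) == 0 || !is_header[1]) stop("input must start with a '>' header line")
  record_id <- cumsum(is_header)                   # record index for every line
  headers <- sub("^>", "", lines[is_header])
  seq_lines <- split(lines[!is_header],
                     factor(record_id[!is_header], levels = seq_along(headers)))
  sequences <- vapply(seq_lines, function(x) toupper(paste(x, collapse = "")), character(1))
  data.frame(seq_name = headers, sequence = unname(sequences), stringsAsFactors = FALSE)
}

# Usage with a small temporary file (multi-line and lower-case records are joined)
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">gene1", "ATGGCT", "gcttaa", ">gene2", "ATGTAA"), tmp)
read_fasta_df(tmp)
```

Keeping one row per coding sequence makes it easy to apply a per-gene index to every row, which is the kind of layout the text describes.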

Table 1. Functions available in vhcub, and the result returned from each one.

Function | Description | Value
--- | --- | ---
fasta.read | Reads FASTA files and converts them to data.frames. | A list of two data.frames: the first for the virus DNA sequences and the second for the host.
CAI.values | Measures the Codon Adaptation Index (CAI) of DNA sequences using the equation of Sharp and Li (1987)7. | A data.frame containing the computed CAI value for each DNA sequence in df.fasta.
dinuc.base | Measures statistical dinucleotide over- and under-representation; random sequences are generated by shuffling (with/without replacement) all bases in the sequence13. | A data.frame containing the computed statistic for each dinucleotide in all DNA sequences in df.virus.
dinuc.codon | Measures statistical dinucleotide over- and under-representation; random sequences are generated by shuffling (with/without replacement) codons13. | A data.frame containing the computed statistic for each dinucleotide in all DNA sequences in df.virus.
dinuc.syncodon | Measures statistical dinucleotide over- and under-representation; random sequences are generated by shuffling (with/without replacement) synonymous codons13. | A data.frame containing the computed statistic for each dinucleotide in all DNA sequences in df.virus.
ENc.values | Measures the Effective Number of Codons (ENc) of DNA sequences, using its modified version (Novembre, 2002)6. | A data.frame containing the computed ENc value for each DNA sequence in df.fasta.
GC.content | Calculates overall GC content as well as GC content at the first, second, and third codon positions. | A data.frame with overall GC content and GC content at the first, second, and third codon positions of all DNA sequences in df.virus.
RCDI.values | Measures the Relative Codon Deoptimization Index (RCDI)8 of DNA sequences. | A data.frame containing the computed RCDI value for each DNA sequence in df.fasta.
RSCU.values | Measures the Relative Synonymous Codon Usage (RSCU)7 of DNA sequences. | A data.frame containing the computed RSCU value of each codon for each DNA sequence in df.fasta.
SCUO.values | Measures the Synonymous Codon Usage Orderliness (SCUO) of DNA sequences using the equation of Wan et al., 200410. | A data.frame containing the computed SCUO value for each DNA sequence in df.fasta.
SiD.value | Measures the Similarity Index (SiD) between the codon usage of a virus and that of its host15. | A numeric value giving the SiD.
PR2.plot | Makes a Parity Rule 2 (PR2) plot16, in which the AT-bias [A3/(A3 + T3)] at the third codon position of the four-codon amino acids of entire genes is the ordinate and the GC-bias [G3/(G3 + C3)] is the abscissa. The centre of the plot, where both coordinates are 0.5, corresponds to A = T and G = C (PR2), i.e. no bias between the influence of mutation and selection rates. | A ggplot object.
ENc.GC3plot | Makes an ENc-GC3 scatterplot17, in which the y-axis shows ENc values and the x-axis shows GC3 content. The red line shows the expected ENc values if codon usage bias is affected solely by GC3. | A ggplot object.
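The shuffling idea behind the dinuc.* functions can be illustrated with a permutation z-score for a single dinucleotide. This is a minimal sketch under stated assumptions, not vhcub's dinuc.base: the helper names are hypothetical, only the "shuffle all bases without replacement" model is shown, and the statistic is the plain z-score (observed minus mean of shuffled counts, divided by their standard deviation).

```r
# Count overlapping occurrences of a dinucleotide (e.g. "CG") in a sequence
count_dinuc <- function(dna, dinuc) {
  chars <- strsplit(toupper(dna), "")[[1]]
  n <- length(chars)
  if (n < 2) return(0L)
  sum(paste0(chars[-n], chars[-1]) == toupper(dinuc))
}

# z-score of the observed count against counts in base-shuffled sequences;
# sample() without replacement is a permutation, so base composition is kept
dinuc_zscore_base <- function(dna, dinuc, n_perm = 1000) {
  chars <- strsplit(toupper(dna), "")[[1]]
  observed <- count_dinuc(dna, dinuc)
  null_counts <- replicate(n_perm,
    count_dinuc(paste(sample(chars), collapse = ""), dinuc))
  (observed - mean(null_counts)) / sd(null_counts)  # Inf/NaN if sd is 0
}

set.seed(42)
dinuc_zscore_base("CGCGCGCGCGAAAATTTT", "CG", n_perm = 500)  # CG over-represented
```

Positive z-scores point to over-representation and negative ones to under-representation. Shuffling with replacement would only require `sample(chars, replace = TRUE)`; the codon and synonymous-codon models would shuffle codons instead of single bases.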

Operation

vhcub was developed in R and is available on CRAN. It is compatible with Windows and major Linux operating systems. The package can be installed with:

install.packages("vhcub")

Figure 1 describes the vhcub workflow. It starts by reading the FASTA files of a virus and its host. The three main analyses (red boxes in Figure 1), namely nucleotide content analysis and codon usage bias analysis at the gene and codon levels, can then be applied independently, each comprising several measures (blue boxes in Figure 1). However, within the same analysis, some measures rely on others. For example, the reference set of genes used to estimate a virus's codon adaptation index is defined based on the effective number of codons of its host. Finally, the orange boxes in Figure 1 represent the two plots (ENc-GC3 plot and PR2-plot).
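To show how CAI depends on a reference set, the sketch below follows the Sharp and Li definition: each codon's relative adaptiveness w is its count in the reference set divided by the count of the most used synonymous codon, and a gene's CAI is the geometric mean of w over its codons. The helper names are hypothetical and not vhcub's `CAI.values`; the reference counts are simply taken as input (choosing them from the host's ENc, as the text describes, is not shown), Met, Trp and stop codons are excluded, and codons absent from the reference set are skipped, a convention on which implementations differ.

```r
# Standard genetic code; codons are ordered T, C, A, G at each position
bases <- c("T", "C", "A", "G")
codons <- paste0(rep(bases, each = 16), rep(rep(bases, each = 4), 4), rep(bases, 16))
aa <- setNames(strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]], codons)

# Count each of the 64 codons in an in-frame coding sequence
codon_counts <- function(dna) {
  dna <- toupper(dna)
  starts <- 3 * seq_len(nchar(dna) %/% 3) - 2
  cods <- if (length(starts) > 0) substring(dna, starts, starts + 2) else character(0)
  setNames(as.integer(table(factor(cods, levels = codons))), codons)
}

# Relative adaptiveness: w = count / max count among synonymous codons
relative_adaptiveness <- function(ref_counts) {
  w <- setNames(rep(NA_real_, 64), codons)
  for (a in setdiff(unique(aa), c("*", "M", "W"))) {
    idx <- codons[aa == a]
    m <- max(ref_counts[idx])
    if (m > 0) w[idx] <- ref_counts[idx] / m
  }
  w
}

# CAI: geometric mean of w over the gene's codons, weighted by codon counts
cai <- function(gene_counts, w) {
  ok <- !is.na(w) & w > 0 & gene_counts > 0
  if (!any(ok)) return(NA_real_)
  exp(sum(gene_counts[ok] * log(w[ok])) / sum(gene_counts[ok]))
}

ref <- codon_counts(paste0(strrep("GCT", 10), strrep("GCC", 5), strrep("GCG", 5)))
w <- relative_adaptiveness(ref)
cai(codon_counts("ATGGCTGCC"), w)  # geometric mean of w(GCT) = 1 and w(GCC) = 0.5
```

Values close to 1 indicate that a gene uses the codons preferred in the reference set.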


Figure 1. vhcub workflow, to analyze virus-host codon usage co-adaptation.

The white boxes represent the input FASTA files, the red boxes represent the three main analyses, each comprising several measures (the blue boxes), and the orange boxes represent the ENc-GC3 plot and PR2-plot.


Figure 2. ENc-GC3 plot showing ENc values versus GC3 content for the coding sequences (CDS) of the virus (Escherichia virus T4); the solid red line represents the expected ENc values if codon bias is affected by GC3s only.
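The red line in an ENc-GC3 plot is usually Wright's (1990) expected curve, ENc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC3 value. The sketch below computes that curve and a simple per-gene GC3. It assumes vhcub's red line follows this standard formula, and for brevity it uses all third codon positions rather than synonymous third positions only (GC3s); `gc3` and `expected_enc` are hypothetical helpers.

```r
# GC fraction at the third position of each codon in an in-frame sequence
gc3 <- function(dna) {
  dna <- toupper(dna)
  pos3 <- 3 * seq_len(nchar(dna) %/% 3)
  if (length(pos3) == 0) return(NA_real_)
  third <- substring(dna, pos3, pos3)
  mean(third %in% c("G", "C"))
}

# Expected ENc if codon usage bias were determined by GC3 alone
expected_enc <- function(s) 2 + s + 29 / (s^2 + (1 - s)^2)

gc3("ATGGCCTAA")          # third positions G, C, A
expected_enc(c(0, 0.5, 1))
```

Genes lying well below the curve have lower ENc than GC3 composition alone predicts, which is commonly read as a sign of additional factors such as selection. In base R, `curve(expected_enc, from = 0, to = 1)` draws the reference line.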


Figure 3. PR2-plot showing the CDS of the virus (Escherichia virus T4), plotted by their GC bias [G3/(G3 + C3)] and AT bias [A3/(A3 + T3)] at the third codon position; the two solid red lines mark where each coordinate (ordinate and abscissa) equals 0.5, and their intersection is where A = T and G = C.
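The coordinates of each point in a PR2-plot can be sketched as below: third-position A, T, G and C are counted only in fourfold-degenerate codon boxes, then AT bias = A3/(A3 + T3) and GC bias = G3/(G3 + C3). The helper `pr2_bias` is hypothetical, not vhcub's `PR2.plot`; treating the fourfold boxes of Ala, Gly, Pro, Thr, Val, Leu, Arg and Ser as the "four-codon amino acids" is an assumption, exposed as the `prefixes` argument.

```r
# AT and GC bias at third positions of fourfold-degenerate codons
pr2_bias <- function(dna,
                     prefixes = c("GC", "GG", "CC", "AC", "GT", "CT", "CG", "TC")) {
  dna <- toupper(dna)
  starts <- 3 * seq_len(nchar(dna) %/% 3) - 2
  cods <- if (length(starts) > 0) substring(dna, starts, starts + 2) else character(0)
  third <- substr(cods[substr(cods, 1, 2) %in% prefixes], 3, 3)
  a3 <- sum(third == "A"); t3 <- sum(third == "T")
  g3 <- sum(third == "G"); c3 <- sum(third == "C")
  # NaN is returned when a denominator is zero
  c(AT_bias = a3 / (a3 + t3), GC_bias = g3 / (g3 + c3))
}

pr2_bias("GCAGCTGGGGGCTAA")  # A, T, G, C once each: both biases are 0.5
pr2_bias("GCAGCAGCTGGG")     # A3 = 2, T3 = 1, G3 = 1, C3 = 0
```

A gene at (0.5, 0.5) satisfies PR2; departures along either axis indicate an imbalance between complementary bases at third positions.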

Use cases

Using vhcub to study the CUB of a virus, its host, and the co-adaptation between them is straightforward. As an example, we use the coding sequences of Escherichia virus T4 and its host Escherichia coli in FASTA format.

# Load the package
library("vhcub")

# Read both FASTA files at once into data.frames with fasta.read()
# virus.fasta = path to the virus FASTA file
# host.fasta = path to the host FASTA file

fasta <- fasta.read(virus.fasta = "EscherichiavirusT4.fasta",
                    host.fasta = "Escherichiacoli.fasta")

# The first element holds the virus sequences, the second the host sequences
fasta.T4 <- fasta[[1]]
fasta.Ecoli <- fasta[[2]]

As mentioned above, each category of analysis can be applied independently. This example therefore shows only codon usage bias analysis at the codon level.

# Estimate the similarity index (SiD) between Escherichia virus T4 and E. coli

# First, calculate the Relative Synonymous Codon Usage (RSCU) for both
rscu.T4 <- RSCU.values(fasta.T4)
rscu.Ecoli <- RSCU.values(fasta.Ecoli)

# Then calculate SiD from the two RSCU tables
SiD <- SiD.value(rscu.Ecoli, rscu.T4)

SiD measures the effect of E. coli codon usage bias on the codon usage of Escherichia virus T4. SiD ranges from 0 to 1, with higher values indicating that the host has a dominant effect on the virus's codon usage. In this example, SiD is approximately 0.491, which indicates that E. coli does not dominate the CUB of T4. This code also generates RSCU values for each codon in each gene of both organisms, which can be used for further analysis.
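For reference, the SiD formula as it is usually stated in the codon usage literature compares the RSCU vectors a (host) and b (virus): R = sum(a_i b_i) / sqrt(sum(a_i^2) sum(b_i^2)) and SiD = (1 - R) / 2. The sketch below assumes this definition and takes two plain numeric RSCU vectors; the `sid` helper is hypothetical, whereas vhcub's `SiD.value` takes the RSCU tables returned by `RSCU.values`.

```r
# SiD between host RSCU vector a and virus RSCU vector b (same codon order);
# codons with a missing RSCU value in either vector are dropped
sid <- function(a, b) {
  stopifnot(length(a) == length(b))
  keep <- !is.na(a) & !is.na(b)
  a <- a[keep]; b <- b[keep]
  r <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))  # cosine similarity of the vectors
  (1 - r) / 2
}

sid(c(1.2, 0.8, 1.5, 0.5), c(1.2, 0.8, 1.5, 0.5))  # identical usage gives 0
sid(c(1, 0), c(0, 1))                              # orthogonal usage gives 0.5
```

Because the index is built on a cosine similarity, it depends only on the pattern of RSCU values and not on their overall scale.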

Conclusions

vhcub requires only DNA sequences as input and can compute several CUB measures for viruses, such as ENc, CAI, SCUO, and RCDI (Table 1). It can also be used to compare the codon usage of viruses and their hosts through RSCU and SiD. There are many possible directions for future work; future versions will implement more indices, plots, and statistical analyses to facilitate the workflow for examining the adaptation of viral CUB in the R environment.

Data availability

Escherichia virus T4 fasta file: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/836/945/GCF_000836945.1_ViralProj14044

Escherichia coli fasta file: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_cds_from_genomic.fna.gz

Software availability

Software available from: https://CRAN.R-project.org/package=vhcub

Source code available from: https://github.com/AliYoussef96/vhcub

Archived source code as at time of publication18: http://doi.org/10.5281/zenodo.3572391

License: GPL-3

How to cite this article
Anwar AM, Soudy M and Mohamed R. vhcub: Virus-host codon usage co-adaptation analysis [version 1; peer review: 2 approved]. F1000Research 2019, 8:2137 (https://doi.org/10.12688/f1000research.21763.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 1
Reviewer Report 27 Mar 2020
Oscar Leonardo Ramírez Suárez, Grupo SiAMo (Simulación, Análisis y Modelado) Universidad ECCI, Bogotá, Colombia 
Adriana Patricia Corredor-Figueroa, Grupo GINIC-HUS Universidad ECCI, Bogota, Colombia 
Approved
  • From the technical point of view, the vhcub R package looks quite reliable and well supported. However, there is only one example illustrating it. Moreover, this example shows the mean value for SiD in its range (i.e.,
...
How to cite this report
Suárez OLR and Patricia Corredor-Figueroa A. Reviewer Report For: vhcub: Virus-host codon usage co-adaptation analysis [version 1; peer review: 2 approved]. F1000Research 2019, 8:2137 (https://doi.org/10.5256/f1000research.23991.r61560)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Reviewer Report 11 Feb 2020
Raj Kumar Singh, Facility for Research and Training on Bioassays and Biosensor, Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute [Deemed University], Izatnagar, Uttar Pradesh, India 
Approved
Viruses in the course of their evolution would optimize their codon usage to their hosts. They rely on the tRNA pool of their hosts in the translation process. Though tools for analyzing the codon usage of organisms are available, none ...
How to cite this report
Singh RK. Reviewer Report For: vhcub: Virus-host codon usage co-adaptation analysis [version 1; peer review: 2 approved]. F1000Research 2019, 8:2137 (https://doi.org/10.5256/f1000research.23991.r58975)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 12 Feb 2020
    Ali Mostafa, Department of Genetics, Faculty of Agriculture, Cairo University, Egypt
    General comments (Corrections in the text)
    • Comment: The spelling of formate may be corrected to format in the second column X first row of Table 1.      
    ...
  • Reviewer Response 13 Feb 2020
    Raj Kumar Singh, Facility for Research and Training on Bioassays and Biosensor, Division of Veterinary Biotechnology, ICAR-Indian Veterinary Research Institute [Deemed University], Izatnagar, India
    The authors have accepted to do the necessary changes in the revised version and as well they have answered to my query.
    Competing Interests: No competing Interests