vhcub: Virus-host codon usage co-adaptation analysis

Viruses evolve noticeably as they adapt to, and reproduce within, their hosts. In theory, the patterns and factors that shape viral codon usage should reflect evolutionary changes that allow viruses to optimize their codon usage towards that of their hosts. Several software tools can analyze the codon usage of organisms. However, none of them focuses on the codon usage co-adaptation between viruses and their hosts, so they cover this question only partially. This paper describes the vhcub R package, a tool for analyzing the co-adaptation of codon usage between a virus and its host that implements several indices and plots. The tool is available from: https://cran.r-project.org/web/packages/vhcub/.


Introduction
During translation of mRNAs into proteins, information is transmitted in the form of nucleotide triplets, named codons, which encode amino acids. Different codons that encode the same amino acid are known as synonymous codons. Studies of different organisms report that synonymous codons are not used uniformly within and between the genes of one genome, a phenomenon known as codon usage bias (CUB) 1,2. Because viruses rely on the tRNA pool of their hosts for translation, previous studies suggest that translational selection and/or directional mutational pressure act on the codon usage of the viral genome to optimize or deoptimize it towards the codon usage of their hosts 3,4.
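To make the definition of codon usage bias concrete, relative synonymous codon usage (RSCU), one of the measures vhcub reports, divides each codon's count by the mean count of all codons synonymous with it. A value of 1 means no preference, values above 1 mark preferred codons, and values below 1 mark avoided ones. The base-R sketch below assumes the standard genetic code and an in-frame coding sequence given as a single string. The helpers count_codons and rscu are hypothetical names for illustration and are not vhcub functions.

```r
# Standard genetic code (NCBI table 1). expand.grid varies b3 fastest,
# so codons come out in TTT, TTC, TTA, TTG, TCT, ... order.
bases <- c("T", "C", "A", "G")
g <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
codons <- paste0(g$b1, g$b2, g$b3)
aa <- strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
genetic_code <- setNames(aa, codons)

# Hypothetical helper: count the 64 codons of an in-frame coding sequence.
count_codons <- function(dna) {
  dna <- toupper(dna)
  n <- nchar(dna) %/% 3
  starts <- seq(1, by = 3, length.out = n)
  cod <- substring(dna, starts, starts + 2)
  # Codons with ambiguous bases (e.g. N) are dropped; absent codons count 0.
  table(factor(cod[cod %in% codons], levels = codons))
}

# Hypothetical helper: RSCU = codon count / mean count of its synonymous family.
rscu <- function(counts) {
  x <- setNames(as.numeric(counts[codons]), codons)
  aa_of <- genetic_code[codons]
  mean_syn <- ave(x, aa_of, FUN = mean)
  out <- x / mean_syn
  # Stop codons, Met and Trp are excluded (no synonymous choice);
  # amino acids absent from the sequence give NaN (0 / 0).
  out[!(aa_of %in% c("*", "M", "W"))]
}

r <- rscu(count_codons("GCTGCTGCCGCA"))
r[c("GCT", "GCC", "GCA", "GCG")]   # Ala family: GCT = 2, GCC = 1, GCA = 1, GCG = 0
```

The result keeps 59 codons (64 minus three stop codons, ATG and TGG), which is the codon set most synonymous-usage indices are defined over.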
Tools and packages are available to analyze codon usage, e.g. coRdon 5, but no package focuses on examining the codon usage co-adaptation between viruses and their hosts. vhcub is an R package that aims to make this co-adaptation analysis easy. It computes several codon usage bias measures, such as the effective number of codons (ENc) 6, codon adaptation index (CAI) 7, relative codon deoptimization index (RCDI) 8, similarity index (SiD) 9, synonymous codon usage orderliness (SCUO) 10, and relative synonymous codon usage (RSCU) 10. It also provides a statistical analysis of dinucleotide over- and under-representation under three different models.
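The three statistical models vhcub uses for dinucleotide representation are not described in this section. As a simpler and widely used point of reference, the dinucleotide odds ratio rho_XY = f_XY / (f_X * f_Y) compares the observed frequency of dinucleotide XY with the product of the frequencies of X and Y. Values well above 1 suggest over-representation and values well below 1 suggest under-representation. This is a swapped-in measure, not necessarily one of vhcub's models, and the function name dinuc_odds_ratio is hypothetical.

```r
# Hypothetical helper: dinucleotide odds ratio rho_XY = f_XY / (f_X * f_Y)
dinuc_odds_ratio <- function(dna) {
  nt <- c("A", "C", "G", "T")
  chars <- strsplit(toupper(dna), "")[[1]]
  n <- length(chars)
  # Mononucleotide frequencies over unambiguous bases only
  f_mono <- table(factor(chars, levels = nt)) / sum(chars %in% nt)
  # All overlapping pairs; pairs containing an ambiguous base are skipped
  pairs <- paste0(chars[-n], chars[-1])
  dinucs <- as.vector(outer(nt, nt, paste0))
  valid <- pairs %in% dinucs
  f_di <- table(factor(pairs[valid], levels = dinucs)) / sum(valid)
  first <- substr(dinucs, 1, 1)
  second <- substr(dinucs, 2, 2)
  rho <- as.numeric(f_di[dinucs]) /
    (as.numeric(f_mono[first]) * as.numeric(f_mono[second]))
  setNames(rho, dinucs)
}

rho <- dinuc_odds_ratio("ACGTACGTACGT")
round(rho[c("CG", "TA", "CA")], 3)
```

In the toy sequence every base has frequency 0.25, so any dinucleotide seen more often than 1 in 16 overlapping pairs gets rho above 1, and one never seen (CA here) gets 0.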

Methods
Implementation
vhcub imports Biostrings 11, seqinr 12 and stringr 13 to handle FASTA files and manipulate DNA sequences. It also imports coRdon 5, which is used to estimate different CUB measures.
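vhcub reads FASTA input through Biostrings and seqinr and stores the sequences in a data.frame before computing its indices. Purely to illustrate that FASTA-to-data.frame step, here is a dependency-free base-R sketch. The name read_fasta_df is hypothetical, and the one-row-per-record layout with name and sequence columns is an assumption, not vhcub's actual data.frame layout.

```r
# Hypothetical reader: FASTA text lines -> data.frame with one row per record.
read_fasta_df <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]            # drop blank lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)              # record index for every line
  headers <- sub("^>", "", lines[is_header])
  seq_lines <- !is_header & record > 0     # ignore text before the first header
  # Concatenate multi-line sequences; a record with no sequence lines gives "".
  seqs <- vapply(
    split(lines[seq_lines], factor(record[seq_lines], levels = seq_along(headers))),
    paste, character(1), collapse = ""
  )
  data.frame(name = headers, sequence = toupper(unname(seqs)),
             stringsAsFactors = FALSE)
}

# In practice the lines would come from readLines() on a FASTA file.
fasta <- c(">gene1 polymerase", "ATGAAA", "TTTGGG", ">gene2", "atgccctaa")
df <- read_fasta_df(fasta)
df
```

Keeping one sequence per row means that every per-gene index can be added as a new column, which is one plausible reason for the data.frame representation.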
vhcub first converts the FASTA input to a data.frame, so that the sequences can be stored and the different indices implemented in the package can be calculated efficiently. Table 1 describes all the functions available in vhcub and the result returned by each, and gives references to the equations used to estimate each measure. Furthermore, vhcub uses ggplot2 14 to visualize two important plots, the ENc-GC3 plot (Figure 2) and the PR2-plot (Figure 3), which help to explain the factors that influence a virus's evolution with respect to its CUB.

Table 1 (excerpt). ENc.GC3plot: makes an ENc-GC3 scatterplot 17, in which the y-axis represents the ENc values and the x-axis represents the GC3 content. The red fitting line shows the expected ENc values when codon usage bias is affected solely by GC3. Returns: a ggplot object.
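The quantities behind the two plots can be sketched in base R. The expected curve in an ENc-GC3 plot is usually drawn from Wright's formula ENc_exp = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC3 content. Genes lying on or near the curve are commonly read as shaped mainly by mutational pressure, and genes well below it as shaped by additional factors such as translational selection. For the PR2-plot, the sketch computes G3 / (G3 + C3) and A3 / (A3 + T3), where 0.5 means no bias. It simplifies by using all third codon positions, whereas GC3s and PR2 are often restricted to synonymous or fourfold degenerate sites. The function names are hypothetical and this is not vhcub's implementation; the plotting itself (ggplot2 in vhcub) is left out.

```r
# Hypothetical helper: nucleotides at the third position of each complete codon
third_positions <- function(dna) {
  dna <- toupper(dna)
  n <- nchar(dna) %/% 3
  pos <- seq(3, by = 3, length.out = n)
  substring(dna, pos, pos)
}

# Hypothetical helper: GC content at third codon positions (all codons, not GC3s)
gc3 <- function(dna) {
  b <- third_positions(dna)
  b <- b[b %in% c("A", "C", "G", "T")]
  mean(b %in% c("G", "C"))
}

# Wright's expected ENc when codon bias is driven by GC3 alone
expected_enc <- function(s) 2 + s + 29 / (s^2 + (1 - s)^2)

# Hypothetical helper: PR2 coordinates; 0.5 on both axes means no bias
pr2_coords <- function(dna) {
  tab <- table(factor(third_positions(dna), levels = c("A", "C", "G", "T")))
  n <- setNames(as.numeric(tab), names(tab))
  c(gc_bias = n[["G"]] / (n[["G"]] + n[["C"]]),
    at_bias = n[["A"]] / (n[["A"]] + n[["T"]]))
}

expected_enc(c(0, 0.5, 1))   # 31.0 60.5 32.0
gc3("ATGAAGGCCTTA")          # third positions G, G, C, A -> 0.75
pr2_coords("ATGAAGGCCTTA")   # gc_bias = 2/3, at_bias = 1
```

The expected curve peaks at s = 0.5, where all codons are predicted to be used nearly evenly, which is why strongly GC3- or AT3-rich genes are expected to have low ENc even without selection.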
Operation
vhcub was developed in R and is available on CRAN. It is compatible with Windows and major Linux operating systems. The package can be installed with:

install.packages("vhcub")

Figure 1 describes the vhcub workflow. It starts by reading the FASTA files for a virus and its host. After that, the nucleotide content analysis and the codon usage bias analyses at the gene and codon levels (marked by the red boxes in Figure 1) can be applied independently of one another (the blue boxes in Figure 1). Within a single analysis, however, some measures rely on others. For example, the reference set of genes used to estimate a virus's codon adaptation index is defined based on the effective number of codons of its host. Finally, the orange boxes in Figure 1 represent the two plots (ENc-GC3 plot and PR2-plot).
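The dependency of CAI on a host reference set can be sketched as follows. CAI is the geometric mean, over the codons of a gene, of each codon's relative adaptiveness w, that is its count in the reference set divided by the count of the most used synonymous codon. How vhcub selects the host reference genes from their ENc values is not reproduced here: the reference codon counts are passed in directly. The pseudocount of 0.5 for codons absent from the reference set is an assumption of this sketch, and the helper names are hypothetical.

```r
# Standard genetic code (NCBI table 1), codons in TTT, TTC, TTA, ... order
bases <- c("T", "C", "A", "G")
g <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
codons <- paste0(g$b1, g$b2, g$b3)
genetic_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  codons)

# Hypothetical helper: counts of the 64 codons of an in-frame sequence
codon_counts <- function(dna) {
  dna <- toupper(dna)
  n <- nchar(dna) %/% 3
  starts <- seq(1, by = 3, length.out = n)
  cod <- substring(dna, starts, starts + 2)
  table(factor(cod[cod %in% codons], levels = codons))
}

# Hypothetical helper: w = count / count of the most used synonym in the
# reference set; zero counts get a pseudocount (an assumption of this sketch).
relative_adaptiveness <- function(ref_counts, pseudo = 0.5) {
  x <- setNames(as.numeric(ref_counts[codons]), codons)
  x[x == 0] <- pseudo
  aa_of <- genetic_code[codons]
  w <- x / ave(x, aa_of, FUN = max)
  w[!(aa_of %in% c("*", "M", "W"))]   # no synonymous choice for these
}

# Hypothetical helper: CAI = geometric mean of w over the codons of a gene
cai <- function(dna, w) {
  n <- as.numeric(codon_counts(dna)[names(w)])
  exp(sum(n * log(w)) / sum(n))
}

ref <- codon_counts("GCTGCTGCTGCC")   # toy stand-in for the host reference set
w <- relative_adaptiveness(ref)
cai("GCTGCT", w)   # 1: only the reference set's preferred Ala codon
cai("GCCGCC", w)   # 1/3: GCC is used a third as often as GCT in the reference
```

Because w is computed from the host reference set, the same viral gene gets a different CAI when the reference set changes, which is why the choice of host reference genes matters for this measure.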

Use cases
Using vhcub to study the CUB of a virus, its host and the co-adaptation between them is straightforward. As mentioned before, each category of analysis can be applied independently; hence, this example shows only the codon usage bias analysis at the codon level.
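As a separate illustration of comparing virus and host codon usage (not the use case's own code), the similarity index (SiD) can be written, following its usual definition, as D(A, B) = (1 - R(A, B)) / 2, where R(A, B) is the cosine similarity between the RSCU vectors of the virus (A) and the host (B) over the 59 synonymous codons. D is 0 when the two RSCU vectors are proportional and grows as the virus's codon usage departs from the host's. The function name sid is hypothetical, and treating missing RSCU values (amino acids absent from a sequence) as 0 is an assumption, not necessarily what vhcub does.

```r
# Hypothetical helper: SiD from two RSCU vectors given in the same codon order.
sid <- function(rscu_virus, rscu_host) {
  stopifnot(length(rscu_virus) == length(rscu_host))
  a <- rscu_virus
  b <- rscu_host
  a[is.na(a)] <- 0   # assumption: NaN/NA (absent amino acid) treated as 0
  b[is.na(b)] <- 0
  r <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))   # cosine similarity R(A, B)
  (1 - r) / 2                                   # D(A, B)
}

# Toy 4-codon family (a real comparison would use 59 RSCU values per organism)
sid(c(2, 1, 1, 0), c(2, 1, 1, 0))   # 0: identical usage
sid(c(2, 0, 2, 0), c(0, 2, 0, 2))   # 0.5: no shared preferred codons
```

Since RSCU values are never negative, R(A, B) here lies between 0 and 1, so the toy example's value of 0.5 corresponds to virus and host preferring entirely different codons.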
