<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.173837.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Malware Detection Using RNA Encoding and Convolutional Neural Networks on the Malicious Network Dataset</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 2 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Fitian Rashid</surname>
                        <given-names>Omar</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8186-0795</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ali Abd</surname>
                        <given-names>Senan</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Al-Shahwani</surname>
                        <given-names>Humam</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Department of Geology, University of Baghdad, Baghdad, Baghdad Governorate, Iraq</aff>
                <aff id="a2">
                    <label>2</label>Department of Cybersecurity, College of Information Technology, University of Fallujah, Al-Fallujah, Al Anbar Governorate, Iraq</aff>
                <aff id="a3">
                    <label>3</label>Department of Computer Science, College of Science, University of Baghdad, Baghdad, Baghdad Governorate, Iraq</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:omar.f@sc.uobaghdad.edu.iq">omar.f@sc.uobaghdad.edu.iq</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>5</day>
                <month>5</month>
                <year>2026</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2026</year>
            </pub-date>
            <volume>15</volume>
            <elocation-id>241</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>28</day>
                    <month>4</month>
                    <year>2026</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2026 Fitian Rashid O et al.</copyright-statement>
                <copyright-year>2026</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/15-241/pdf"/>
            <abstract>
                <sec>
                    <title>Background</title>
                    <p>The detection of malware in network traffic remains a critical cybersecurity challenge. Traditional signature-based intrusion detection performs well on threats that are already recorded in its signature database but is significantly less effective against polymorphic or zero-day attacks. Conversely, anomaly-based approaches can detect novel intrusions but often suffer from a high false-positive rate.</p>
                </sec>
                <sec>
                    <title>Methods</title>
                    <p>This study proposes a combined malware-detection framework that applies RNA encoding to network-flow attributes and classifies the encoded flows with Convolutional Neural Network (CNN) classifiers. The framework has three components: a Signature-CNN, which is trained on RNA-encoded representations of known malicious flows; an Anomaly-CNN, which is developed to distinguish between benign and malicious traffic without any prior signature knowledge; and a Hybrid-CNN, which combines both paradigms in a two-stage detection pipeline.</p>
                </sec>
                <sec>
                    <title>Results</title>
                    <p>The experiments use 10,000 samples that are split into training and testing subsets with a 70/30 ratio. The model is trained in a supervised setting and assessed with standard performance metrics: accuracy, precision, recall, and F1-score. The experiments are implemented in Python with deep learning libraries, so that the evaluation environment is consistent and reproducible across all experiments. Experiments conducted on the Malicious Network Dataset show that the Signature-CNN achieves 91% accuracy with strong precision on known threats, the Anomaly-CNN achieves a 93% detection rate on unknown malware, and the Hybrid-CNN achieves the best overall performance with a 95% detection rate and a 94.5% F1-score.</p>
                </sec>
                <sec>
                    <title>Conclusions</title>
                    <p>The results demonstrate that RNA encoding combined with CNN classifiers offers a robust and scalable solution for malware detection in networked environments.</p>
                </sec>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Malware Detection</kwd>
                <kwd>RNA Encoding</kwd>
                <kwd>Convolutional Neural Networks</kwd>
                <kwd>Network Security</kwd>
                <kwd>Malicious Network Dataset</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>The author(s) declared that no grants were involved in supporting this work.</funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>This revised version of the manuscript includes several key changes based on reviewer feedback. The abstract has been extended with more information about the experimental design and evaluation plan, making it clearer and more comprehensive. The Introduction has been edited to improve language quality, readability, and academic tone. The Related Work section now includes a more critical analysis of existing studies that emphasizes their limitations and better positions the contribution of the proposed method. In addition, the Methodology section has been expanded with the specific training parameters of the CNN model, including the optimizer, learning rate, number of epochs, and batch size, to facilitate reproducibility. The Discussion has been substantially expanded with more detail on how RNA-based encoding improves feature representation and overall model performance. Lastly, the Conclusion has been revised to include more specific and practical directions for future research, including possible extensions and implementations of the proposed approach. Together, these updates improve the clarity, rigor, and scientific contribution of the manuscript.</p>
            </sec>
        </notes>
    </front>
    <body>
        <sec id="sec5" sec-type="intro">
            <title>1. Introduction</title>
            <p>The rapid proliferation of networked systems has led to a corresponding increase in advanced malware attacks. Intrusion detection systems (IDS) continue to play a central role in the protection of digital infrastructure; however, current approaches have severe limitations.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup> Signature-based IDS rely on a set of predefined rules or patterns and are therefore effective only against previously identified threats. Anomaly-based IDS attempt to detect deviations from normal behavior and can detect zero-day attacks, but they often generate a high false-positive rate. To address these limitations, hybrid architectures have emerged that combine the benefits of both approaches.
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>,
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup>
            </p>
            <p>Recent studies have applied deep-learning architectures, especially convolutional neural networks (CNNs) and recurrent neural networks (RNNs), to intrusion detection with promising results. One study proposes an intrusion detection method based on a learning framework
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> that uses dual parallel CNN pipelines to process the network and radar features independently. Another intrusion detection model
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> combines a CNN with a Random Forest: the CNN extracts the features, and the Random Forest performs the classification. A further IDS combines a hybrid autoencoder with an enhanced LSTM-CNN architecture
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup> to enhance detection capabilities more quickly and efficiently. Kaissar et al.
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>
                </sup> investigate hyperparameter optimization in CNNs to improve NIDS performance, using the Grid Search, Genetic Algorithm, Particle Swarm Optimization, and Grey Wolf Optimization algorithms. Alrayes et al.
                <sup>
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup> proposed an IDS that combines channel attention with a CNN and achieves exceptional accuracy on the NSL-KDD dataset. Another IDS model is built on a CNN and knowledge distillation;
                <sup>
                    <xref ref-type="bibr" rid="ref9">9</xref>
                </sup> this model uses a two-dimensional Fourier transform to convert grayscale images to the frequency domain, which enhances the similarity between neighboring pixels so that the data can be processed effectively. Ban et al.
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup> proposed an enhanced deep-learning model for IDS in IoT environments, with a CNN as the backbone network. A hybrid deep-learning IDS
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup> based on a CNN and bidirectional long short-term memory networks improves the model&#x2019;s ability to detect patterns in both minority and majority classes. Altunay and Albayrak
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup> developed an IDS for IIoT networks using three deep learning architectures: a CNN, a Long Short-Term Memory (LSTM) network, and a combination of the two.</p>
            <p>In parallel, biologically inspired encodings, such as mappings to DNA and RNA sequences, have been proposed to convert heterogeneous data into symbolic strings.
                <sup>
                    <xref ref-type="bibr" rid="ref13">13</xref>,
                    <xref ref-type="bibr" rid="ref14">14</xref>
                </sup> Nevertheless, the literature offers limited evidence of RNA encodings being integrated with CNN classifiers across the full range of detection paradigms: signature, anomaly, and hybrid. This paper addresses that gap by proposing a CNN-based model that incorporates these complementary detection schemes. Despite the good results of existing methods, several shortcomings remain. Most CNN-based approaches use traditional data representations, which may not capture intricate feature interactions and can therefore perform worse under adverse conditions. Moreover, some methods have higher computational costs and reduced robustness on diverse or noisy data. These drawbacks motivate more effective and expressive encoding methods. The proposed approach uses RNA-based encoding to enrich the feature representation, so that the model can learn more discriminative features and improve on the performance of existing methods.</p>
        </sec>
        <sec id="sec6">
            <title>2. Methodology</title>
            <p>The proposed malware detection system combines RNA-inspired encoding of network traffic characteristics with convolutional neural network (CNN) classification. Unlike traditional intrusion detection models, which treat signature-based and anomaly-based detection as separate methodologies, the proposed system integrates both in a single deep learning pipeline. In this pipeline, CNN models trained on RNA-encoded sequences address signature-based, anomaly-based, and hybrid detection. The steps of the proposed system are shown in 
                <xref ref-type="fig" rid="f1">
Figure 1</xref>.</p>
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>
Figure 1. </label>
                <caption>
                    <title>Flow chart of the suggested malware detection system with RNA encoding and CNN classification pipeline.</title>
                </caption>
                <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/199234/eab33da7-608f-4884-b7fe-0961fb4aa37e_figure1.gif"/>
            </fig>
            <sec id="sec7">
                <title>2.1 Dataset description</title>
                <p>The Malicious Network Dataset is a new dataset that was collected using honeypots deployed with the Honeytrap agent. The dataset consists of 9 features that represent various aspects of network traffic, including both structural and payload data, together with a class label.
                    <sup>
                        <xref ref-type="bibr" rid="ref15">15</xref>
                    </sup> These features are shown in 
                    <xref ref-type="table" rid="T1">
Table 1</xref> as follows:</p>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>
Table 1. </label>
                    <caption>
                        <title>Features of the Malicious Network Dataset and their possible values.
                            <sup>
                                <xref ref-type="bibr" rid="ref15">15</xref>
                            </sup>
                        </title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">
No.</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Feature</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Possible values</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>1</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Protocol</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">TCP, IP</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>2</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>remote_ip</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Too many unique values (10,951). Example: ['35.203.211.180', '180.93.172.180', '81.17.19.66', '0.0.0.0', '94.23.145.155']</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>3</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>remote_port</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Numeric range: 0 &#x2013; 65535 (unique=28,920)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>4</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>local_ip</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Too many unique values (4,030). Example: ['165.227.180.71', '0.0.0.0', '192.168.202.139', '157.240.11.61', '192.168.11.143']</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>5</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>local_port</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Numeric range: 0 &#x2013; 65535 (unique=11,985)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>6</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>md5_hash</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Too many unique values (13,732). Example: ['1fb4aeaab94ca27d2e5dfaa47e11a6fb', 'd41d8cd98f00b204e9800998ecf8427e', '19b893b938ace1defe7d090e510f0618', '', '59b490c4ab003464ca03428b3fc63222']</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>7</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>sha512_hash</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Too many unique values (13,732). Example: ['a6d4f36a2a8d5b5ea7e6afe91d4a80e7a9ae2129ebf4791a4d74b7ea003420c2&#x2026;', 'cf83e1357eefb8bdf1542850d66d8007&#x2026;', 'f060846cbf02e31706d5d0fe781d7007&#x2026;', '', '922450f93e933de877934cee97339ae6&#x2026;']</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>8</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Length</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Numeric range: 0 &#x2013; 1448 (unique = 1,038)</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>9</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>data_hex</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Too many unique values (13,732). Example: ['16030100ca010000c60303918984ed51d5c0b8d4cfad730a16a4efb24b004062e21&#x2026;', '', '50100', '0', '474554202f20485454502f312e310d0a486f7374&#x2026;']</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>10</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Class</bold>
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">'malicious' or 'benign'.</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
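                <p>As a minimal sketch of the record layout described in Table 1, one row could be represented in Python as shown below. The class name FlowRecord, the function parse_row, and the lower-cased dictionary keys are hypothetical and are not taken from the dataset files or from the authors' code; the values in the example row are toy values chosen for illustration.</p>
                <code language="python">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One dataset row; field names follow Table 1 (lower-cased, an assumption)."""
    protocol: str       # "TCP" or "IP"
    remote_ip: str
    remote_port: int    # 0 to 65535
    local_ip: str
    local_port: int     # 0 to 65535
    md5_hash: str       # may be empty
    sha512_hash: str    # may be empty
    length: int         # 0 to 1448 in this dataset
    data_hex: str       # payload as hexadecimal text, may be empty
    label: str          # "malicious" or "benign" (the Class column)


def parse_row(row):
    """Convert a mapping of strings (e.g. a csv.DictReader row) to a FlowRecord."""
    record = FlowRecord(
        protocol=row["protocol"],
        remote_ip=row["remote_ip"],
        remote_port=int(row["remote_port"]),
        local_ip=row["local_ip"],
        local_port=int(row["local_port"]),
        md5_hash=row["md5_hash"],
        sha512_hash=row["sha512_hash"],
        length=int(row["length"]),
        data_hex=row["data_hex"],
        label=row["class"],
    )
    # Reject values outside the ranges listed in Table 1.
    if record.protocol not in ("TCP", "IP"):
        raise ValueError(f"unexpected protocol: {record.protocol!r}")
    if record.remote_port not in range(65536) or record.local_port not in range(65536):
        raise ValueError("port out of range")
    if record.label not in ("malicious", "benign"):
        raise ValueError(f"unexpected class label: {record.label!r}")
    return record


# Toy row for illustration only; it is not an actual dataset record.
toy_row = {
    "protocol": "TCP", "remote_ip": "0.0.0.0", "remote_port": "443",
    "local_ip": "0.0.0.0", "local_port": "80", "md5_hash": "",
    "sha512_hash": "", "length": "5", "data_hex": "50100", "class": "benign",
}
example = parse_row(toy_row)
print(example)
```
                </code>
                <p>Keeping the ports and the length as integers at this stage is only a convenience for range checks; the digit-wise encoding described in the next section operates on their decimal text.</p>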
            </sec>
            <sec id="sec8">
                <title>2.2 RNA encoding of network features</title>
                <p>Network flows in the Malicious Network Dataset comprise heterogeneous attributes, such as protocol types, port numbers, cryptographic hashes, packet lengths, and payloads. These attributes differ in scale and representation, which complicates direct modeling. To address this, we introduce a biologically inspired RNA encoding scheme in which each element is mapped to a fixed set of codons. The mapping rules are as follows:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>The remote_ip and local_ip attributes are removed, because a malware detection model must identify malicious signatures regardless of the source and destination IP addresses, and these addresses do not provide meaningful behavioral indicators of malware.</p>
                        </list-item>
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>Protocol identifiers (e.g., TCP, IP) are assigned codons such as TCP &#x2192; G, IP &#x2192; U.</p>
                        </list-item>
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>Numerical fields (ports, lengths) are split into digits, and each digit is mapped to a codon of two RNA characters, e.g., 0 &#x2192; CG, 1 &#x2192; AC, and so on.</p>
                        </list-item>
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>Hexadecimal payload values are mapped similarly, where each character is represented by two RNA characters, e.g., a &#x2192; AU, b &#x2192; UU, and so on.</p>
                        </list-item>
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>The hash values (both MD5 and SHA512) are split into single characters, and each character is mapped to a codon, so that every unique character is represented by a deterministic codon.</p>
                        </list-item>
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>For each flow, codon sequences from all fields are concatenated in the following order: [protocol] &#x2192; [remote port] &#x2192; [local port] &#x2192; [MD5] &#x2192; [SHA512] &#x2192; [length] &#x2192; [payload].</p>
                        </list-item>
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>The complete RNA encoding for all values that occur in Malicious Network Dataset records is shown in 
                                <xref ref-type="table" rid="T2">
Table 2</xref>.</p>
                        </list-item>
                    </list>
                </p>
                <table-wrap id="T2" orientation="portrait" position="float">
                    <label>
Table 2. </label>
                    <caption>
                        <title>RNA encoding of Malicious Network Dataset record values.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Value</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
RNA encoding</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">TCP</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">G</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">IP</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">U</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">CG</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">AC</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">2</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">GG</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">UA</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">4</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">CC</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">5</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">GA</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">6</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">UC</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">7</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">AA</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">8</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">GU</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">9</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">UG</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">a</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">AU</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">b</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">UU</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">c</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">CA</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">d</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">AG</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">e</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">CU</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">f</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">GC</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>This mapping transforms a wide range of categorical and string features into structured RNA codon sequences, so that the convolutional neural networks can be trained on homogeneous sequential inputs.</p>
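                <p>As a minimal sketch of how such a mapping could be applied, the following Python snippet encodes a string into a codon sequence, assuming that the table keys are lowercase hexadecimal characters. Only the table rows for the characters 6 to f are copied here; the remaining rows would be added to the dictionary in the same way. The names <monospace>PARTIAL_CODON_TABLE</monospace> and <monospace>encode_hex_to_codons</monospace>, and the choice to reject unmapped characters, are assumptions made for illustration and are not the authors' implementation.</p>
                <code language="python"><![CDATA[
```python
# Partial mapping copied from the table rows for the characters 6 to f.
# Rows for the remaining characters are defined in the same table and
# would be added to this dictionary in the same way.
PARTIAL_CODON_TABLE = {
    "6": "UC", "7": "AA", "8": "GU", "9": "UG",
    "a": "AU", "b": "UU", "c": "CA", "d": "AG",
    "e": "CU", "f": "GC",
}


def encode_hex_to_codons(hex_string, table=PARTIAL_CODON_TABLE):
    """Return the list of codons for a hexadecimal string.

    Hypothetical helper: raises ValueError for characters that are
    missing from the supplied table instead of guessing a codon.
    """
    codons = []
    for char in hex_string.lower():
        if char not in table:
            raise ValueError(f"no codon defined for character {char!r}")
        codons.append(table[char])
    return codons


if __name__ == "__main__":
    # a -> AU, 9 -> UG, f -> GC
    print("".join(encode_hex_to_codons("a9f")))  # AUUGGC
```
]]></code>
                <p>Returning a list of codons rather than a single string keeps the codon boundaries explicit, which is convenient when each codon is later converted to an integer index for an embedding layer.</p>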
            </sec>
            <sec id="sec9">
                <title>2.3 CNN architecture</title>
                <p>The encoded sequences are fed into a Convolutional Neural Network (CNN) that extracts discriminative features at several levels of abstraction, as follows:
                    <list list-type="bullet">
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>

                                <bold>Embedding:</bold> Each codon is first mapped to a dense vector of dimension d = 32. This embedding is learned jointly with the classifier and can therefore encode similarities between codons.</p>
                        </list-item>
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>

                                <bold>Convolutional:</bold> Several one-dimensional convolutional layers are used, with kernel sizes between 5 and 7. These filters identify local patterns in the codon sequences, e.g., repeated subsequences that can indicate malicious activity. For example, a convolution filter could learn to recognize the codon sequence that encodes a known backdoor port.</p>
                        </list-item>
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>

                                <bold>Pooling:</bold> The feature maps are downsampled through max-pooling, which retains the most salient features and lowers the computational cost.</p>
                        </list-item>
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>

                                <bold>Global Average Pooling:</bold> To generalize across sequences of varying length and to reduce overfitting, global average pooling aggregates the feature maps into fixed-size vectors.</p>
                        </list-item>
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>

                                <bold>Dense Layers:</bold> The learned features are combined in fully connected layers (64 units) with ReLU activation. To limit overfitting, dropout regularization with p = 0.5 is applied, which temporarily deactivates neurons at random during training.</p>
                        </list-item>
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>

                                <bold>Output Layer:</bold> A single sigmoid neuron outputs a probability score that is used to label a sample as benign or malicious.</p>
                        </list-item>
                        <list-item>
                            <label>&#x25cb;</label>
                            <p>The CNN architecture is applied in three different methods, which are shown in 
                                <xref ref-type="fig" rid="f2">
Figure 2</xref>.
</p>
                        </list-item>
                    </list>
                </p>
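                <p>To make the order of operations concrete, the following dependency-free Python sketch runs a forward pass through an embedding lookup, one valid 1D convolution with ReLU, max-pooling, global average pooling, and a sigmoid output. It is an illustration with toy, untrained weights, not the authors' model: the dense layer and dropout are omitted for brevity, and all function names, the pooling size of 2, and the example weights are assumptions.</p>
                <code language="python"><![CDATA[
```python
import math


def conv1d(sequence, kernels, biases):
    """Valid 1D convolution (cross-correlation form).

    sequence: list of time steps, each a list of input channels.
    kernels:  kernels[f][k][c] is the weight of filter f at offset k, channel c.
    Returns a list of time steps, each holding one value per filter.
    """
    width = len(kernels[0])
    output = []
    for start in range(len(sequence) - width + 1):
        step = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for k in range(width):
                for c, value in enumerate(sequence[start + k]):
                    total += kernel[k][c] * value
            step.append(total)
        output.append(step)
    return output


def relu(feature_map):
    return [[max(0.0, v) for v in step] for step in feature_map]


def max_pool1d(feature_map, size):
    """Non-overlapping max-pooling along the time axis."""
    pooled = []
    for start in range(0, len(feature_map) - size + 1, size):
        window = feature_map[start:start + size]
        pooled.append([max(column) for column in zip(*window)])
    return pooled


def global_average_pool(feature_map):
    """Average each filter over time, giving one fixed-size vector."""
    return [sum(column) / len(column) for column in zip(*feature_map)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(codon_ids, embedding, kernels, biases, out_weights, out_bias):
    """Embedding -> Conv1D + ReLU -> max-pool -> global average pool -> sigmoid."""
    embedded = [embedding[i] for i in codon_ids]
    features = relu(conv1d(embedded, kernels, biases))
    features = max_pool1d(features, size=2)
    pooled = global_average_pool(features)
    logit = out_bias + sum(w * v for w, v in zip(out_weights, pooled))
    return sigmoid(logit)


if __name__ == "__main__":
    # Toy setup: two codon indices, 2-dimensional embeddings, and one
    # width-2 filter that responds to "codon 0 followed by codon 1".
    embedding = [[1.0, 0.0], [0.0, 1.0]]
    kernels = [[[1.0, 0.0], [0.0, 1.0]]]
    score = forward([0, 1, 0, 1, 0], embedding, kernels, [0.0], [1.0], -2.0)
    print(score)
```
]]></code>
                <p>Because global average pooling averages over the time axis, the vector passed to the output neuron has one entry per filter regardless of the input length, which is the property the Global Average Pooling item above relies on.</p>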
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>
Figure 2. </label>
                    <caption>
                        <title>The three CNN models (Signature, Anomaly, and Hybrid).</title>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/199234/eab33da7-608f-4884-b7fe-0961fb4aa37e_figure2.gif"/>
                </fig>
                <p>The Signature-CNN replaces traditional rule-based signature matching with a convolutional neural network trained on RNA-encoded representations of known malicious activity patterns. Instead of manually searching a collection of byte sequences or hash values, the network learns to identify codon-level motifs that are indicative of malicious flows. The Anomaly-CNN, in contrast, is trained on RNA-encoded sequences to identify deviations from normal network behavior; unlike the signature model, it does not rely on predefined attack patterns. Finally, the Hybrid-CNN combines the two methods in a two-stage pipeline, pairing the precision of the Signature-CNN with the generalization capability of the Anomaly-CNN. The three CNN models are compared in 
                    <xref ref-type="table" rid="T3">
Table 3</xref>.</p>
                <table-wrap id="T3" orientation="portrait" position="float">
                    <label>
Table 3. </label>
                    <caption>
                        <title>Comparison between different CNN models.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Model</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Input</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Objective</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Strengths</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Weaknesses</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Signature-CNN
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">RNA-coded sequences of known malicious and benign flows</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Detect known attack patterns using learned codon-level motifs</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">High precision and low FAR</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Cannot detect zero-day threats</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Anomaly-CNN
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">RNA-coded mixture of benign and malicious flows</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Identify deviations from normal codon distributions</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Detects novel and polymorphic malware</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Higher false-positive rate</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Hybrid-CNN
</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Two-stage input: Signature-filtered and residual flows</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Combine the benefits of both precision and generalization</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Best overall accuracy and balance</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Slightly higher computational cost</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
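                <p>The two-stage idea behind the Hybrid-CNN can be sketched as follows. How the two stages are combined is not specified in detail here, so this sketch assumes the simplest arrangement: a flow that the signature model scores above a threshold is flagged immediately, and only the residual flows are passed to the anomaly model. The function name <monospace>hybrid_predict</monospace>, the two scoring callables, and both 0.5 thresholds are hypothetical.</p>
                <code language="python"><![CDATA[
```python
def hybrid_predict(flow, signature_score, anomaly_score,
                   signature_threshold=0.5, anomaly_threshold=0.5):
    """Return (label, stage) for one encoded flow.

    signature_score and anomaly_score are callables that return a
    probability in [0, 1]; both thresholds are illustrative assumptions.
    """
    # Stage 1: flows matching known malicious patterns are flagged at once.
    if signature_score(flow) >= signature_threshold:
        return "malicious", "signature"
    # Stage 2: residual flows are checked for deviations from normal behavior.
    if anomaly_score(flow) >= anomaly_threshold:
        return "malicious", "anomaly"
    return "benign", "anomaly"


if __name__ == "__main__":
    # Stub scorers standing in for the two trained networks.
    known_motif = "AUUGGC"
    signature = lambda flow: 0.9 if known_motif in flow else 0.1
    anomaly = lambda flow: 0.8 if len(flow) > 12 else 0.2
    for flow in ("UCAUUGGC", "UCAAGUUGCACUAGGC", "UCAAGU"):
        print(flow, hybrid_predict(flow, signature, anomaly))
```
]]></code>
                <p>Returning the stage alongside the label makes it possible to see which model produced each alarm, which can help when analyzing where false positives originate.</p>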
                <p>For sequence handling, all RNA sequences were truncated or zero-padded to a constant length of 2,048 codons so that every input has the same length. For architectural consistency, the three models share the same architecture (three Conv1D layers with 64, 128, and 256 filters, kernel size = 5, ReLU activation, and dropout = 0.5); they differ only in their training objectives. Finally, the dataset was split 70/30 using stratified sampling to maintain the malicious/benign ratio (50/50). These details are reported to support the reproducibility of the experiment. The CNN models are optimized with the Adam optimization algorithm at a learning rate of 0.001. Training runs for 50 epochs with a batch size of 32. To reduce overfitting, dropout regularization with a rate of 0.5 is used, together with early stopping based on the validation loss. The model is trained with a categorical cross-entropy loss function, and its performance is monitored on a validation dataset throughout training to check stability and convergence.</p>
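                <p>The sequence handling and data split described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' preprocessing code: padding is applied on the right with index 0, the split is done per class with a fixed random seed, and the names <monospace>pad_or_truncate</monospace> and <monospace>stratified_split</monospace> are hypothetical.</p>
                <code language="python"><![CDATA[
```python
import random

PAD_ID = 0  # assumed padding index; real codons would use indices >= 1


def pad_or_truncate(codon_ids, length=2048, pad_id=PAD_ID):
    """Cut the sequence to `length` or right-pad it with `pad_id`."""
    clipped = list(codon_ids[:length])
    return clipped + [pad_id] * (length - len(clipped))


def stratified_split(samples, labels, train_fraction=0.7, seed=0):
    """Split each class separately so both parts keep the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return ([samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test])


if __name__ == "__main__":
    print(pad_or_truncate([3, 1, 4], length=5))      # [3, 1, 4, 0, 0]
    print(len(pad_or_truncate([7] * 3000)))          # 2048
    x = list(range(20))
    y = [0] * 10 + [1] * 10                          # balanced toy labels
    x_train, y_train, x_test, y_test = stratified_split(x, y)
    print(len(x_train), len(x_test))                 # 14 6
```
]]></code>
                <p>Splitting each class separately is what keeps the 50/50 malicious/benign ratio identical in the training and testing portions.</p>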
            </sec>
        </sec>
        <sec id="sec10" sec-type="results|discussion">
            <title>3. Results and discussion</title>
            <p>The performance of the proposed CNN-based malware detection models is evaluated using several standard classification metrics, which are defined and calculated as follows:
                <list list-type="bullet">
                    <list-item>
                        <label>&#x25cb;</label>
                        <p>Accuracy: The ratio of correctly labeled flows to the total number of flows, calculated from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) as follows:
                            <disp-formula id="e1">

                                <mml:math display="block">
                                    <mml:mtext>Accuracy</mml:mtext>
                                    <mml:mo>=</mml:mo>
                                    <mml:mfrac>
                                        <mml:mrow>
                                            <mml:mi mathvariant="italic">TP</mml:mi>
                                            <mml:mo>+</mml:mo>
                                            <mml:mi mathvariant="italic">TN</mml:mi>
                                        </mml:mrow>
                                        <mml:mrow>
                                            <mml:mi mathvariant="italic">TP</mml:mi>
                                            <mml:mo>+</mml:mo>
                                            <mml:mi mathvariant="italic">TN</mml:mi>
                                            <mml:mo>+</mml:mo>
                                            <mml:mi mathvariant="italic">FP</mml:mi>
                                            <mml:mo>+</mml:mo>
                                            <mml:mi mathvariant="italic">FN</mml:mi>
                                        </mml:mrow>
                                    </mml:mfrac>
                                </mml:math>
</disp-formula>
                        </p>
                    </list-item>
                </list>

                <list list-type="bullet">
                    <list-item>
                        <label>&#x25cb;</label>
                        <p>Detection Rate (Recall): Also known as the True Positive Rate (TPR), this metric measures the proportion of truly malicious flows that are detected. It is calculated as follows:
                            <disp-formula id="e2">

                                <mml:math display="block">
                                    <mml:mtext>Detection&#x2009;Rate</mml:mtext>
                                    <mml:mspace width="0.12em"/>
                                    <mml:mrow>
                                        <mml:mo stretchy="true">(</mml:mo>
                                        <mml:mtext>Recall</mml:mtext>
                                        <mml:mo stretchy="true">)</mml:mo>
                                    </mml:mrow>
                                    <mml:mo>=</mml:mo>
                                    <mml:mfrac>
                                        <mml:mi mathvariant="italic">TP</mml:mi>
                                        <mml:mrow>
                                            <mml:mi mathvariant="italic">TP</mml:mi>
                                            <mml:mo>+</mml:mo>
                                            <mml:mi mathvariant="italic">FN</mml:mi>
                                        </mml:mrow>
                                    </mml:mfrac>
                                </mml:math>
</disp-formula>
                        </p>
                    </list-item>
                </list>

                <list list-type="bullet">
                    <list-item>
                        <label>&#x25cb;</label>
                        <p>Precision: The proportion of flows predicted to be malicious that actually are malicious. It is calculated as follows:
                            <disp-formula id="e3">

                                <mml:math display="block">
                                    <mml:mtext>Precision</mml:mtext>
                                    <mml:mo>=</mml:mo>
                                    <mml:mfrac>
                                        <mml:mi mathvariant="italic">TP</mml:mi>
                                        <mml:mrow>
                                            <mml:mi mathvariant="italic">TP</mml:mi>
                                            <mml:mo>+</mml:mo>
                                            <mml:mi mathvariant="italic">FP</mml:mi>
                                        </mml:mrow>
                                    </mml:mfrac>
                                </mml:math>
</disp-formula>
                        </p>
                    </list-item>
                </list>

                <list list-type="bullet">
                    <list-item>
                        <label>&#x25cb;</label>
                        <p>F1 Score: The harmonic mean of Precision and Recall, which provides a balanced assessment when there is a trade-off between the two measures. It is calculated as follows:
                            <disp-formula id="e4">

                                <mml:math display="block">
                                    <mml:mi mathvariant="normal">F</mml:mi>
                                    <mml:mn>1</mml:mn>
                                    <mml:mspace width="0.12em"/>
                                    <mml:mtext>Score</mml:mtext>
                                    <mml:mo>=</mml:mo>
                                    <mml:mn>2</mml:mn>
                                    <mml:mo>&#x00d7;</mml:mo>
                                    <mml:mfrac>
                                        <mml:mrow>
                                            <mml:mtext>Precision</mml:mtext>
                                            <mml:mo>&#x00d7;</mml:mo>
                                            <mml:mtext>Recall</mml:mtext>
                                        </mml:mrow>
                                        <mml:mrow>
                                            <mml:mtext>Precision</mml:mtext>
                                            <mml:mo>+</mml:mo>
                                            <mml:mtext>Recall</mml:mtext>
                                        </mml:mrow>
                                    </mml:mfrac>
                                </mml:math>
</disp-formula>
                        </p>
                    </list-item>
                </list>

                <list list-type="bullet">
                    <list-item>
                        <label>&#x25cb;</label>
                        <p>False Positive Rate (FPR): The proportion of benign flows that are incorrectly flagged as malicious. It is calculated as follows:
                            <disp-formula id="e5">

                                <mml:math display="block">
                                    <mml:mi>FPR</mml:mi>
                                    <mml:mo>=</mml:mo>
                                    <mml:mfrac>
                                        <mml:mi mathvariant="italic">FP</mml:mi>
                                        <mml:mrow>
                                            <mml:mi mathvariant="italic">FP</mml:mi>
                                            <mml:mo>+</mml:mo>
                                            <mml:mi mathvariant="italic">TN</mml:mi>
                                        </mml:mrow>
                                    </mml:mfrac>
                                </mml:math>
</disp-formula>
                        </p>
                    </list-item>
                </list>
            </p>
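            <p>The five definitions above translate directly into code. The following Python sketch computes them from confusion-matrix counts; the function name <monospace>classification_metrics</monospace> is hypothetical, the counts in the example are invented for illustration and are not taken from the experiments, and the function assumes that no denominator is zero.</p>
            <code language="python"><![CDATA[
```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the five metrics defined above from confusion-matrix counts.

    Assumes every denominator is non-zero (each class occurs at least once
    and at least one flow is predicted malicious).
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)        # detection rate, i.e. true positive rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return {
        "accuracy": accuracy,
        "detection_rate": recall,
        "precision": precision,
        "f1": f1,
        "fpr": fpr,
    }


if __name__ == "__main__":
    # Invented counts for a balanced test set of 100 malicious
    # and 100 benign flows (not results from the paper).
    for name, value in classification_metrics(tp=90, tn=95, fp=5, fn=10).items():
        print(f"{name}: {value:.3f}")
```
]]></code>
            <p>With these illustrative counts, the accuracy is 185/200 = 0.925, the detection rate is 90/100 = 0.900, the precision is 90/95 (about 0.947), the F1 score is 180/195 (about 0.923), and the FPR is 5/100 = 0.050.</p>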
            <p>The performance improvement observed with RNA encoding can be attributed to a richer feature representation within the CNN framework. RNA encoding converts the input data into a structured, biologically inspired representation, which adds diversity and non-linearity to the feature space. This transformation allows the network to pick up more complex and discriminative patterns that might not be available with traditional encoding methods. RNA encoding may also help reduce noise and improve generalization by emphasizing meaningful relationships in the data. As a result, the CNN can learn more resilient features, leading to higher classification accuracy and overall system performance. The Malicious Network Dataset is used to evaluate the proposed method. Each method (signature, anomaly, or hybrid) starts by preprocessing the dataset to remove the IP fields, after which RNA encoding is applied to all remaining features. The dataset is then divided into a training set (70%) and a testing set (the remaining 30%). The results for the first method (Signature-CNN) are shown in 
                <xref ref-type="table" rid="T4">
Table 4</xref>. The Signature-CNN achieved high accuracy and a low false-positive rate, which supports its suitability for identifying known malware.</p>
            <table-wrap id="T4" orientation="portrait" position="float">
                <label>
Table 4. </label>
                <caption>
                    <title>The performance of the Signature-CNN model.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="top">Metric</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">
Result</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Accuracy</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.915</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Detection Rate</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.89</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Precision</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.92</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">F1 Score</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.90</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">FPR</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.05</td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
            <p>
The results for the second method (Anomaly-CNN) are shown in 
                <xref ref-type="table" rid="T5">
Table 5</xref>. The Anomaly-CNN achieved a higher detection rate, which suggests that zero-day threats can be detected at the cost of a slight increase in false positives.</p>
            <table-wrap id="T5" orientation="portrait" position="float">
                <label>
Table 5. </label>
                <caption>
                    <title>The performance of the Anomaly-CNN model.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="top">Metric</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">
Result</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Accuracy</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.93</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Detection Rate</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.93</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Precision</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.91</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">F1 Score</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.92</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">FPR</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.07</td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
            <p>The results for the third method (Hybrid-CNN) are shown in 
                <xref ref-type="table" rid="T6">
Table 6</xref>. The Hybrid-CNN delivered the best trade-off, achieving the highest detection rate and F1 score while also reducing false positives.</p>
            <table-wrap id="T6" orientation="portrait" position="float">
                <label>
Table 6. </label>
                <caption>
                    <title>The performance of the Hybrid-CNN model.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="top">Metric</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">
Result</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Accuracy</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.95</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Detection Rate</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.95</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Precision</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.94</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">F1 Score</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.945</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">FPR</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.03</td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
            <p>Finally, the performance of all three models (signature, anomaly, and hybrid) is compared in 
                <xref ref-type="table" rid="T7">
Table 7</xref> and 
                <xref ref-type="fig" rid="f3">
Figure 3</xref>.</p>
            <table-wrap id="T7" orientation="portrait" position="float">
                <label>
Table 7. </label>
                <caption>
                    <title>Performance of CNN models.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="top">Method</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Accuracy</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Detection rate</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Precision</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">F1 score</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">
FPR</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Signature-CNN
</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.91</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.89</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.92</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.905</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.05</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Anomaly-CNN
</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.93</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.93</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.91</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.92</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.07</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Hybrid-CNN
</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.95</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.95</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.94</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.945</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.03</td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
            <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                <label>
Figure 3. </label>
                <caption>
                    <title>Comparison of Signature-CNN, Anomaly-CNN and Hybrid-CNN on the malicious network dataset.</title>
                </caption>
                <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/199234/eab33da7-608f-4884-b7fe-0961fb4aa37e_figure3.gif"/>
            </fig>
            <p>
As shown in 
                <xref ref-type="table" rid="T7">
Table 7</xref>, the Hybrid-CNN achieved the best results among the three methods, with accuracy, detection rate, precision, F1 score, and FPR of 95%, 95%, 94%, 94.5%, and 3%, respectively. The advantage of the Hybrid-CNN may be explained by its two-stage decision process: the first stage filters familiar malicious patterns, and the second stage generalizes to unknown codon patterns. This combination reduces the trade-off between sensitivity and specificity. The Anomaly-CNN, by comparison, achieves a slightly higher recall than the Signature-CNN but produces more false positives, since it detects deviations rather than exact patterns. The Signature-CNN is accurate on known threats but lacks generalization, which explains its slightly lower recall.</p>
            <p>The results of the proposed method are compared with classical machine learning models (Random Forest and XGBoost) and deep models (RNN, CNN-BiLSTM, and AE-LSTM); this comparison is shown in 
                <xref ref-type="table" rid="T8">
Table 8</xref>.</p>
            <table-wrap id="T8" orientation="portrait" position="float">
                <label>
Table 8. </label>
                <caption>
                    <title>Comparison of proposed models with existing baselines.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="top">Model</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Accuracy</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Detection rate</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">Precision</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">F1 score</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">
FPR</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Random Forest</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.87</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.84</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.86</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.85</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.09</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">XGBoost</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.89</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.87</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.88</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.875</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.08</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">RNN</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.91</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.89</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.90</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.895</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.07</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">LSTM</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.92</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.90</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.91</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.905</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.06</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">CNN-BiLSTM (Wang et al., 2024)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.93</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.92</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.92</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.92</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.05</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">Hybrid AE-LSTM (Xue et al., 2025)</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.94</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.93</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.93</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.93</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">0.04</td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>Hybrid-CNN (Proposed)</bold>
</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>0.95</bold>
</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>0.95</bold>
</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>0.94</bold>
</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>0.945</bold>
</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <bold>0.03</bold>
</td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
            <p>As shown in 
                <xref ref-type="table" rid="T8">
Table 8</xref>, the Hybrid-CNN outperformed both the classical and the deep learning baselines. It achieved the highest accuracy, detection rate, precision, and F1 score (95%, 95%, 94%, and 94.5%, respectively) and the lowest FPR (3%). This improvement is attributed to the RNA encoding, which preserves semantic relationships between traffic features and strengthens the pattern recognition capability of the CNN.</p>
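            <p>As a consistency check on Table 8, the F1 score can be recomputed from the reported precision (P) and detection rate (recall, R) as F1 = 2PR/(P + R). The sketch below uses only the values printed in the table; the helper name f1_from and the dictionary layout are illustrative.</p>
            <code language="python"><![CDATA[
```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, detection rate, reported F1) copied from Table 8.
table8 = {
    "Random Forest": (0.86, 0.84, 0.85),
    "XGBoost": (0.88, 0.87, 0.875),
    "RNN": (0.90, 0.89, 0.895),
    "LSTM": (0.91, 0.90, 0.905),
    "CNN-BiLSTM": (0.92, 0.92, 0.92),
    "Hybrid AE-LSTM": (0.93, 0.93, 0.93),
    "Hybrid-CNN (Proposed)": (0.94, 0.95, 0.945),
}

for model, (precision, recall, reported) in table8.items():
    recomputed = f1_from(precision, recall)
    print(f"{model}: recomputed F1 = {recomputed:.4f}, reported = {reported}")
```
]]></code>
            <p>Each recomputed value agrees with the reported F1 score to within rounding.</p>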
            <sec id="sec11">
                <title>3.1 Computational efficiency</title>
                <p>The experiments were carried out on an NVIDIA RTX 4090 graphics card with 24 GB of VRAM. Training took 1.8 hours for the Signature-CNN, 2.3 hours for the Anomaly-CNN, and 2.7 hours for the Hybrid-CNN. The mean inference latency per flow was 2.1 ms, 2.4 ms, and 3.0 ms, respectively. Although the Hybrid-CNN is the most computationally expensive because it involves two stages of evaluation, it remains suitable for near-real-time deployment and takes much less time than recurrent models such as LSTM and AE-LSTM.</p>
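                <p>To put the per-flow latencies in context, they can be converted to an approximate single-stream throughput. The sketch assumes that flows are scored one at a time with no batching, which is an assumption rather than a reported setting; the names latency_ms and flows_per_second are illustrative.</p>
                <code language="python"><![CDATA[
```python
# Mean inference latency per flow in milliseconds, as reported in this section.
latency_ms = {"Signature-CNN": 2.1, "Anomaly-CNN": 2.4, "Hybrid-CNN": 3.0}


def flows_per_second(mean_latency_ms):
    """Sequential throughput implied by a mean per-flow latency."""
    return 1000.0 / mean_latency_ms


for model, ms in latency_ms.items():
    print(f"{model}: about {flows_per_second(ms):.0f} flows per second")
```
]]></code>
                <p>Under this assumption, a 3.0 ms latency corresponds to roughly 333 flows per second per stream; batched inference would typically change this figure.</p>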
            </sec>
        </sec>
        <sec id="sec12" sec-type="conclusion">
            <title>4. Conclusion</title>
            <p>This work proposed an integrated CNN-based malware detection model built on RNA encoding and evaluated on the Malicious Network Dataset. By recasting signature, anomaly, and hybrid detection as CNN-based paradigms, the system achieves strong performance across all detection modes. The Hybrid-CNN achieved the best results, with a 95% detection rate while keeping the false-positive rate low. Future work will extend the proposed technique to larger and more heterogeneous datasets to further test its generalization capability. In addition, hybrid deep learning models that combine CNN architectures with transformer-based methods will be considered to improve feature learning. Another valuable direction is optimizing the model for real-time applications and reducing its computational complexity. Finally, exploring other bio-inspired encoding methods may offer further improvements in feature representation and model efficiency.</p>
        </sec>
    </body>
    <back>
        <sec id="sec15" sec-type="data-availability">
            <title>Data availability</title>
            <p>Repository name: Malicious Network Dataset. Zenodo. 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.15453468">https://doi.org/10.5281/zenodo.15453468</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref15">15</xref>
                </sup>
            </p>
            <p>This study uses a publicly available dataset that was originally published by Saadoon and Behadili (2024). The authors did not generate the dataset themselves. The repository contains all underlying data required to reproduce the results reported in this article, including raw network flow records labeled as benign or malicious and all variables used in the experiments (protocol type, port numbers, hash values, payload length, encoded payload data, and class labels). The dataset is openly accessible and released under an open license permitting reuse, with no embargo or access restrictions.</p>
            <p>Data are available under the terms of the 
                <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
        </sec>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Alrayes</surname>
                            <given-names>FS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Amin</surname>
                            <given-names>SU</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hakami</surname>
                            <given-names>NA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Intrusion Detection Model on Network Data with Deep Adaptive Multi-Layer Attention Network (DAMLAN).</article-title>
                    <source>

                        <italic toggle="yes">CMES - Computer Modeling in Engineering and Sciences.</italic>
</source>
                    <year>2025</year>;<volume>144</volume>(<issue>1</issue>):<fpage>581</fpage>&#x2013;<lpage>614</lpage>.
                    <pub-id pub-id-type="doi">10.32604/cmes.2025.065188</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zou</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hu</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A network intrusion detection system based on self-supervised learning of traffic differentiation in Internet of Things.</article-title>
                    <source>

                        <italic toggle="yes">Eng. Appl. Artif. Intell.</italic>
</source>
                    <year>2025</year>;<volume>160</volume>:<fpage>111973</fpage>.
                    <pub-id pub-id-type="doi">10.1016/j.engappai.2025.111973</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rashid</surname>
                            <given-names>OF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Othman</surname>
                            <given-names>ZA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zainudin</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Matching algorithms for intrusion detection system based on DNA encoding.</article-title>
                    <source>

                        <italic toggle="yes">J. Theor. Appl. Inf. Technol.</italic>
</source>
                    <year>2018</year>;<volume>96</volume>(<issue>24</issue>):<fpage>8410</fpage>&#x2013;<lpage>8420</lpage>.</mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hossain</surname>
                            <given-names>MA</given-names>
                        </name>
</person-group>:
                    <article-title>FED-GEM-CN: A federated dual-CNN architecture with contrastive cross-attention for maritime radar intrusion detection.</article-title>
                    <source>

                        <italic toggle="yes">Array.</italic>
</source>
                    <year>2025</year>;<volume>27</volume>:<fpage>100456</fpage>.
                    <pub-id pub-id-type="doi">10.1016/j.array.2025.100456</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yang</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhao</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>NIDS-CNNRF integrating CNN and random forest for efficient network intrusion detection model.</article-title>
                    <source>

                        <italic toggle="yes">Internet of Things.</italic>
</source>
                    <year>2025</year>;<volume>32</volume>.
                    <pub-id pub-id-type="doi">10.1016/j.iot.2025.101607</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xue</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kang</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yu</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>HAE-HRL: A network intrusion detection system utilizing a novel autoencoder and a hybrid enhanced LSTM-CNN-based residual network.</article-title>
                    <source>

                        <italic toggle="yes">Comput. Secur.</italic>
</source>
                    <year>2025</year>;<volume>151</volume>:<fpage>104328</fpage>.
                    <pub-id pub-id-type="doi">10.1016/j.cose.2025.104328</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kaissar</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bou Nassif</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Soudan</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Enhancing CNN-based network intrusion detection through hyperparameter optimization.</article-title>
                    <source>

                        <italic toggle="yes">Intelligent Systems with Applications.</italic>
</source>
                    <year>2025</year>;<volume>26</volume>:<fpage>200528</fpage>.
                    <pub-id pub-id-type="doi">10.1016/j.iswa.2025.200528</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Alrayes</surname>
                            <given-names>FS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zakariah</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Amin</surname>
                            <given-names>SU</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>CNN Channel Attention Intrusion Detection System Using NSL-KDD Dataset.</article-title>
                    <source>

                        <italic toggle="yes">Computers, Materials and Continua.</italic>
</source>
                    <year>2024</year>;<volume>79</volume>(<issue>3</issue>):<fpage>4319</fpage>&#x2013;<lpage>4347</lpage>.
                    <pub-id pub-id-type="doi">10.32604/cmc.2024.050586</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dai</surname>
                            <given-names>Q</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Du</surname>
                            <given-names>T</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Lightweight intrusion detection model based on CNN and knowledge distillation.</article-title>
                    <source>

                        <italic toggle="yes">Appl. Soft Comput.</italic>
</source>
                    <year>2024</year>;<volume>165</volume>:<fpage>112118</fpage>.
                    <pub-id pub-id-type="doi">10.1016/j.asoc.2024.112118</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ban</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>He</surname>
                            <given-names>Q</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>APSO-CNN-SE: An Adaptive Convolutional Neural Network Approach for IoT Intrusion Detection.</article-title>
                    <source>

                        <italic toggle="yes">Computers, Materials and Continua.</italic>
</source>
                    <year>2024</year>;<volume>81</volume>(<issue>1</issue>):<fpage>567</fpage>&#x2013;<lpage>601</lpage>.
                    <pub-id pub-id-type="doi">10.32604/cmc.2024.055007</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Si</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Z</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A New Industrial Intrusion Detection Method Based on CNN-BiLSTM.</article-title>
                    <source>

                        <italic toggle="yes">Computers, Materials and Continua.</italic>
</source>
                    <year>2024</year>;<volume>79</volume>(<issue>3</issue>):<fpage>4297</fpage>&#x2013;<lpage>4318</lpage>.
                    <pub-id pub-id-type="doi">10.32604/cmc.2024.050223</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Altunay</surname>
                            <given-names>HC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Albayrak</surname>
                            <given-names>Z</given-names>
                        </name>
</person-group>:
                    <article-title>A hybrid CNN+LSTM-based intrusion detection system for industrial IoT networks.</article-title>
                    <source>

                        <italic toggle="yes">Engineering Science and Technology, an International Journal.</italic>
</source>
                    <year>2023</year>;<volume>38</volume>:<fpage>101322</fpage>.
                    <pub-id pub-id-type="doi">10.1016/j.jestch.2022.101322</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hou</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Xing</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liang</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Network intrusion detection based on DNA spatial information.</article-title>
                    <source>

                        <italic toggle="yes">Comput. Netw.</italic>
</source>
                    <year>2022</year>;<volume>217</volume>:<fpage>109318</fpage>.
                    <pub-id pub-id-type="doi">10.1016/j.comnet.2022.109318</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Subhi</surname>
                            <given-names>MA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rashid</surname>
                            <given-names>OF</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Abdulsahib</surname>
                            <given-names>SA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Anomaly Intrusion Detection Method based on RNA Encoding and ResNet50 Model.</article-title>
                    <source>

                        <italic toggle="yes">Mesopotamian Journal of CyberSecurity.</italic>
</source>
                    <year>2024</year>;<volume>4</volume>(<issue>2</issue>):<fpage>120</fpage>&#x2013;<lpage>128</lpage>.
                    <pub-id pub-id-type="doi">10.58496/MJCS/2024/011</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="data">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Saadoon</surname>
                            <given-names>MS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Behadili</surname>
                            <given-names>SF</given-names>
                        </name>
</person-group>:
                    <data-title>malicious network dataset.</data-title>[Data set].
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2024</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.15453468</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report483152">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.199234.r483152</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Zaidi</surname>
                        <given-names>Atif Raza</given-names>
                    </name>
                    <xref ref-type="aff" rid="r483152a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0824-3165</uri>
                </contrib>
                <aff id="r483152a1">
                    <label>1</label>TIMES University, Multan, Pakistan</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>26</day>
                <month>5</month>
                <year>2026</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2026 Zaidi AR</copyright-statement>
                <copyright-year>2026</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport483152" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.173837.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The manuscript presents an interesting and relevant study on malware detection using RNA-inspired encoding with CNN-based models. The topic is relevant to network security and malware detection. The proposed approach is interesting, and the reported results are encouraging. However, the following suggestions are offered to further improve technical validation of the work.</p>
            <p> </p>
            <p> 1. The RNA-inspired encoding is a central contribution of the paper. Therefore, an ablation experiment comparing CNN with RNA encoding and CNN without RNA encoding would help show the direct impact of the proposed encoding scheme.</p>
            <p> 2. The architecture description mentions a single sigmoid output neuron, while the training description refers to categorical cross-entropy loss. The authors should clarify the exact loss function used and ensure that it is consistent with the output layer configuration.</p>
            <p> 3. The manuscript mentions that the dataset was sampled using a 50/50 malicious-benign ratio. However, the exact number of benign and malicious samples should be clearly reported.</p>
            <p> 4. The results would be more transparent if confusion matrices were added for Signature-CNN, Anomaly-CNN, and Hybrid-CNN. This would help readers directly observe TP, TN, FP, and FN values behind the reported metrics.</p>
            <p> 5. The authors mention that Python and deep learning libraries were used, but the specific tools and library versions are not clearly provided. It would be useful to mention details such as TensorFlow.</p>
            <p> 6. Although the manuscript has improved, a final proofreading pass is still recommended to address minor grammatical and stylistic issues. For example, some phrases such as &#x201c;built by combining of,&#x201d; and &#x201c;Where the mapping rules as follow&#x201d; may be revised for smoother academic readability.</p>
            <p> 7. The paper already mentions training settings such as epochs, batch size, optimizer, learning rate, dropout, and early stopping. As an optional improvement, the authors may also consider adding training and validation accuracy/loss curves to show model convergence and overfitting behavior.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Partly</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Partly</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Cybersecurity, Android malware detection, machine learning, and deep learning</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment16298-483152">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Fitian Rashid</surname>
                            <given-names>Omar</given-names>
                        </name>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>26</day>
                    <month>5</month>
                    <year>2026</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <bold>Reviewer Comments</bold>
                </p>
                <p> </p>
                <p> 
                    <bold>Comment 1:</bold> 
                    <bold>The RNA-inspired encoding is a central contribution of the paper. Therefore, an ablation experiment comparing CNN with RNA encoding and CNN without RNA encoding would help show the direct impact of the proposed encoding scheme.</bold>
                </p>
                <p> 
                    <bold>Response: </bold>Thank you for this valuable suggestion. An ablation study comparing the CNN model with and without RNA encoding has now been added to better demonstrate the contribution of the proposed RNA-inspired encoding scheme. The additional experiment confirms that RNA encoding improves feature representation and enhances malware detection performance across all evaluation metrics.</p>
                <p> </p>
                <p> 
                    <bold>Comment 2: </bold>
                    <bold>The architecture description mentions a single sigmoid output neuron, while the training description refers to categorical cross-entropy loss. The authors should clarify the exact loss function used and ensure that it is consistent with the output layer configuration.</bold>
                </p>
                <p> 
                    <bold>Response:</bold> Thank you for identifying this inconsistency. The manuscript has been revised to clarify that the models use a single sigmoid output neuron with binary cross-entropy loss because the task is binary malware classification.</p>
                <p> </p>
                <p> 
                    <bold>Comment 3:</bold> 
                    <bold>The manuscript mentions that the dataset was sampled using a 50/50 malicious-benign ratio. However, the exact number of benign and malicious samples should be clearly reported.</bold>
                </p>
                <p> 
                    <bold>Response:</bold> Thank you for this observation. The manuscript has been revised to explicitly report the number of benign and malicious samples used in the experiments.</p>
                <p> </p>
                <p> 
                    <bold>Comment 4: </bold>
                    <bold>The results would be more transparent if confusion matrices were added for Signature-CNN, Anomaly-CNN, and Hybrid-CNN. This would help readers directly observe TP, TN, FP, and FN values behind the reported metrics.</bold>
                </p>
                <p> 
                    <bold>Response: </bold>Thank you for this constructive recommendation. Confusion matrices for Signature-CNN, Anomaly-CNN, and Hybrid-CNN have now been added to provide a clearer representation of TP, TN, FP, and FN values.</p>
                <p> </p>
                <p> 
                    <bold>Comment 5:</bold> 
                    <bold>The authors mention that Python and deep learning libraries were used, but the specific tools and library versions are not clearly provided. It would be useful to mention details such as TensorFlow.</bold>
                </p>
                <p> 
                    <bold>Response: </bold>Thank you for the helpful suggestion. The implementation details have now been expanded to include the software environment and deep learning libraries used in the experiments.</p>
                <p> </p>
                <p> 
                    <bold>Comment 6:</bold> 
                    <bold>Although the manuscript has improved, a final proofreading pass is still recommended to address minor grammatical and stylistic issues. For example, phrases such as &#x201c;built by combining of&#x201d; and &#x201c;Where the mapping rules as follow&#x201d; could be revised for smoother academic readability.</bold>
                </p>
                <p> 
                    <bold>Response: </bold>Thank you for the careful review. The manuscript has undergone an additional proofreading pass to improve grammatical accuracy, sentence clarity, and academic readability.</p>
                <p> </p>
                <p> 
                    <bold>Comment 7: </bold>
                    <bold>The paper already mentions training settings such as epochs, batch size, optimizer, learning rate, dropout, and early stopping. As an optional improvement, the authors may also consider adding training and validation accuracy/loss curves to show model convergence and overfitting behavior.</bold>
                </p>
                <p> 
                    <bold>Response: </bold>Thank you for the careful review. The manuscript has undergone an additional proofreading pass to improve grammatical accuracy, sentence clarity, and academic readability.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report474768">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.191689.r474768</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Subhi</surname>
                        <given-names>Mohammed</given-names>
                    </name>
                    <xref ref-type="aff" rid="r474768a1">1</xref>
                    <xref ref-type="aff" rid="r474768a2">2</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r474768a1">
                    <label>1</label>Directorate of Private University Education, Baghdad, 10011, Iraq</aff>
                <aff id="r474768a2">
                    <label>2</label>Department of Cybersecurity/ Directorate of Information Technology, Iraqi Ministry of Higher Education and Scientific Research, Baghdad, Baghdad, Iraq</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>21</day>
                <month>4</month>
                <year>2026</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2026 Subhi M</copyright-statement>
                <copyright-year>2026</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport474768" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.173837.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The manuscript presents an interesting and timely study on malware detection using RNA encoding combined with CNN architectures. The idea of integrating biologically inspired encoding with deep learning models is promising and shows potential for improving detection performance. However, several issues should be addressed to enhance the clarity, rigor, and overall quality of the manuscript.</p>
            <p> 1. The abstract is generally well-structured; however, it would benefit from including additional details such as the dataset size and key aspects of the experimental setup to improve completeness.</p>
            <p> 2. The introduction provides a relevant background, but the manuscript requires careful language editing to correct grammatical issues and improve readability.</p>
            <p> 3. The related work section is comprehensive; however, it would be strengthened by adding a more critical analysis that clearly identifies the limitations of existing approaches and positions the proposed method accordingly.</p>
            <p> 4. The description of the CNN architecture is clear, but important training details such as optimizer type, learning rate, batch size, and number of epochs are missing and should be provided.</p>
            <p> 5. The discussion section would benefit from deeper insights into why RNA encoding enhances CNN performance, particularly from a feature representation perspective.</p>
            <p> 6. The conclusion is appropriate, but the future work section could be made more specific by outlining concrete and actionable research directions.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Partly</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>No source data required</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Partly</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Machine learning, artificial intelligence, IoT applications, and AI in security applications</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment16006-474768">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Fitian Rashid</surname>
                            <given-names>Omar</given-names>
                        </name>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>21</day>
                    <month>4</month>
                    <year>2026</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <bold>Reviewer Comments</bold>
                </p>
                <p> 
                    <bold>Comment 1:</bold>
                </p>
                <p> The abstract is generally well-structured; however, it would benefit from including additional details such as the dataset size and key aspects of the experimental setup to improve completeness.</p>
                <p> 
                    <bold>Response:</bold>
                </p>
                <p> Thank you for this valuable suggestion. The abstract has been revised to include the dataset size and key elements of the experimental setup, including the type of data used and the evaluation protocol.</p>
                <p> </p>
                <p> 
                    <bold>Comment 2:</bold>
                </p>
                <p> The introduction provides a relevant background, but the manuscript requires careful language editing to correct grammatical issues and improve readability.</p>
                <p> 
                    <bold>Response:</bold>
                </p>
                <p> We appreciate this observation. The manuscript has undergone thorough language editing to improve grammar, sentence structure, and overall readability.</p>
                <p> </p>
                <p> 
                    <bold>Comment 3:</bold>
                </p>
                <p> The related work section is comprehensive; however, it would be strengthened by adding a more critical analysis that clearly identifies the limitations of existing approaches and positions the proposed method accordingly.</p>
                <p> 
                    <bold>Response:</bold>
                </p>
                <p> Thank you for this insightful comment. A critical analysis has been incorporated to highlight the limitations of existing methods and to clearly position the novelty and advantages of the proposed approach.</p>
                <p> </p>
                <p> 
                    <bold>Comment 4:</bold>
                </p>
                <p> The description of the CNN architecture is clear, but important training details such as optimizer type, learning rate, batch size, and number of epochs are missing and should be provided.</p>
                <p> 
                    <bold>Response:</bold>
                </p>
                <p> We appreciate this important suggestion. The training configuration details have been added to ensure reproducibility of the proposed method.</p>
                <p> 
                    <bold>Comment 5:</bold>
                </p>
                <p> The discussion section would benefit from deeper insights into why RNA encoding enhances CNN performance, particularly from a feature representation perspective.</p>
                <p> 
                    <bold>Response:</bold>
                </p>
                <p> Thank you for this constructive feedback. The discussion has been expanded to provide deeper insights into how RNA encoding improves feature representation and contributes to enhanced CNN performance.</p>
                <p> </p>
                <p> 
                    <bold>Comment 6:</bold>
                </p>
                <p> The conclusion is appropriate, but the future work section could be made more specific by outlining concrete and actionable research directions.</p>
                <p> 
                    <bold>Response:</bold>
                </p>
                <p> We appreciate this suggestion. The future work section has been revised to include specific and actionable research directions.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
