<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.73082.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Encoding Retina Image to Words using Ensemble of Vision Transformers for Diabetic Retinopathy Grading</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 not approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>AlDahoul</surname>
                        <given-names>Nouar</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-5522-0033</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Abdul Karim</surname>
                        <given-names>Hezerul</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-7613-4596</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Joshua Toledo Tan</surname>
                        <given-names>Myles</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Momo</surname>
                        <given-names>Mhd Adel</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6220-9801</uri>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ledesma Fermin</surname>
                        <given-names>Jamie</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Engineering, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia</aff>
                <aff id="a2">
                    <label>2</label>Artificial Intelligence, YO-VIVO corporation, Bacolod City, 6100, Philippines</aff>
                <aff id="a3">
                    <label>3</label>Engineering and Technology, University of St. La Salle, Bacolod City, 6100, Philippines</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:nouar.aldahoul@live.iium.edu.my">nouar.aldahoul@live.iium.edu.my</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>21</day>
                <month>9</month>
                <year>2021</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2021</year>
            </pub-date>
            <volume>10</volume>
            <elocation-id>948</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>13</day>
                    <month>9</month>
                    <year>2021</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 AlDahoul N et al.</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/10-948/pdf"/>
            <abstract>
                <p>Diabetes is one of the top ten causes of death among adults worldwide. People with diabetes are prone to eye diseases such as diabetic retinopathy (DR). DR damages the blood vessels in the retina and can result in vision loss. DR grading is essential for early diagnosis and effective treatment, and for slowing progression to vision impairment. Existing automatic solutions are mostly based on traditional image processing and machine learning techniques, which leaves a large gap in more generic detection and grading of DR. Various deep learning models such as convolutional neural networks (CNNs) have also been used for this purpose. To enhance DR grading, this paper proposes a novel solution based on an ensemble of state-of-the-art deep learning models called vision transformers. A challenging public DR dataset released for a 2015 Kaggle challenge was used to train and evaluate the proposed method. This dataset is highly imbalanced and has five levels of severity: No DR, Mild, Moderate, Severe, and Proliferative DR. The experiments showed that the proposed solution outperforms existing methods in terms of precision (47%), recall (45%), F1 score (42%), and Quadratic Weighted Kappa (QWK) (60.2%). It also ran with a low inference time (1.12 seconds). The proposed solution can therefore help examiners grade DR more accurately than manual means.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Diabetic Retinopathy Grading</kwd>
                <kwd>Ensemble Learning</kwd>
                <kwd>Imbalanced Data</kwd>
                <kwd>Vision Transformer</kwd>
                <kwd>Self-attention Mechanism</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>Multimedia University, Malaysia</funding-source>
                </award-group>
                <funding-statement>This research project was funded by Multimedia University, Malaysia.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <sec id="sec2">
                <title>Background</title>
                <p>Diabetes mellitus (DM) is a group of metabolic disorders characterized by high levels of blood glucose and caused by deficient secretion of the hormone insulin, impaired insulin action, or both. The chronically high blood glucose levels that accompany DM may cause long-term damage to several organs, including the eyes.
                    <sup>
                        <xref ref-type="bibr" rid="ref1">1</xref>,
                        <xref ref-type="bibr" rid="ref2">2</xref>
                    </sup> DM is a pandemic of great concern
                    <sup>
                        <xref ref-type="bibr" rid="ref3">3</xref>-
                        <xref ref-type="bibr" rid="ref6">6</xref>
                    </sup> as approximately 463 million adults were living with DM in 2019. This number is expected to rise to about 700 million by the year 2045.
                    <sup>
                        <xref ref-type="bibr" rid="ref4">4</xref>
                    </sup>
                </p>
                <p>High levels of glucose in the blood damage the capillaries of the retina (diabetic retinopathy [DR]) or the optic nerve (glaucoma), cloud the lens (cataract), or cause fluid to build up in the macula (diabetic macular edema), thereby causing diabetic eye disease.
                    <sup>
                        <xref ref-type="bibr" rid="ref6">6</xref>-
                        <xref ref-type="bibr" rid="ref11">11</xref>
                    </sup> DR is the leading cause of blindness among working-age adults
                    <sup>
                        <xref ref-type="bibr" rid="ref12">12</xref>
                    </sup> and carries personal and socioeconomic consequences,
                    <sup>
                        <xref ref-type="bibr" rid="ref13">13</xref>
                    </sup> as well as a greater risk of developing other complications of DM and of dying.
                    <sup>
                        <xref ref-type="bibr" rid="ref14">14</xref>
                    </sup> According to a meta-analysis that reviewed 35 studies worldwide from 1980 to 2008, 34.6% of all patients with DM globally have DR of some form, while 10.2% of all patients with DM have vision-threatening DR.
                    <sup>
                        <xref ref-type="bibr" rid="ref15">15</xref>
                    </sup>
                </p>
                <p>A study found that screening for DR and the early treatment thereof could lower the risk of vision loss by about 56%,
                    <sup>
                        <xref ref-type="bibr" rid="ref16">16</xref>
                    </sup> indicating that blindness due to DR is highly preventable. Moreover, the 
                    <uri xlink:href="http://www.vision2020australia.org.au/uploads/resource/108/Universal-Eye-Health-A-Global-Action-Plan-2014-2019.pdf">World Health Organization (WHO) Universal Eye Health: A Global Action Plan 2014&#x2013;2019</uri> advocated for efforts to reduce the prevalence of preventable visual impairment and blindness, including those that arise as complications of DM.</p>
                <p>Many tests can be used to screen for DR. While sensitivity and specificity are certainly important, the reported performance of these tests varies. Researchers use different outcomes to measure sensitivity, 
                    <italic toggle="yes">e.g.</italic>, the ability of a screening test to detect any form of retinopathy or its ability to detect vision-threatening DR. Additionally, some tests may detect diabetic macular edema better than the different grades of DR, according to the World Health Organization guide 
                    <uri xlink:href="https://apps.who.int/iris/bitstream/handle/10665/336660/9789289055321-eng.pdf">Diabetic retinopathy screening: a short guide</uri> (Copenhagen: WHO Regional Office for Europe). The examiner&#x2019;s skill is also a source of variation in test results. A systematic review found that the sensitivity of direct ophthalmoscopy (DO) varies greatly both when performed by general practitioners (25%&#x2013;66%) and when performed by ophthalmologists (43%&#x2013;79%).
                    <sup>
                        <xref ref-type="bibr" rid="ref17">17</xref>
                    </sup>
                </p>
                <p>DR grading is an essential step in the early diagnosis and effective treatment of the disease. Manual grading relies on a clinician examining high-resolution retinal images. However, the process is time-consuming and prone to misdiagnosis. This paper aims to address the matter by developing a fast and accurate automated DR grading system. Here, a novel solution based on an ensemble of vision transformers is proposed to enhance grading. A public DR dataset released for a 2015 
                    <uri xlink:href="https://www.kaggle.com/c/diabetic-retinopathy-detection/">Kaggle challenge</uri> was used for training and evaluation.</p>
            </sec>
            <sec id="sec3">
                <title>Related work</title>
                <p>Traditional machine learning (ML) methods have been used to detect DR. Typically, these ML methods require hand-tuned features extracted from small datasets to aid in classification. These traditional methods may involve ensemble learning
                    <sup>
                        <xref ref-type="bibr" rid="ref18">18</xref>
                    </sup>; the calculation of the mean, standard deviation, and edge strength
                    <sup>
                        <xref ref-type="bibr" rid="ref19">19</xref>
                    </sup>; and the segmentation of hard macular exudates.
                    <sup>
                        <xref ref-type="bibr" rid="ref20">20</xref>,
                        <xref ref-type="bibr" rid="ref21">21</xref>
                    </sup> However, these methods require tedious and time-consuming feature engineering steps, and their results are sensitive to the chosen set of features. Work that employs traditional ML methods to detect DR usually yields favorable results on one dataset but fails to achieve similar success when another dataset is used.
                    <sup>
                        <xref ref-type="bibr" rid="ref18">18</xref>,
                        <xref ref-type="bibr" rid="ref19">19</xref>
                    </sup> This is a common limitation of hand-crafted features.</p>
                <p>Deep neural networks, such as CNNs, with much larger datasets have also been used for classification tasks in the diagnosis and grading of DR. These methods involve CNNs developed from scratch to grade the disease using images of the retinal fundus
                    <sup>
                        <xref ref-type="bibr" rid="ref22">22</xref>
                    </sup>; transfer learning based on the Inception-v3 neural network to perform multiple binary classifications (moderate or worse DR, and severe or worse DR)
                    <sup>
                        <xref ref-type="bibr" rid="ref23">23</xref>
                    </sup>; and segmentation prior to detection by pixel classification
                    <sup>
                        <xref ref-type="bibr" rid="ref24">24</xref>
                    </sup> or patch classification.
                    <sup>
                        <xref ref-type="bibr" rid="ref25">25</xref>
                    </sup> A deep learning (DL)-based framework that uses advanced image processing and a boosting algorithm for grading of DR was also proposed.
                    <sup>
                        <xref ref-type="bibr" rid="ref26">26</xref>
                    </sup> This is one of only a handful of works that have effectively employed transfer learning to train large neural networks for this purpose. Recently, ResNet, a deep CNN, was used to address the problem of imbalanced datasets in DR grading.
                    <sup>
                        <xref ref-type="bibr" rid="ref27">27</xref>
                    </sup> Additionally, a bagging ensemble of three CNNs (a shallow CNN, VGG16, and InceptionV3) was used to classify images as DR, glaucoma, myopia, or normal.
                    <sup>
                        <xref ref-type="bibr" rid="ref28">28</xref>
                    </sup>
                </p>
                <p>The transformer was originally proposed by Vaswani 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref29">29</xref>
                    </sup> for natural language processing (NLP) tasks, especially machine translation. Inspired by the success of transformers in NLP, researchers have since adapted them to computer vision tasks, 
                    <italic toggle="yes">e.g.</italic>, image classification.</p>
            </sec>
        </sec>
        <sec id="sec4" sec-type="methods">
            <title>Methods</title>
            <p>In this section, the DR detection dataset is explored. Additionally, the vision transformer, a DL model that was used on these data, is discussed in detail.</p>
            <sec id="sec5">
                <title>Dataset overview</title>
                <p>The 
                    <uri xlink:href="https://www.kaggle.com/c/diabetic-retinopathy-detection/data">DR detection dataset</uri> is highly imbalanced and consists of high-resolution images graded into five levels of severity: No_DR, Mild, Moderate, Severe, and Proliferative_DR. It has significantly more samples in the negative (No_DR) category than in the four positive categories. 
                    <xref ref-type="table" rid="T1">Table 1</xref> shows the class distribution of the training and testing sets, and 
                    <xref ref-type="fig" rid="f1">Figure 1</xref> shows a few samples from each class. The images were acquired under varying conditions and are labeled with subject IDs. Images of both the left and right eyes are provided for every subject. The images were captured with different cameras, which affects the visual appearance of the left-eye and right-eye images.</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>A few samples of each class in the 
                            <uri xlink:href="https://www.kaggle.com/c/diabetic-retinopathy-detection/data">EyePACS, Diabetic Retinopathy Detection</uri> dataset: &#x201c;No_DR&#x201d; (red borders), &#x201c;Mild&#x201d; (blue), &#x201c;Moderate&#x201d; (green), &#x201c;Severe&#x201d; (yellow), &#x201c;Proliferative_DR&#x201d; (violet).</title>
                        <p>The images have various sizes but were resized uniformly.</p>
                    </caption>
                    <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76706/c5c3e606-04b1-4f98-964d-42c31c234033_figure1.gif"/>
                </fig>
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>Table 1. </label>
                    <caption>
                        <title>Class distribution of the training and testing sets in the 
                            <uri xlink:href="https://www.kaggle.com/c/diabetic-retinopathy-detection/data">EyePACS, Diabetic Retinopathy Detection</uri> dataset.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Class</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Training</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Testing</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">No_DR</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">25810</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">39533</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Mild</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">2443</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">3762</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Moderate</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">5292</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">7861</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Severe</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">873</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">1214</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Proliferative_DR</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">708</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">1206</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
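                <p>The degree of imbalance can be read directly from the counts in 
                    <xref ref-type="table" rid="T1">Table 1</xref>. The short Python sketch below is illustrative only and is not part of the original study: the dictionary and variable names are hypothetical, and the code simply computes each class&#x2019;s share of the training set and the ratio between the largest and smallest classes.</p>
                <code language="python"><![CDATA[
```python
# Training-set class counts copied from Table 1.
# All names below are illustrative and do not come from the study's code.
train_counts = {
    "No_DR": 25810,
    "Mild": 2443,
    "Moderate": 5292,
    "Severe": 873,
    "Proliferative_DR": 708,
}

total = sum(train_counts.values())

# Fraction of the training set that belongs to each class.
shares = {name: count / total for name, count in train_counts.items()}

# Ratio between the most and least frequent classes.
imbalance_ratio = max(train_counts.values()) / min(train_counts.values())

for name, share in shares.items():
    print(f"{name:18s} {share:6.1%}")
print(f"largest/smallest class ratio: {imbalance_ratio:.1f}")
```
]]></code>
                <p>With these counts, No_DR makes up roughly three quarters of the training images, and it is about 36 times more frequent than Proliferative_DR.</p>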
                <p>The samples of the training set were rescaled to the range 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mfenced close="]" open="[" separators=",">
                                <mml:mn>0</mml:mn>
                                <mml:mn>1</mml:mn>
                            </mml:mfenced>
                        </mml:math>
                    </inline-formula>, cropped to remove their black borders, and augmented by random horizontal and vertical flips and by rotation through a random angle of up to 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mn>360</mml:mn>
                            <mml:mo>&#x00b0;</mml:mo>
                        </mml:math>
                    </inline-formula>. The samples of the test set were only cropped and rescaled. 
                    <xref ref-type="fig" rid="f2">Figure 2</xref> shows a few augmented samples from the training set.</p>
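                <p>The preprocessing and augmentation steps described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors&#x2019; implementation: the function names, the black-pixel threshold, the flip probability of 0.5, and the nearest-neighbour rotation are all hypothetical choices made for the example, and cropping is applied before rescaling purely for convenience.</p>
                <code language="python"><![CDATA[
```python
import numpy as np

BLACK_THRESHOLD = 10  # hypothetical cut-off (0-255 scale) for "black" border pixels


def crop_black_borders(img, threshold=BLACK_THRESHOLD):
    """Crop an H x W x C uint8 image to the bounding box of its non-black pixels."""
    mask = img.max(axis=2) > threshold
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0 or cols.size == 0:
        return img  # fully black image: nothing to crop
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def rescale(img):
    """Map uint8 intensities to floats in [0, 1]."""
    return img.astype(np.float32) / 255.0


def rotate_nearest(img, angle_deg):
    """Rotate about the image centre (nearest-neighbour sampling, same canvas size)."""
    h, w = img.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: find the source coordinate for every output pixel.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    xi = np.rint(src_x).astype(int)
    yi = np.rint(src_y).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.zeros_like(img)  # pixels with no source inside the image stay black
    sampled = img[yi[valid], xi[valid]]
    out[valid] = sampled
    return out


def preprocess_test(img):
    """Test-time pipeline: crop and rescale only."""
    return rescale(crop_black_borders(img))


def preprocess_train(img, rng):
    """Training pipeline: crop, rescale, random flips, random rotation in [0, 360)."""
    out = preprocess_test(img)
    if rng.random() < 0.5:  # assumed flip probability
        out = out[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]  # vertical flip
    return rotate_nearest(out, rng.uniform(0.0, 360.0))
```
]]></code>
                <p>In this sketch, a training sample would be produced with a call such as <monospace>preprocess_train(image, np.random.default_rng())</monospace>, whereas test images would pass only through <monospace>preprocess_test</monospace>.</p>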
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>A few samples cropped and augmented randomly.</title>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76706/c5c3e606-04b1-4f98-964d-42c31c234033_figure2.gif"/>
                </fig>
            </sec>
            <sec id="sec6">
                <title>Vision transformer</title>
                <p>A vision transformer is a state-of-the-art DL model for image classification that was proposed by Dosovitskiy 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref30">30</xref>
                    </sup> 
                    <xref ref-type="fig" rid="f3">Figure 3</xref> shows the architecture of the vision transformer. In this paper, each retinal image was split into a sequence of patches that were encoded like a sequence of words and fed to the transformer encoder, as shown in 
                    <xref ref-type="fig" rid="f3">Figure 3</xref>. The original image was divided into
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mspace width="0.25em"/>
                            <mml:mi>N</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mfenced close=")" open="(">
                                <mml:mrow>
                                    <mml:mi>H</mml:mi>
                                    <mml:mo>&#x00d7;</mml:mo>
                                    <mml:mi>W</mml:mi>
                                </mml:mrow>
                            </mml:mfenced>
                            <mml:mo>/</mml:mo>
                            <mml:msup>
                                <mml:mi>P</mml:mi>
                                <mml:mn>2</mml:mn>
                            </mml:msup>
                            <mml:mspace width="0.5em"/>
                        </mml:math>
                    </inline-formula>patches with a fixed patch size 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mfenced close=")" open="(" separators=",">
                                <mml:mi>P</mml:mi>
                                <mml:mi>P</mml:mi>
                            </mml:mfenced>
                        </mml:math>
                    </inline-formula> where 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>P</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mn>16</mml:mn>
                        </mml:math>
                    </inline-formula>, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>W</mml:mi>
                        </mml:math>
                    </inline-formula> is the image width, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>H</mml:mi>
                        </mml:math>
                    </inline-formula> is the image height, and 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>N</mml:mi>
                        </mml:math>
                    </inline-formula> is the number of patches. The extracted patches were flattened and each patch 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mi>p</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> was a vector in 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mspace width="0.25em"/>
                            <mml:msup>
                                <mml:mi mathvariant="normal">&#x211d;</mml:mi>
                                <mml:mrow>
                                    <mml:msup>
                                        <mml:mi>P</mml:mi>
                                        <mml:mn>2</mml:mn>
                                    </mml:msup>
                                    <mml:mo>&#x22c5;</mml:mo>
                                    <mml:mspace width="0.5em"/>
                                    <mml:mi>C</mml:mi>
                                </mml:mrow>
                            </mml:msup>
                        </mml:math>
                    </inline-formula>, where 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>C</mml:mi>
                        </mml:math>
                    </inline-formula> is the number of channels.</p>
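                <p>The patch extraction and flattening described above can be sketched in a few lines of NumPy. This is an illustrative example rather than the authors&#x2019; code: the function name <monospace>extract_patches</monospace> and the 224 &#x00d7; 224 RGB input size are assumptions made only to give concrete numbers.</p>
                <code language="python"><![CDATA[
```python
import numpy as np


def extract_patches(img, patch_size=16):
    """Split an H x W x C image into N = (H * W) / P**2 flattened patches of length P*P*C."""
    h, w, c = img.shape
    p = patch_size
    if h % p != 0 or w % p != 0:
        raise ValueError("image height and width must be multiples of the patch size")
    # (H/P, P, W/P, P, C) -> (H/P, W/P, P, P, C) -> (N, P*P*C)
    grid = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(-1, p * p * c)


# Hypothetical 224 x 224 RGB input; the size is chosen only for illustration.
image = np.random.default_rng(0).random((224, 224, 3), dtype=np.float32)
patches = extract_patches(image)
print(patches.shape)  # (196, 768)
```
]]></code>
                <p>For this assumed input size, the image yields 196 patches, each flattened to a vector of length 16 &#x00d7; 16 &#x00d7; 3 = 768. Patches are ordered row by row, so the first row of the result corresponds to the top-left 16 &#x00d7; 16 block of the image.</p>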
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>The vision transformer architecture.
                            <sup>
                                <xref ref-type="bibr" rid="ref30">30</xref>
                            </sup>
                        </title>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76706/c5c3e606-04b1-4f98-964d-42c31c234033_figure3.gif"/>
                </fig>
                <p>As a result, the 2D image was converted into a sequence of patches 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>x</mml:mi>
                            <mml:mo>&#x2208;</mml:mo>
                            <mml:msup>
                                <mml:mi mathvariant="normal">&#x211d;</mml:mi>
                                <mml:mrow>
                                    <mml:mi>N</mml:mi>
                                    <mml:mo>&#x00d7;</mml:mo>
                                    <mml:mfenced close=")" open="(">
                                        <mml:mrow>
                                            <mml:msup>
                                                <mml:mi>P</mml:mi>
                                                <mml:mn>2</mml:mn>
                                            </mml:msup>
                                            <mml:mo>.</mml:mo>
                                            <mml:mi>C</mml:mi>
                                        </mml:mrow>
                                    </mml:mfenced>
                                </mml:mrow>
                            </mml:msup>
                        </mml:math>
                    </inline-formula>. Each patch in the sequence 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>x</mml:mi>
                        </mml:math>
                    </inline-formula> was mapped to a latent vector with hidden size 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>D</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mn>768</mml:mn>
                        </mml:math>
                    </inline-formula>. A learnable class embedding 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msubsup>
                                <mml:mi>z</mml:mi>
                                <mml:mn>0</mml:mn>
                                <mml:mn>0</mml:mn>
                            </mml:msubsup>
                            <mml:mo>=</mml:mo>
                            <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mtext mathvariant="italic">class</mml:mtext>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> was prepended to the sequence of embedded patches, and its state at the output of the transformer encoder 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mfenced close=")" open="(">
                                <mml:msubsup>
                                    <mml:mi>z</mml:mi>
                                    <mml:mi>L</mml:mi>
                                    <mml:mn>0</mml:mn>
                                </mml:msubsup>
                            </mml:mfenced>
                            <mml:mspace width="0.25em"/>
                        </mml:math>
                    </inline-formula> served as the representation 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>y</mml:mi>
                        </mml:math>
                    </inline-formula> of the image. After that, a classifier was attached to the image representation 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>y</mml:mi>
                        </mml:math>
                    </inline-formula>. Additionally, a position embedding 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi>E</mml:mi>
                                <mml:mi mathvariant="italic">pos</mml:mi>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> was added to the patch embeddings to capture the order of patches that were fed into the transformer encoder. 
                    <xref ref-type="fig" rid="f4">Figure 4</xref> illustrates the architecture of the transformer encoder with 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>L</mml:mi>
                        </mml:math>
                    </inline-formula> blocks, each block containing alternating layers of multi-head self-attention (MSA)
                    <sup>
                        <xref ref-type="bibr" rid="ref29">29</xref>
                    </sup> and multi-layer perceptron (MLP) blocks. Layer normalization (LN)
                    <sup>
                        <xref ref-type="bibr" rid="ref31">31</xref>
                    </sup> was applied before every block, while residual connections were applied after every block.
                    <sup>
                        <xref ref-type="bibr" rid="ref30">30</xref>
                    </sup>
                </p>
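                <p>As a minimal sketch of the patch-sequence construction described above, the following pure-Python fragment splits an H &#x00d7; W &#x00d7; C image into N = HW/P<sup>2</sup> flattened patches, each of length P<sup>2</sup>C. The function name <monospace>image_to_patches</monospace>, the nested-list image layout, and the row-by-row patch order are illustrative assumptions rather than the implementation used in this study; the learnable projection to D, the class embedding, and the position embedding are omitted.</p>
                <code language="python"><![CDATA[
```python
def image_to_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened patches.

    Returns a list of N = (H * W) / P**2 patches, each a flat list of
    length P**2 * C, ordered row by row (an illustrative convention).
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    patch.extend(image[row][col])  # append the C channel values
            patches.append(patch)
    return patches


# Toy example (hypothetical data): a 4 x 4 image with C = 3 channels and P = 2.
toy_image = [[[r, c, 0] for c in range(4)] for r in range(4)]
toy_patches = image_to_patches(toy_image, patch_size=2)
print(len(toy_patches), len(toy_patches[0]))  # N and P**2 * C
```
]]></code>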
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Encoder architecture of the transformer.
                            <sup>
                                <xref ref-type="bibr" rid="ref30">30</xref>
                            </sup>
                        </title>
                    </caption>
                    <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76706/c5c3e606-04b1-4f98-964d-42c31c234033_figure4.gif"/>
                </fig>
            </sec>
            <sec id="sec7">
                <title>Ensemble learning of vision transformers</title>
                <p>Ensemble learning is an ML meta-algorithm that combines several base models. Bagging (bootstrap aggregating) is a type of ensemble learning that uses &#x201c;majority voting&#x201d; to combine the outputs of different base models into a single predictive model with improved stability and accuracy.
                    <sup>
                        <xref ref-type="bibr" rid="ref32">32</xref>
                    </sup>
                </p>
                <p>The advantage of bagging several transformers is that an aggregate of several transformers, each trained on a subset of the dataset, can outperform a single transformer trained on the entire set. In other words, bagging reduces overfitting by lowering the variance of high-variance, low-bias models. Training can also be sped up by running the transformers in parallel, each on its own subset of the data, before the results are aggregated, as shown in 
                    <xref ref-type="fig" rid="f5">Figure 5</xref>.</p>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>Figure 5. </label>
                    <caption>
                        <title>Ensemble learning of vision transformers.</title>
                    </caption>
                    <graphic id="gr5" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76706/c5c3e606-04b1-4f98-964d-42c31c234033_figure5.gif"/>
                </fig>
            </sec>
            <sec id="sec8">
                <title>Experimental setup and protocol</title>
                <p>The images available in this dataset were resized to 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>H</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mn>256</mml:mn>
                            <mml:mo>,</mml:mo>
                            <mml:mi>W</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mn>256</mml:mn>
                        </mml:math>
                    </inline-formula>, the latent vector hidden size was set to 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>D</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mn>768</mml:mn>
                        </mml:math>
                    </inline-formula>, the number of layers of the transformer to 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>L</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mn>12</mml:mn>
                        </mml:math>
                    </inline-formula>, the 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:msub>
                                <mml:mi mathvariant="italic">MLP</mml:mi>
                                <mml:mtext mathvariant="italic">size</mml:mtext>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> to 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mn>3072</mml:mn>
                        </mml:math>
                    </inline-formula>, the 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi mathvariant="italic">MS</mml:mi>
                            <mml:msub>
                                <mml:mi>A</mml:mi>
                                <mml:mtext mathvariant="italic">heads</mml:mtext>
                            </mml:msub>
                        </mml:math>
                    </inline-formula> to 12, and the default value of the patch size to 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>P</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mn>16</mml:mn>
                        </mml:math>
                    </inline-formula>. Thus, the sequence length (number of patches) 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mi>N</mml:mi>
                        </mml:math>
                    </inline-formula> was 256.</p>
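                <p>The sequence length follows directly from the settings above, since N = HW/P<sup>2</sup>. The short check below reproduces this arithmetic. The variable names are illustrative, and the per-head dimension (D divided by the number of heads) is a standard property of multi-head attention rather than a value stated in the text.</p>
                <code language="python"><![CDATA[
```python
# Hyperparameters as stated in the text.
H, W = 256, 256      # resized image height and width
P = 16               # patch size
D = 768              # latent vector (hidden) size
L = 12               # number of transformer layers
MLP_SIZE = 3072      # width of the MLP block
MSA_HEADS = 12       # number of attention heads

# Number of patches: N = H * W / P**2.
N = (H // P) * (W // P)

# Assumption (standard multi-head attention): D is split evenly across heads.
head_dim = D // MSA_HEADS

print(N, head_dim)
```
]]></code>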
                <p>In the experiments conducted, 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mn>20</mml:mn>
                            <mml:mo>%</mml:mo>
                        </mml:math>
                    </inline-formula> of the samples in each class of the training set were selected for validation. All transformers were fine-tuned from the weights of a transformer pre-trained on ImageNet-21K.
                    <sup>
                        <xref ref-type="bibr" rid="ref33">33</xref>
                    </sup>
                </p>
                <p>For optimization, the Adam algorithm
                    <sup>
                        <xref ref-type="bibr" rid="ref34">34</xref>
                    </sup> was used with a batch size of 8, and the mean squared error was used as the loss function. The training process for each transformer consisted of two stages:
                    <list list-type="order">
                        <list-item>
                            <label>1)</label>
                            <p>All layers in the transformer backbone were frozen, and only the randomly initialized regression head was trained, for five epochs.</p>
                        </list-item>
                        <list-item>
                            <label>2)</label>
                            <p>The entire model (transformer backbone + regression head) was unfrozen and trained for 40 epochs.</p>
                        </list-item>
                    </list>
                </p>
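                <p>The two-stage schedule can be illustrated with a deliberately tiny stand-in model. In the sketch below, one scalar plays the role of the pre-trained backbone and another the role of the regression head. Plain gradient descent replaces Adam, and the data, learning rate, and all names are hypothetical. Only the freeze-then-unfreeze logic, the mean squared error loss, and the epoch counts (5 and 40) follow the text.</p>
                <code language="python"><![CDATA[
```python
def train(params, trainable, data, epochs, lr=0.01):
    """Gradient descent on the mean squared error of y = head * (backbone * x).

    Only the parameters named in `trainable` are updated; the others stay frozen.
    """
    for _ in range(epochs):
        grads = {"backbone": 0.0, "head": 0.0}
        for x, target in data:
            feature = params["backbone"] * x
            error = params["head"] * feature - target
            # Gradients of error**2, averaged over the data set.
            grads["head"] += 2 * error * feature / len(data)
            grads["backbone"] += 2 * error * params["head"] * x / len(data)
        for name in trainable:
            params[name] -= lr * grads[name]
    return params


def mse(params, data):
    """Mean squared error of the toy model on the data set."""
    return sum((params["head"] * params["backbone"] * x - t) ** 2
               for x, t in data) / len(data)


# Toy data for the target y = 2x; the "pre-trained" backbone starts at 1.0.
data = [(x, 2.0 * x) for x in (0.0, 1.0, 2.0, 3.0)]
params = {"backbone": 1.0, "head": 0.1}
initial_loss = mse(params, data)

# Stage 1: backbone frozen, only the regression head is trained (five epochs).
train(params, trainable=["head"], data=data, epochs=5)
backbone_after_stage1 = params["backbone"]
loss_after_stage1 = mse(params, data)

# Stage 2: the whole model is unfrozen and trained (40 epochs).
train(params, trainable=["backbone", "head"], data=data, epochs=40)
loss_after_stage2 = mse(params, data)
```
]]></code>
                <p>In this toy setting the backbone parameter is unchanged after stage 1, and the loss decreases in both stages.</p>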
                <p>Data augmentation, early stopping, dropout, and learning rate schedules were used to prevent overfitting and loss divergence. 
                    <xref ref-type="fig" rid="f6">Figure 6</xref> shows attention maps extracted from the transformer for a few samples.</p>
                <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                    <label>Figure 6. </label>
                    <caption>
                        <title>Attention maps of samples from each class: A) No DR, B) Mild, C) Moderate, D) Severe, E) Proliferative DR.</title>
                    </caption>
                    <graphic id="gr6" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76706/c5c3e606-04b1-4f98-964d-42c31c234033_figure6.gif"/>
                </fig>
                <p>The classification head of each transformer was removed and replaced by a regression head with a single output node instead of class logits. The regression output of each transformer was converted into a category as shown in 
                    <xref ref-type="table" rid="T2">Table 2</xref>.</p>
                <table-wrap id="T2" orientation="portrait" position="float">
                    <label>Table 2. </label>
                    <caption>
                        <title>Pseudocode for the transformer regression output interpretation.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Algorithm:</bold> Regression output interpretation
                                    <break/>function classify (
                                    <italic toggle="yes">x
                                        <sub>sample</sub>
                                    </italic>) {
                                    <break/>
                                    <bold>inputs:</bold> 
                                    <italic toggle="yes">x
                                        <sub>sample</sub>
                                    </italic>, the floating-point output of the transformer
                                    <break/>
                                    <bold>outputs:</bold> 
                                    <italic toggle="yes">y
                                        <sub>sample</sub>
                                    </italic>, the class of the presented sample
                                    <break/>
                                    <bold>If</bold> 
                                    <italic toggle="yes">x
                                        <sub>sample</sub>
                                    </italic> &lt; 0.8 
                                    <bold>then</bold>
                                    <break/>&#x2003;&#x2003;
                                    <bold>return</bold> 
                                    <italic toggle="yes">y
                                        <sub>sample</sub>
                                    </italic> = 
                                    <italic toggle="yes">No DR</italic>
                                    <break/>
                                    <bold>Else If</bold> 
                                    <italic toggle="yes">x
                                        <sub>sample</sub>
                                    </italic> &lt; 1.5 
                                    <bold>then</bold>
                                    <break/>&#x2003;&#x2003;
                                    <bold>return</bold> 
                                    <italic toggle="yes">y
                                        <sub>sample</sub>
                                    </italic> = 
                                    <italic toggle="yes">Mild</italic>
                                    <break/>
                                    <bold>Else If</bold> 
                                    <italic toggle="yes">x
                                        <sub>sample</sub>
                                    </italic> &lt; 2.5 
                                    <bold>then</bold>
                                    <break/>&#x2003;&#x2003;
                                    <bold>return</bold> 
                                    <italic toggle="yes">y
                                        <sub>sample</sub>
                                    </italic> = 
                                    <italic toggle="yes">Moderate</italic>
                                    <break/>
                                    <bold>Else If</bold> 
                                    <italic toggle="yes">x
                                        <sub>sample</sub>
                                    </italic> &lt; 3.5 
                                    <bold>then</bold>
                                    <break/>&#x2003;&#x2003;
                                    <bold>return</bold> 
                                    <italic toggle="yes">y
                                        <sub>sample</sub>
                                    </italic> = 
                                    <italic toggle="yes">Severe</italic>
                                    <break/>
                                    <bold>return</bold> 
                                    <italic toggle="yes">y
                                        <sub>sample</sub>
                                    </italic> = 
                                    <italic toggle="yes">Proliferative DR</italic>
                                    <break/>}</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
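                <p>The pseudocode in Table 2 translates directly into a short Python function. The sketch below keeps the thresholds exactly as listed in the table; the function name and the string labels are illustrative choices.</p>
                <code language="python"><![CDATA[
```python
def classify(x_sample):
    """Map the regression output of one transformer to a DR grade.

    The thresholds (0.8, 1.5, 2.5, 3.5) are those listed in Table 2.
    """
    if x_sample < 0.8:
        return "No DR"
    if x_sample < 1.5:
        return "Mild"
    if x_sample < 2.5:
        return "Moderate"
    if x_sample < 3.5:
        return "Severe"
    return "Proliferative DR"


# Example with hypothetical regression outputs.
for value in (0.2, 1.1, 2.0, 3.0, 3.9):
    print(value, classify(value))
```
]]></code>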
                <p>An ensemble of ten transformers with similar architectures and hyperparameters was used. The samples were divided randomly into ten sets, and each transformer was trained on one of them. After the regression output of each transformer had been interpreted, the predicted classes from the ten transformers were aggregated by &#x201c;majority voting&#x201d; to predict the final class.</p>
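                <p>A minimal sketch of the two ensemble steps, the random split into ten sets and the majority vote, is given below. The function names, the fixed seed, and the example votes are hypothetical. The text does not state a tie-breaking rule, so the sketch assumes that ties go to the class that appears first among the votes.</p>
                <code language="python"><![CDATA[
```python
import random
from collections import Counter


def split_into_subsets(samples, n_subsets, seed=0):
    """Shuffle the samples and deal them into n_subsets disjoint sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def majority_vote(predicted_classes):
    """Return the most frequent class among the ensemble predictions.

    Assumption: ties are broken in favour of the class seen first,
    which is the ordering Counter.most_common uses for equal counts.
    """
    return Counter(predicted_classes).most_common(1)[0][0]


# Ten disjoint training sets, one per transformer (sample ids are placeholders).
subsets = split_into_subsets(range(100), n_subsets=10)

# Hypothetical class predictions of the ten transformers for one image.
votes = ["Mild", "Moderate", "Mild", "Mild", "No DR",
         "Mild", "Moderate", "Mild", "Mild", "Moderate"]
final_class = majority_vote(votes)
print(final_class)
```
]]></code>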
                <p>Training, validation, and testing were carried out using the TensorFlow framework on an NVIDIA Tesla T4 GPU.</p>
            </sec>
        </sec>
        <sec id="sec9" sec-type="results|discussion">
            <title>Results and discussion</title>
            <sec id="sec10">
                <title>Performance metrics</title>
                <p>In this section, the results of the proposed ensemble of transformers are discussed. Performance metrics such as precision, recall, and F1 score were calculated. Additionally, the 
                    <uri xlink:href="https://kaggle.com/aroraaman/quadratic-kappa-metric-explained-in-5-simple-steps">quadratic weighted kappa</uri> (QWK) metric was used for this dataset because the images had to be labeled manually by specialist physicians, who alone can recognize the small differences among the classes. QWK, which lies in the range 
                    <inline-formula>
                        <mml:math display="inline">
                            <mml:mfenced close="]" open="[" separators=",">
                                <mml:mrow>
                                    <mml:mo>&#x2212;</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                                <mml:mrow>
                                    <mml:mo>+</mml:mo>
                                    <mml:mn>1</mml:mn>
                                </mml:mrow>
                            </mml:mfenced>
                        </mml:math>
                    </inline-formula>, measures the agreement between two sets of ratings. Here it was calculated between the scores assigned by human raters (doctors) and the scores predicted by the models, and it is interpreted as shown in 
                    <xref ref-type="table" rid="T3">Table 3</xref>. The dataset has five ratings: 0, 1, 2, 3, 4.</p>
                <table-wrap id="T3" orientation="portrait" position="float">
                    <label>Table 3. </label>
                    <caption>
                        <title>QWK interpretation.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Kappa</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Agreement</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">&lt; 0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">No</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.01 &#x2013; 0.20</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Slight</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.21 &#x2013; 0.40</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Fair</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.41 &#x2013; 0.60</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Moderate</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.61 &#x2013; 0.80</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Substantial</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.81 &#x2013; 0.99</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Almost perfect</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>QWK was calculated as follows:
                    <list list-type="order">
                        <list-item>
                            <label>1)</label>
                            <p>The confusion matrix O between predicted and actual ratings was calculated.</p>
                        </list-item>
                        <list-item>
                            <label>2)</label>
                            <p>A histogram vector of the ratings was computed for the predictions and for the actual labels.</p>
                        </list-item>
                        <list-item>
                            <label>3)</label>
                            <p>The 
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mi>E</mml:mi>
                                    </mml:math>
                                </inline-formula> 
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mfenced close=")" open="(">
                                            <mml:mrow>
                                                <mml:mi>N</mml:mi>
                                                <mml:mo>&#x00d7;</mml:mo>
                                                <mml:mi>N</mml:mi>
                                            </mml:mrow>
                                        </mml:mfenced>
                                    </mml:math>
                                </inline-formula> matrix, which represents the outer product of the two histogram vectors, was calculated.</p>
                        </list-item>
                        <list-item>
                            <label>4)</label>
                            <p>The 
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mi>W</mml:mi>
                                    </mml:math>
                                </inline-formula> (
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mi>N</mml:mi>
                                        <mml:mo>&#x00d7;</mml:mo>
                                        <mml:mi>N</mml:mi>
                                    </mml:math>
                                </inline-formula>) weight matrix was constructed representing the difference between the ratings as shown in 
                                <xref ref-type="table" rid="T4">Table 4</xref>.
                                <sup>
                                    <xref ref-type="bibr" rid="ref35">35</xref>
                                </sup>
                                <disp-formula id="e1">
                                    <mml:math display="block">
                                        <mml:msub>
                                            <mml:mi>W</mml:mi>
                                            <mml:mi mathvariant="italic">ij</mml:mi>
                                        </mml:msub>
                                        <mml:mo>=</mml:mo>
                                        <mml:mfrac>
                                            <mml:msup>
                                                <mml:mfenced close=")" open="(">
                                                    <mml:mrow>
                                                        <mml:mi>i</mml:mi>
                                                        <mml:mo>&#x2212;</mml:mo>
                                                        <mml:mi>j</mml:mi>
                                                    </mml:mrow>
                                                </mml:mfenced>
                                                <mml:mn>2</mml:mn>
                                            </mml:msup>
                                            <mml:msup>
                                                <mml:mfenced close=")" open="(">
                                                    <mml:mrow>
                                                        <mml:mi>N</mml:mi>
                                                        <mml:mo>&#x2212;</mml:mo>
                                                        <mml:mn>1</mml:mn>
                                                    </mml:mrow>
                                                </mml:mfenced>
                                                <mml:mn>2</mml:mn>
                                            </mml:msup>
                                        </mml:mfrac>
                                    </mml:math>
                                    <label>(1)</label>
                                </disp-formula>
                            </p>
                            <p>where 1 &#x2264; 
                                <italic toggle="yes">i</italic> &#x2264; 5 and 1 &#x2264; 
                                <italic toggle="yes">j</italic> &#x2264; 5.</p>
                        </list-item>
                        <list-item>
                            <label>5)</label>
                            <p>QWK was defined as follows
                                <sup>
                                    <xref ref-type="bibr" rid="ref35">35</xref>
                                </sup>:
                                <disp-formula id="e2">
                                    <mml:math display="block">
                                        <mml:mi>QWK</mml:mi>
                                        <mml:mo>=</mml:mo>
                                        <mml:mn>1</mml:mn>
                                        <mml:mo>&#x2212;</mml:mo>
                                        <mml:mfrac>
                                            <mml:mrow>
                                                <mml:msubsup>
                                                    <mml:mo>&#x2211;</mml:mo>
                                                    <mml:mi>i</mml:mi>
                                                    <mml:mi>N</mml:mi>
                                                </mml:msubsup>
                                                <mml:msubsup>
                                                    <mml:mo>&#x2211;</mml:mo>
                                                    <mml:mi>j</mml:mi>
                                                    <mml:mi>N</mml:mi>
                                                </mml:msubsup>
                                                <mml:msub>
                                                    <mml:mi>W</mml:mi>
                                                    <mml:mi mathvariant="italic">ij</mml:mi>
                                                </mml:msub>
                                                <mml:mo>&#x00d7;</mml:mo>
                                                <mml:msub>
                                                    <mml:mi>O</mml:mi>
                                                    <mml:mi mathvariant="italic">ij</mml:mi>
                                                </mml:msub>
                                            </mml:mrow>
                                            <mml:mrow>
                                                <mml:msubsup>
                                                    <mml:mo>&#x2211;</mml:mo>
                                                    <mml:mi>i</mml:mi>
                                                    <mml:mi>N</mml:mi>
                                                </mml:msubsup>
                                                <mml:msubsup>
                                                    <mml:mo>&#x2211;</mml:mo>
                                                    <mml:mi>j</mml:mi>
                                                    <mml:mi>N</mml:mi>
                                                </mml:msubsup>
                                                <mml:msub>
                                                    <mml:mi>W</mml:mi>
                                                    <mml:mi mathvariant="italic">ij</mml:mi>
                                                </mml:msub>
                                                <mml:mo>&#x00d7;</mml:mo>
                                                <mml:msub>
                                                    <mml:mi>E</mml:mi>
                                                    <mml:mi mathvariant="italic">ij</mml:mi>
                                                </mml:msub>
                                            </mml:mrow>
                                        </mml:mfrac>
                                    </mml:math>
                                    <label>(2)</label>
                                </disp-formula>
                            </p>
                            <p>where 
                                <italic toggle="yes">N</italic> is the number of classes.</p>
                        </list-item>
                    </list>
                </p>
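                <p>The five steps above can be sketched in pure Python as follows. This is an illustrative implementation, not the code used in this study. It uses zero-based ratings (0 to N &#x2212; 1), which leaves the differences i &#x2212; j unchanged, and it assumes the standard normalization in which E is scaled to have the same total as O. The function names are hypothetical.</p>
                <code language="python"><![CDATA[
```python
def quadratic_weights(n):
    """Step 4: W_ij = (i - j)^2 / (N - 1)^2 for an N x N weight matrix."""
    return [[(i - j) ** 2 / (n - 1) ** 2 for j in range(n)] for i in range(n)]


def quadratic_weighted_kappa(actual, predicted, n_classes):
    """Quadratic weighted kappa for integer ratings 0 .. n_classes - 1.

    Undefined (division by zero) if both raters use one identical rating only.
    """
    n = n_classes
    # Step 1: confusion matrix O (rows: actual rating, columns: predicted).
    observed = [[0.0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        observed[a][p] += 1.0

    # Step 2: histogram of ratings for the actual labels and the predictions.
    hist_actual = [sum(row) for row in observed]
    hist_predicted = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    # Step 3: expected matrix E, the outer product of the two histograms,
    # scaled so that E and O have the same total (assumed normalization).
    total = float(len(actual))
    expected = [[hist_actual[i] * hist_predicted[j] / total for j in range(n)]
                for i in range(n)]

    weights = quadratic_weights(n)

    # Step 5: QWK = 1 - sum(W * O) / sum(W * E).
    cells = [(i, j) for i in range(n) for j in range(n)]
    numerator = sum(weights[i][j] * observed[i][j] for i, j in cells)
    denominator = sum(weights[i][j] * expected[i][j] for i, j in cells)
    return 1.0 - numerator / denominator


# First row of W for five classes, matching the first row of Table 4.
print(quadratic_weights(5)[0])

# Hypothetical ratings: perfect agreement and complete disagreement.
kappa_perfect = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5)
kappa_opposite = quadratic_weighted_kappa([0, 0, 4, 4], [4, 4, 0, 0], 5)
print(kappa_perfect, kappa_opposite)
```
]]></code>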
                <table-wrap id="T4" orientation="portrait" position="float">
                    <label>Table 4. </label>
                    <caption>
                        <title>Weight matrix W: the penalty assigned to each pair of true and predicted classes, which grows with the squared distance between them.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top"/>
                                <th align="left" colspan="1" rowspan="1" valign="top">No_DR</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Mild</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Moderate</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Severe</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Proliferative_DR</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>No_DR</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.0625</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.25</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.5625</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">1</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Mild</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.0625</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.0625</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.25</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.5625</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Moderate</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.25</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.0625</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.0625</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.25</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Severe</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.5625</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.25</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.0625</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.0625</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Proliferative_DR</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">1</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.5625</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.25</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.0625</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
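                <p>The weights in 
                    <xref ref-type="table" rid="T4">Table 4</xref> and the QWK of Equation (2) can be reproduced with a short sketch. It is an illustration rather than the code used in this study: the function names are hypothetical, the weights are assumed to follow (i − j)²/(N − 1)² (which matches the tabulated values for N = 5), and the numerator of Equation (2) is assumed to use the observed confusion counts, as in the standard definition of QWK.</p>
                <code language="python"><![CDATA[
```python
def quadratic_weights(n_classes):
    """Return W with W[i][j] = (i - j)^2 / (N - 1)^2 (assumed form of the weights)."""
    scale = (n_classes - 1) ** 2
    return [[(i - j) ** 2 / scale for j in range(n_classes)]
            for i in range(n_classes)]


def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """QWK = 1 - sum(W * O) / sum(W * E) for integer grades 0 .. n_classes - 1.

    O holds the observed confusion counts and E the counts expected by chance
    (outer product of the row and column totals divided by the sample size).
    The value is undefined (division by zero) if the weighted expected
    disagreement is zero, e.g. when every label and prediction is the same grade.
    """
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    total = len(y_true)
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(observed[i][j] for i in range(n_classes))
                for j in range(n_classes)]
    weights = quadratic_weights(n_classes)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            expected_ij = row_sums[i] * col_sums[j] / total
            weighted_observed += weights[i][j] * observed[i][j]
            weighted_expected += weights[i][j] * expected_ij
    return 1.0 - weighted_observed / weighted_expected
```
]]></code>
                <p>Under these assumptions, the call quadratic_weights(5) returns a matrix whose first row is 0, 0.0625, 0.25, 0.5625, 1, and two identical lists of grades give a QWK of 1.</p>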
            </sec>
            <sec id="sec11">
                <title>Experimental results</title>
                <p>
                    <xref ref-type="table" rid="T5">Table 5</xref> shows the performance metrics of the ten transformers, each trained on a subset of the data. Performance varied widely among the individual transformers: Transformer_1 yielded a Kappa of 55.1%, whereas Transformer_10 yielded only 30.9%. Ensembles of various numbers of transformers, including all ten transformers, four transformers (1, 3, 8, 9), and other configurations, were also evaluated. The best model was an ensemble of two transformers (1, 3), which yielded a Kappa of 60.2%.</p>
                <table-wrap id="T5" orientation="portrait" position="float">
                    <label>Table 5. </label>
                    <caption>
                        <title>Performance metrics for various ensemble models.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Model</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Precision %</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Recall %</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">F1 Score %</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">QWK %</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Transformer_1</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">43</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>47</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">40</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">55.1</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Transformer_2</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">37</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">46</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">31</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">40.2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Transformer_3</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">39</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">46</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">37</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">52.0</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Transformer_4</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">45</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">41</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">30</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">43.3</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Transformer_5</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">41</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">45</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">35</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">48.6</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Transformer_6</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">39</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">45</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">33</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">44.9</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Transformer_7</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">44</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">42</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">33</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">43.3</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Transformer_8</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">40</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>47</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">39</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">51.3</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Transformer_9</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">40</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">46</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">38</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">51.0</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Transformer_10</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">35</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">36</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">26</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">30.9</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Ensemble of ten transformers</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">45</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>47</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">38</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">53.0</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Ensemble of four transformers</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">44</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>47</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">41</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">57.5</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Ensemble of two transformers</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>47</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">45</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>42</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>60.2</bold>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
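                <p>One common way to combine the members of a bagging ensemble is soft voting, in which the per-class probabilities of the members are averaged and the most probable grade is chosen. The sketch below assumes this rule purely for illustration; this section does not state which combination rule was used, and the function name and input layout are hypothetical.</p>
                <code language="python"><![CDATA[
```python
def soft_vote(prob_sets):
    """Average per-class probabilities over models and return one grade per image.

    prob_sets[m][k] is the list of class probabilities that model m assigns
    to image k. Soft voting is an assumption made for this sketch.
    """
    n_models = len(prob_sets)
    predictions = []
    # zip(*prob_sets) groups the outputs of all models for the same image.
    for per_image in zip(*prob_sets):
        mean_probs = [sum(class_probs) / n_models
                      for class_probs in zip(*per_image)]
        # Ties are resolved in favour of the lowest grade index.
        best_grade = max(range(len(mean_probs)), key=mean_probs.__getitem__)
        predictions.append(best_grade)
    return predictions


# Toy outputs of two hypothetical models for two images and five grades.
model_a = [[0.6, 0.4, 0.0, 0.0, 0.0], [0.1, 0.2, 0.7, 0.0, 0.0]]
model_b = [[0.2, 0.8, 0.0, 0.0, 0.0], [0.1, 0.1, 0.8, 0.0, 0.0]]
ensemble_grades = soft_vote([model_a, model_b])
```
]]></code>
                <p>In this toy case the two models disagree on the first image, and averaging their probabilities selects grade 1. Evaluating different subsets of models amounts to calling the same function with different lists of members.</p>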
                <p>The best Kappa of 60.2% lies at the boundary between moderate and substantial agreement. These results show that the ensemble of two transformers (1, 3), trained on fewer images, outperformed the ensemble of ten transformers, which was trained on five times as many images. 
                    <xref ref-type="table" rid="T6">Table 6</xref> compares the ensemble of transformers with ensembles of ResNet50 CNNs. Each ResNet50 CNN was pre-trained on ImageNet 1K, and its top layers were replaced by a support vector machine that was tuned on this dataset. The proposed ensemble of transformers outperformed the best ResNet50 ensemble by more than 18 percentage points of Kappa (60.2% versus 41.52%).</p>
                <table-wrap id="T6" orientation="portrait" position="float">
                    <label>Table 6. </label>
                    <caption>
                        <title>Comparison between the ensemble of transformers and the ensemble of ResNet50 CNNs.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Model</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Precision %</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Recall %</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">F1 Score %</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">QWK%</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Ensemble of two transformers</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>47</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>45</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>42</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>60.2</bold>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Ensemble of ten ResNet50</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">32</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">44</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">32</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">36.97</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">
                                    <bold>Ensemble of two ResNet50</bold>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">35</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">40</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">35</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">41.52</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>The confusion matrices of the ensembles of ten, four, and two transformers and of the ensemble of two ResNet50 CNNs are shown in 
                    <xref ref-type="fig" rid="f7">Figure 7</xref>. Confusion matrix (C), which corresponds to the best Kappa of 60.2%, shows that the model was able to recognize severe and proliferative DR on the one hand, and no DR and mild DR on the other.</p>
                <fig fig-type="figure" id="f7" orientation="portrait" position="float">
                    <label>Figure 7. </label>
                    <caption>
                        <title>Confusion matrices for the bagging ensembles of A) ten transformers, B) four transformers, C) two transformers, and D) two ResNet50 CNNs.</title>
                    </caption>
                    <graphic id="gr7" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/76706/c5c3e606-04b1-4f98-964d-42c31c234033_figure7.gif"/>
                </fig>
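                <p>Precision, recall, and F1 score can be derived directly from a confusion matrix. The sketch below is a minimal illustration with a hypothetical function name. It assumes macro-averaging over the classes, which this section does not state explicitly. That assumption is at least consistent with rows of the tables in which the F1 score is lower than both precision and recall, something that cannot occur for a single class because the harmonic mean lies between its two inputs.</p>
                <code language="python"><![CDATA[
```python
def macro_metrics(confusion):
    """Return macro-averaged (precision, recall, F1) from a confusion matrix.

    confusion[i][j] is the number of images with true grade i that were
    predicted as grade j. Macro-averaging is an assumption of this sketch.
    """
    n = len(confusion)
    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column total
        actual_k = sum(confusion[k])                          # row total
        precision = true_positive / predicted_k if predicted_k else 0.0
        recall = true_positive / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return sum(precisions) / n, sum(recalls) / n, sum(f1_scores) / n


# Toy two-class example, not data from this study.
toy_precision, toy_recall, toy_f1 = macro_metrics([[2, 0], [1, 1]])
```
]]></code>
                <p>Classes that are never predicted are given a precision of zero here; other conventions exist and would change the averages slightly.</p>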
            </sec>
        </sec>
        <sec id="sec12" sec-type="conclusion">
            <title>Conclusion</title>
            <p>This study is a new attempt to demonstrate the capability of a bagging ensemble of vision transformers applied to retinal image classification for grading DR into five levels of severity. The experiments showed that, even on a challenging dataset, the proposed method yielded promising performance in terms of precision (47%), recall (45%), F1 score (42%), and QWK (60.2%). Furthermore, the inference time was low at 1.12 seconds. In future work, we intend to combine various DR datasets to increase the size and variety of the training data, so that the proposed model can be trained from scratch instead of starting from the weights of the ImageNet 21K-based model, which may further improve performance.</p>
        </sec>
        <sec id="sec13">
            <title>Author contributions</title>
            <p>Conceptualization by N.A., M.A.M.; Data Curation by N.A.; Formal Analysis by N.A., H.A.K., M.A.M.; Funding Acquisition by H.A.K.; Investigation by N.A., J.L.F.; Methodology by N.A., H.A.K., M.A.M.; Project Administration by H.A.K.; Software by N.A., M.A.M.; Validation by N.A., M.J.T.T.; Visualization by N.A.; Writing &#x2013; Original Draft Preparation by N.A., M.A.M., J.L.F.; Writing &#x2013; Review &amp; Editing by N.A., H.A.K., M.J.T.T., M.A.M., J.L.F.</p>
        </sec>
        <sec id="sec14">
            <title>Ethics and consent</title>
            <p>All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.</p>
            <p>The retinal images are a public third-party dataset provided by 
                <uri xlink:href="https://www.eyepacs.com/">EyePACS</uri>, a free platform for retinopathy screening.</p>
        </sec>
        <sec id="sec15">
            <title>Competing interests</title>
            <p>The authors declare no competing interests.</p>
        </sec>
        <sec id="sec16">
            <title>Grant information</title>
            <p>This research project was funded by Multimedia University, Malaysia.</p>
        </sec>
        <sec id="sec17">
            <title>Data availability</title>
            <p>The 
                <uri xlink:href="https://www.kaggle.com/c/diabetic-retinopathy-detection/data">dataset</uri> used in this work is publicly accessible on the Kaggle website. It was created in 2015 for the Kaggle Diabetic Retinopathy Detection competition, which was sponsored by the 
                <uri xlink:href="https://www.chcf.org/">California Healthcare Foundation</uri>. Retinal images were provided by 
                <uri xlink:href="https://www.eyepacs.com/">EyePACS</uri>, a free platform for retinopathy screening.</p>
        </sec>
    </body>
    <back>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <collab>American Diabetes Association</collab>:
                    <article-title>Diagnosis and classification of diabetes mellitus.</article-title>
                    <source>

                        <italic toggle="yes">Diabetes Care.</italic>
</source>
                    <year>Jan. 2010</year>;<volume>33</volume>(<issue>Suppl 1</issue>):<fpage>S62</fpage>&#x2013;<lpage>S69</lpage>.
                    <pub-id pub-id-type="pmid">24357215</pub-id>
                    <pub-id pub-id-type="doi">10.2337/dc14-S081</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fowler</surname>
                            <given-names>MJ</given-names>
                        </name>
</person-group>:
                    <article-title>Microvascular and Macrovascular Complications of Diabetes.</article-title>
                    <source>

                        <italic toggle="yes">Clin Diab.</italic>
</source>
                    <year>Apr. 2008</year>;<volume>26</volume>(<issue>2</issue>):<fpage>77</fpage>.
                    <pub-id pub-id-type="pmid">27366724</pub-id>
                    <pub-id pub-id-type="doi">10.4103/2230-8210.183480</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4911847</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhou</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Worldwide trends in diabetes since 1980: a pooled analysis of 751 population-based studies with 4.4 million participants.</article-title>
                    <source>

                        <italic toggle="yes">Lancet.</italic>
</source>
                    <year>Apr. 2016</year>;<volume>387</volume>(<issue>10027</issue>):<fpage>1513</fpage>&#x2013;<lpage>1530</lpage>.
                    <pub-id pub-id-type="pmid">27061677</pub-id>
                    <pub-id pub-id-type="doi">10.1016/S0140-6736(16)00618-8</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5081106</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="other">
                    <collab>International Diabetes Federation</collab>:
                    <source>

                        <italic toggle="yes">IDF Diabetes Atlas.</italic>
</source>
                    <edition>9th</edition> ed.
                    <publisher-loc>Brussels, Belgium</publisher-loc>:
                    <publisher-name>International Diabetes Federation</publisher-name>;<year>2019</year>, pp.<fpage>32</fpage>&#x2013;<lpage>61</lpage>.</mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Narayan</surname>
                            <given-names>KMV</given-names>
                        </name>
</person-group>:
                    <article-title>The Diabetes Pandemic: Looking for the Silver Lining.</article-title>
                    <source>

                        <italic toggle="yes">Clinical Diabetes.</italic>
</source>
                    <year>Apr. 2005</year>;<volume>23</volume>(<issue>2</issue>):<fpage>51</fpage>&#x2013;<lpage>52</lpage>.
                    <pub-id pub-id-type="doi">10.2337/diaclin.23.2.51</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cheloni</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gandolfi</surname>
                            <given-names>SA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Signorelli</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Global prevalence of diabetic retinopathy: protocol for a systematic review and meta-analysis.</article-title>
                    <source>

                        <italic toggle="yes">BMJ Open.</italic>
</source>
                    <year>Mar. 2019</year>;<volume>9</volume>(<issue>3</issue>):<fpage>e022188</fpage>.
                    <pub-id pub-id-type="pmid">30833309</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmjopen-2018-022188</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6443069</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wu</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fernandez-Loaiza</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sauma</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Classification of diabetic retinopathy and diabetic macular edema.</article-title>
                    <source>

                        <italic toggle="yes">World J Diabetes.</italic>
</source>
                    <year>Dec. 2013</year>;<volume>4</volume>(<issue>6</issue>):<fpage>290</fpage>&#x2013;<lpage>294</lpage>.
                    <pub-id pub-id-type="pmid">24379919</pub-id>
                    <pub-id pub-id-type="doi">10.4239/wjd.v4.i6.290</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3874488</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lo</surname>
                            <given-names>ACY</given-names>
                        </name>
</person-group>:
                    <article-title>Diabetic Retinopathy: Pathophysiology and Treatments.</article-title>
                    <source>

                        <italic toggle="yes">Int J Mol Sci.</italic>
</source>
                    <year>Jun. 2018</year>;<volume>19</volume>(<issue>6</issue>):<fpage>1816</fpage>.
                    <pub-id pub-id-type="pmid">29925789</pub-id>
                    <pub-id pub-id-type="doi">10.3390/ijms19061816</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6032159</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Song</surname>
                            <given-names>BJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Aiello</surname>
                            <given-names>LP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pasquale</surname>
                            <given-names>LR</given-names>
                        </name>
</person-group>:
                    <article-title>Presence and Risk Factors for Glaucoma in Patients with Diabetes.</article-title>
                    <source>

                        <italic toggle="yes">Curr Diab Rep.</italic>
</source>
                    <year>Dec. 2016</year>;<volume>16</volume>(<issue>12</issue>):<fpage>124</fpage>&#x2013;<lpage>124</lpage>.
                    <pub-id pub-id-type="pmid">27766584</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s11892-016-0815-6</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5310929</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pollreisz</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schmidt-Erfurth</surname>
                            <given-names>U</given-names>
                        </name>
</person-group>:
                    <article-title>Diabetic cataract-pathogenesis, epidemiology and treatment.</article-title>
                    <source>

                        <italic toggle="yes">J Ophthalmol.</italic>
</source>
                    <year>2010</year>;<volume>2010</volume>:<fpage>608751</fpage>&#x2013;<lpage>608751</lpage>.
                    <pub-id pub-id-type="pmid">20634936</pub-id>
                    <pub-id pub-id-type="doi">10.1155/2010/608751</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2903955</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Das</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>McGuire</surname>
                            <given-names>PG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rangasamy</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Diabetic Macular Edema: Pathophysiology and Novel Therapeutic Targets.</article-title>
                    <source>

                        <italic toggle="yes">Ophthalmology.</italic>
</source>
                    <year>2015</year>;<volume>122</volume>(<issue>7</issue>):<fpage>1375</fpage>&#x2013;<lpage>1394</lpage>.
                    <pub-id pub-id-type="pmid">25935789</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ophtha.2015.03.024</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cheung</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mitchell</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wong</surname>
                            <given-names>TY</given-names>
                        </name>
</person-group>:
                    <article-title>Diabetic retinopathy.</article-title>
                    <source>

                        <italic toggle="yes">Lancet.</italic>
</source>
                    <year>Jul. 2010</year>;<volume>376</volume>(<issue>9735</issue>):<fpage>124</fpage>&#x2013;<lpage>136</lpage>.
                    <pub-id pub-id-type="doi">10.1016/S0140-6736(09)62124-3</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rees</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Association Between Diabetes-Related Eye Complications and Symptoms of Anxiety and Depression.</article-title>
                    <source>

                        <italic toggle="yes">JAMA Ophthalmol.</italic>
</source>
                    <year>Sep. 2016</year>;<volume>134</volume>(<issue>9</issue>):<fpage>1007</fpage>&#x2013;<lpage>1014</lpage>.
                    <pub-id pub-id-type="pmid">27387297</pub-id>
                    <pub-id pub-id-type="doi">10.1001/jamaophthalmol.2016.2213</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kramer</surname>
                            <given-names>CK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rodrigues</surname>
                            <given-names>TC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Canani</surname>
                            <given-names>LH</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Diabetic Retinopathy Predicts All-Cause Mortality and Cardiovascular Events in Both Type 1 and 2 Diabetes.</article-title>
                    <source>

                        <italic toggle="yes">Diabetes Care.</italic>
</source>
                    <year>May 2011</year>;<volume>34</volume>(<issue>5</issue>):<fpage>1238</fpage>.
                    <pub-id pub-id-type="pmid">21525504</pub-id>
                    <pub-id pub-id-type="doi">10.2337/dc11-0079</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3114518</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yau</surname>
                            <given-names>JWY</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Global Prevalence and Major Risk Factors of Diabetic Retinopathy.</article-title>
                    <source>

                        <italic toggle="yes">Diabetes Care.</italic>
</source>
                    <year>Mar. 2012</year>;<volume>35</volume>(<issue>3</issue>):<fpage>556</fpage>&#x2013;<lpage>564</lpage>.
                    <pub-id pub-id-type="pmid">22301125</pub-id>
                    <pub-id pub-id-type="doi">10.2337/dc11-1909</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3322721</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rohan</surname>
                            <given-names>TE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Frost</surname>
                            <given-names>CD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wald</surname>
                            <given-names>NJ</given-names>
                        </name>
</person-group>:
                    <article-title>Prevention of blindness by screening for diabetic retinopathy: a quantitative assessment.</article-title>
                    <source>

                        <italic toggle="yes">BMJ.</italic>
</source>
                    <year>Nov. 1989</year>;<volume>299</volume>(<issue>6709</issue>):<fpage>1198</fpage>&#x2013;<lpage>1201</lpage>.
                    <pub-id pub-id-type="pmid">2514883</pub-id>
                    <pub-id pub-id-type="doi">10.1136/bmj.299.6709.1198</pub-id>
                    <pub-id pub-id-type="pmcid">PMC1838350</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hutchinson</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Effectiveness of screening and monitoring tests for diabetic retinopathy--a systematic review.</article-title>
                    <source>

                        <italic toggle="yes">Diabet Med.</italic>
</source>
                    <year>Jul. 2000</year>;<volume>17</volume>(<issue>7</issue>):<fpage>495</fpage>&#x2013;<lpage>506</lpage>.
                    <pub-id pub-id-type="pmid">10972578</pub-id>
                    <pub-id pub-id-type="doi">10.1046/j.1464-5491.2000.00250.x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bhatia</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Arora</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tomar</surname>
                            <given-names>R</given-names>
                        </name>
</person-group>:
                    <article-title>Diagnosis of diabetic retinopathy using machine learning classification algorithm.</article-title>
                    <source>

                        <italic toggle="yes">2016 2nd Int Conf Next Generation Computing Technologies (NGCT).</italic>
</source>
                    <year>2016</year>, pp.<fpage>347</fpage>&#x2013;<lpage>351</lpage>.
                    <pub-id pub-id-type="doi">10.1109/NGCT.2016.7877439</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Asha</surname>
                            <given-names>PR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Karpagavalli</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Diabetic Retinal Exudates Detection Using Extreme Learning Machine.</article-title>
                    <source>

                        <italic toggle="yes">Emerging ICT for Bridging the Future - Proceedings of the 49th Annual Convention of the Computer Society of India CSI Volume 2, Cham.</italic>
</source>
                    <year>2015</year>; pp.<fpage>573</fpage>&#x2013;<lpage>578</lpage>.
                    <pub-id pub-id-type="doi">10.1109/ICACCS.2015.7324057</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sopharak</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Uyyanonvara</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>Automatic exudates detection from diabetic retinopathy retinal image using fuzzy c-means and morphological methods.</article-title>
                    <source>

                        <italic toggle="yes">Proceedings of the 3<sup>rd</sup> IASTED International Conference on Advances in Computer Science and Technology.</italic>
</source>
                    <year>2007</year>; pp.<fpage>359</fpage>&#x2013;<lpage>364</lpage>.
                    <pub-id pub-id-type="pmid">22574005</pub-id>
                    <pub-id pub-id-type="doi">10.3390/s90302148</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3332251</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Osareh</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mirmehdi</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Thomas</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Automatic recognition of exudative maculopathy using fuzzy C-means clustering and neural networks.</article-title>
                    <source>

                        <italic toggle="yes">Proc Medical Image Understanding and Analysis Conference.</italic>
</source>
                    <year>2001</year>, vol.<volume>3</volume>, pp.<fpage>49</fpage>&#x2013;<lpage>52</lpage>.</mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pratt</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Coenen</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Broadbent</surname>
                            <given-names>DM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Convolutional Neural Networks for Diabetic Retinopathy.</article-title>
                    <source>

                        <italic toggle="yes">Procedia Comput Sci.</italic>
</source>
                    <year>Jan. 2016</year>;<volume>90</volume>:<fpage>200</fpage>&#x2013;<lpage>205</lpage>.
                    <pub-id pub-id-type="doi">10.1016/j.procs.2016.07.014</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gulshan</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs.</article-title>
                    <source>

                        <italic toggle="yes">JAMA.</italic>
</source>
                    <year>Dec. 2016</year>;<volume>316</volume>(<issue>22</issue>):<fpage>2402</fpage>&#x2013;<lpage>2410</lpage>.
                    <pub-id pub-id-type="pmid">27898976</pub-id>
                    <pub-id pub-id-type="doi">10.1001/jama.2016.17216</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Prenta&#x0161;i&#x0107;</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lon&#x010d;ari&#x0107;</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Detection of exudates in fundus photographs using deep neural networks and anatomical landmark detection fusion.</article-title>
                    <source>

                        <italic toggle="yes">Comput Methods Programs Biomed.</italic>
</source>
                    <year>Dec. 2016</year>;<volume>137</volume>:<fpage>281</fpage>&#x2013;<lpage>292</lpage>.
                    <pub-id pub-id-type="pmid">28110732</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cmpb.2016.09.018</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>van Grinsven</surname>
                            <given-names>MJJP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>van Ginneken</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hoyng</surname>
                            <given-names>CB</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Fast Convolutional Neural Network Training Using Selective Data Sampling: Application to Hemorrhage Detection in Color Fundus Images.</article-title>
                    <source>

                        <italic toggle="yes">IEEE Trans Med Imaging.</italic>
</source>
                    <year>May 2016</year>;<volume>35</volume>(<issue>5</issue>):<fpage>1273</fpage>&#x2013;<lpage>1284</lpage>.
                    <pub-id pub-id-type="doi">10.1109/TMI.2016.2526689</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <label>26</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Y</given-names>
                        </name>
</person-group>:
                    <article-title>A Deep Learning Based Pipeline for Image Grading of Diabetic Retinopathy.</article-title>
                    <source>

                        <italic toggle="yes">Master of Science thesis, Virginia Polytechnic Institute and State University.</italic>
</source>
                    <year>2018</year>.</mixed-citation>
            </ref>
            <ref id="ref27">
                <label>27</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sallam</surname>
                            <given-names>MS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Asnawi</surname>
                            <given-names>AL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Olanrewaju</surname>
                            <given-names>RF</given-names>
                        </name>
</person-group>:
                    <article-title>Diabetic Retinopathy Grading Using ResNet Convolutional Neural Network.</article-title>
                    <source>

                        <italic toggle="yes">2020 IEEE Conference on Big Data and Analytics (ICBDA).</italic>
</source>
                    <year>2020</year>, pp.<fpage>73</fpage>&#x2013;<lpage>78</lpage>.
                    <pub-id pub-id-type="doi">10.1109/ICBDA50157.2020.9289822</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <label>28</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Smaida</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yaroshchak</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Bagging of convolutional neural networks for diagnostic of eye diseases.</article-title>
                    <source>

                        <italic toggle="yes">CEUR Workshop Proceedings.</italic>
</source>
                    <year>2020</year>;<volume>2604</volume>:<fpage>715</fpage>&#x2013;<lpage>729</lpage>.</mixed-citation>
            </ref>
            <ref id="ref29">
                <label>29</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vaswani</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Attention Is All You Need.</article-title>
                    <source>

                        <italic toggle="yes">arXiv:1706.03762 [cs].</italic>
</source>
                    <year>Dec. 2017</year>.
 Accessed: May 30, 2021.
                    <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1706.03762">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref30">
                <label>30</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dosovitskiy</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Beyer</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kolesnikov</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>An image is worth 16x16 words: Transformers for image recognition at scale.</article-title>
                    <source>

                        <italic toggle="yes">ICLR.</italic>
</source>
                    <year>2021</year>.</mixed-citation>
            </ref>
            <ref id="ref31">
                <label>31</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ba</surname>
                            <given-names>JL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kiros</surname>
                            <given-names>JR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hinton</surname>
                            <given-names>GE</given-names>
                        </name>
</person-group>:
                    <article-title>Layer Normalization.</article-title>
                    <source>

                        <italic toggle="yes">arXiv:1607.06450 [cs, stat].</italic>
</source>
                    <year>Jul. 2016</year>.
 Accessed: Jun. 30, 2021.
                    <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1607.06450">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref32">
                <label>32</label>
                <mixed-citation publication-type="book">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fan</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>K</given-names>
                        </name>
</person-group>:
                    <chapter-title>Bagging.</chapter-title> In:
                    <source>

                        <italic toggle="yes">Encyclopedia of Database Systems.</italic>
</source>
                    <person-group person-group-type="editor">

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>&#x00d6;zsu</surname>
                            <given-names>MT</given-names>
                        </name>
</person-group>, Eds.
                    <publisher-loc>Boston, MA</publisher-loc>:
                    <publisher-name>Springer US</publisher-name>;<year>2009</year>, pp.<fpage>206</fpage>&#x2013;<lpage>210</lpage>.
                    <pub-id pub-id-type="doi">10.1007/978-0-387-39940-9_567</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref33">
                <label>33</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ridnik</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ben-Baruch</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Noy</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ImageNet-21K Pretraining for the Masses.</article-title>
                    <source>

                        <italic toggle="yes">arXiv:2104.10972 [cs].</italic>
</source>
                    <year>Jun. 2021</year>.
 Accessed: Jun. 30, 2021.
                    <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2104.10972">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <label>34</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kingma</surname>
                            <given-names>DP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ba</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>Adam: A Method for Stochastic Optimization.</article-title>
                    <source>

                        <italic toggle="yes">arXiv:1412.6980 [cs].</italic>
</source>
                    <year>Jan. 2017</year>.
 Accessed: Jun. 30, 2021.
                    <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1412.6980">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref35">
                <label>35</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tymchenko</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Marchenko</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Spodarets</surname>
                            <given-names>D</given-names>
                        </name>
</person-group>:
                    <article-title>Deep Learning Approach to Diabetic Retinopathy Detection.</article-title>
                    <source>

                        <italic toggle="yes">arXiv:2003.02261 [cs, stat].</italic>
</source>
                    <year>Mar. 2020</year>.
Accessed: Jun. 30, 2021.
                    <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/2003.02261">Reference Source</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report94946">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.76706.r94946</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Jain</surname>
                        <given-names>Shruti</given-names>
                    </name>
                    <xref ref-type="aff" rid="r94946a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-7538-0584</uri>
                </contrib>
                <aff id="r94946a1">
                    <label>1</label>Department of Electronics and Communication Engineering, Jaypee University of Information Technology, Solan, Himachal Pradesh, India</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>25</day>
                <month>11</month>
                <year>2021</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2021 Jain S</copyright-statement>
                <copyright-year>2021</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport94946" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.73082.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <list list-type="order">
                    <list-item>
                        <p>Add a section highlighting the main contributions of your methodology, with detailed reference&#x00a0;to, and comparison with, existing work.</p>
                    </list-item>
                    <list-item>
                        <p>From the abstract, I find it difficult to understand the methodology you propose to solve the problem addressed in your paper. For example, the authors could clarify the following: &#x201c;To enhance DR grading, this paper proposes a novel solution based on an ensemble of state-of-the-art deep learning models called vision transformers&#x201d;. What are vision transformers? Are the authors proposing them, or, if not, what is new in this work?
                            <ext-link ext-link-type="uri" xlink:href="https://viso.ai/deep-learning/vision-transformer-vit/">https://viso.ai/deep-learning/vision-transformer-vit/</ext-link>
                        </p>
                        <p>Likewise, the authors write: &#x201c;A challenging public DR dataset proposed in a 2015 Kaggle challenge was used for training and evaluation of the proposed method.&#x201d; Yet there are many datasets for DR grading, so why use only the Kaggle dataset? The authors could validate their model on other datasets.</p>
                    </list-item>
                    <list-item>
                        <p>Add and cite some recent papers.</p>
                    </list-item>
                    <list-item>
                        <p>Read the whole manuscript for typos and grammatical mistakes.</p>
                    </list-item>
                    <list-item>
                        <p>Recheck the references: not every statement that needs a citation has one. In 'Dataset Overview', the sentence &#x201c;the quadratic weighted kappa (QWK) metric was utilized in this dataset because of these data...&#x201d; is missing a reference. There are other places where the authors should likewise add references.</p>
                    </list-item>
                    <list-item>
                        <p>Finally, your conclusion needs to be more tailored to your findings. The authors write: &#x201c;Hence, we intend to enhance the performance by utilizing a collection of various DR datasets. This can increase the size and variety of training data to train the proposed model from scratch instead of starting from the weights of the ImageNet 21K-based model. By doing so, we can ultimately enhance performance&#x201d;.</p>
                        <p>Yet why have the authors not already tried increasing the number of datasets used? Furthermore, the ImageNet 21K-based model features prominently in the conclusion without having been elaborated upon in the main text of the article: why was it introduced, what is it, what is it for, and how does it improve performance? Likewise, there is no mention of ensemble transformers or vision transformers in the conclusion.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>No</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Partly</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>No</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Partly</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Image and Signal Processing, Soft Computing, Internet-of-Things, Pattern Recognition, Bio-inspired Computing and Computer-Aided Design of FPGA and VLSI circuits</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
    </sub-article>
</article>
