Artificial intelligence based glaucoma and diabetic retinopathy detection using MATLAB — retrained AlexNet convolutional neural network

Background: Glaucoma and diabetic retinopathy (DR) are the leading causes of irreversible retinal damage leading to blindness. Early detection of these diseases through regular screening is especially important to prevent progression. Retinal fundus imaging serves as the principal method for diagnosing glaucoma and DR. Consequently, automated detection of eye diseases represents a significant application of retinal image analysis. Compared with classical diagnostic techniques, image classification by convolutional neural networks (CNNs) exhibits potential for effective eye disease detection.
Methods: This paper proposes the use of a MATLAB-retrained AlexNet CNN for the computerized identification of eye diseases, particularly glaucoma and diabetic retinopathy, from retinal fundus images. Images were acquired from free-access databases and from databases available upon request. A transfer learning technique was employed to retrain the AlexNet CNN for non-disease (Non_D), glaucoma (Sus_G) and diabetic retinopathy (Sus_R) classification. Moreover, model benchmarking was conducted using the ResNet50 and GoogLeNet architectures. A Grad-CAM analysis is also incorporated for each eye condition examined.
Results: Metrics for validation accuracy, false positives, false negatives, precision, and recall were reported. Validation accuracies for NetTransfer (I-V) and netAlexNet ranged from 89.7% to 94.3%, demonstrating varied effectiveness in identifying the Non_D, Sus_G, and Sus_R categories. In the benchmarking of models, netAlexNet achieved an accuracy of 93.2%, against 93.8% for netResNet50 and 90.4% for netGoogLeNet.
Conclusions: This study demonstrates the efficacy of using a MATLAB-retrained AlexNet CNN for detecting glaucoma and diabetic retinopathy. It emphasizes the need for automated early detection tools, proposing CNNs as accessible solutions without replacing existing technologies.


Introduction
Glaucoma is a condition caused by elevated intraocular pressure.[1] The most common types are open-angle glaucoma, angle-closure glaucoma, normal-tension glaucoma, and congenital glaucoma.[2] On the other hand, DR is the most frequent complication of diabetes mellitus.[3] It occurs because the small blood vessels in the retina swell and bleed or leak fluid, causing retinal damage and vision problems.[3,4] DR has five stages or classes: normal, mild, moderate, severe and proliferative DR.[4]
Ophthalmic examination is essential for the diagnosis of glaucoma and DR. Physicians carry out the following tests to diagnose glaucoma: measuring intraocular pressure (tonometry),[5] analyzing optic nerve damage with a dilated eye exam, checking areas of vision loss (visual field test),[6] measuring corneal thickness (pachymetry),[7] and inspecting the angle of drainage (gonioscopy).[8] As most of these are imaging tests of the eye, accurate, high-quality images are essential for a correct diagnosis of the disease.
Similarly, DR is usually detected by physicians through comprehensive ophthalmologic examinations requiring pupil dilation. Dilation facilitates detailed cross-sectional imaging that shows the thickness of the retina where fluid may be leaking from damaged blood vessels (optical coherence tomography),[9] as well as imaging after the injection of a special dye that reveals blocked blood vessels and vessels leaking blood (fluorescein angiography).[10] Diagnosing these conditions necessitates the expertise of specialized medical professionals, resulting in significant time and financial costs. Furthermore, discrete diagnostic approaches are essential for each disease. Given the potential coexistence of diabetes with both conditions, a diagnosis of diabetes does not preclude the possibility of glaucoma.[11] The knowledge of medical professionals in identifying glaucoma and diabetic retinopathy is beneficial because it enables the creation of large, accurately labelled databases. Such groundwork allows for the analysis of data based on established ground truth, facilitating the development of classification models by non-medical experts.
Consequently, employing AI for the automated analysis of fundus images can assist physicians by facilitating accessible, reliable, and affordable detection of glaucoma and other related visual pathologies (Table 1).
REVISED Amendments from Version 1
This revised version of the manuscript incorporates improvements suggested by reviewers during the peer review process. Notably, it introduces a section on "Operational Resources" to detail the software and hardware employed throughout this research. A more thorough description regarding data partitioning and training parameter configuration has also been provided.
Furthermore, a significant addition includes a benchmarking of models to evaluate the performance of AlexNet, ResNet50, and GoogLeNet, accompanied by training/validation graphs and model training duration times. A Grad-CAM analysis was conducted for each trained model and detection condition.
The revision of the previous version was undertaken to correct grammatical errors and improve the overall presentation. It is important to note that previously cited works or reported results have not been altered or replaced. The aim of the presented new version is to enrich the content through updated citations, figures, and sections. Finally, it is worth mentioning that the generated algorithms and scripts are available to readers via a Zenodo repository.
Any further responses from the reviewers can be found at the end of the article.
Convolutional neural networks (CNNs) are a class of deep learning methods most commonly applied to analyse visual imagery.[18][19][20][21][22][23] Among the diverse CNNs, AlexNet, by Krizhevsky et al., achieved state-of-the-art recognition accuracy compared with conventional machine learning and computer vision approaches, and it can be retrained for new tasks.[24] AlexNet has sustained its significance as a neural network because the simplicity of its architecture enables it to operate without substantial computational resources. AlexNet is structured with eight main layers: five convolutional layers, with max pooling after the first, second and fifth convolutional layers, and three fully connected layers. Activation via the Rectified Linear Unit (ReLU) function is applied after each layer, with the exception of the final layer, which employs a softmax layer as the classification mechanism of the trained network.[24,25]
Transfer learning involves utilizing a pre-trained network as a base model to learn a new task. This approach, notably through fine-tuning, proves more efficient and simpler than training a network from the ground up with randomly initialized weights. As a result, the pre-trained CNN quickly transfers learned features using a smaller number of training images. In this paper, a transfer learning method is applied to retrain the MATLAB AlexNet CNN for effective glaucoma and DR detection, with the aim of making computer-aided recognition accessible through a low-complexity CNN.
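As a rough illustration of the eight-layer layout described above, the following Python sketch traces the spatial size of the feature maps through the five convolutional layers and the three max-pooling stages. The kernel sizes, strides, paddings and channel counts are the values commonly published for AlexNet with a 227×227 input; they are assumed here and are not taken from the authors' MATLAB scripts.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, output channels); pooling keeps the channel count.
# Assumed values: the commonly published AlexNet configuration.
ALEXNET_FEATURE_LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def trace_shapes(input_size=227, input_channels=3):
    """Return [(layer name, side length, channels)] after each layer."""
    size, channels = input_size, input_channels
    shapes = []
    for name, kernel, stride, padding, out_channels in ALEXNET_FEATURE_LAYERS:
        size = out_size(size, kernel, stride, padding)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, size, channels))
    return shapes


if __name__ == "__main__":
    for name, size, channels in trace_shapes():
        print(f"{name}: {size}x{size}x{channels}")
```

Under these assumptions the last pooling stage yields a 6×6×256 volume (9,216 values), which is what the three fully connected layers receive.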
Additionally, the final trained model is benchmarked against the ResNet50 [26] and GoogLeNet [27] architectures to evaluate comparative performance (Figure 1). Furthermore, a Grad-CAM analysis across these architectures is conducted to elucidate the focal points of observation within the different models.

Methods
To carry out the detection of glaucoma and DR through a CNN, image pre-processing and processing techniques are required. The different steps are summarized in Figure 2.

Image acquisition
For the training of the CNN it is necessary to use retinal fundus images of the eye. Several public databases that compile different eye conditions are available on the internet, some with free access and others with access upon request. The following were used in this work:
- Free access databases
○ Asia Pacific Tele-Ophthalmology Society (APTOS). Contains 3662 images of diabetic retinopathy that were used in the APTOS 2019 blindness screening competitions. Each image has been resized and cropped to have a maximum size of 1024 px. A certified clinician rated each image according to the severity of diabetic retinopathy on a scale of 0 to 4. A directory file is provided according to this scale: No diabetic retinopathy (0), Mild (1), Moderate (2), Severe (3), and Proliferative diabetic retinopathy (4).
○ Sungjoon Choi High-Resolution Fundus (sjchoi86-HRF). Created by Sungjoon Choi, assistant professor at Korea University, it contains 601 fundus images of different pixel sizes divided into 4 groups: normal (300 images), glaucoma (101 images), cataract (100 images) and retina disease (100 images).[30]
- Access upon request databases
○ Large-scale attention based glaucoma (LAG). Contains fundus images with positive (1711 images) and negative (3143 images) glaucoma samples obtained from Beijing Tongren Hospital, with a resolution of 500×500 px. Each fundus image was diagnosed by qualified glaucoma specialists, taking into consideration both morphologic and functional analysis.
In the case of the ODIR database, photographs labeled in their directory file as "glaucoma" (G) and "normal fundus" (N) were extracted, for a total of 200 images and 2873 images, respectively. For the APTOS database, photographs labeled in their directory as "moderate" (2), "severe" (3) and "proliferative diabetic retinopathy" (4) were extracted, for a total of 1487 images.

• 16 GB RAM DDR4 Memory

Image pre-processing
The AlexNet architecture is specifically designed for processing color (RGB) images with a resolution of 227×227 pixels. Image pre-processing from the databases is conducted using the custom function Convertidor_227_final.m (refer to software availability), which includes a user interface for cropping black areas and resizing images of any dimension to the required 227×227 pixel format.
The black-area cropping function of Convertidor_227_final.m is applied to each database. This is done to retain more information on the retinal area and eliminate areas of no interest. The function binarizes the original image to obtain a black and white image of equal dimensions. Since the area where a color pixel existed now has a value of 1 and the black areas have a value of 0, the row and column indices of every pixel with a value of 1 are extracted as a list. Using these indices as image coordinates, the maximum and minimum values per row and column are determined to establish the cropping edges of the image. Owing to its code design, this function does not affect previously cropped images that no longer contain black areas.
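The cropping logic described above can be sketched in plain Python as follows. This is only an illustration of the binarize-then-bound idea, not the code of Convertidor_227_final.m: the function name, the list-of-rows image representation and the `threshold` parameter are assumptions made for the example.

```python
def crop_black_borders(image, threshold=0):
    """Crop the rows and columns that contain only black pixels.

    image: list of rows, each row a list of (r, g, b) tuples.
    threshold: hypothetical cut-off; a pixel counts as "color" when any
    channel exceeds it.
    """
    # Binarize: 1 where a color pixel exists, 0 for the black background.
    mask = [[1 if max(pixel) > threshold else 0 for pixel in row] for row in image]

    # Row and column indices of every pixel whose mask value is 1.
    rows = [r for r, row in enumerate(mask) for value in row if value]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return image  # completely black image: nothing to anchor the crop

    # Minimum and maximum indices give the cropping edges.
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

Because the edges are taken from the outermost colored pixels, an image that has already been cropped is returned unchanged, which matches the behaviour noted above.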
Following the removal of black borders, the Convertidor_227_final.m function is employed to resize the photographs. Subsequently, all images within the database are standardized to a uniform dimension of 227×227 pixels. According to their original medical classification, the retinal fundus images obtained were labeled as non-disease (Non_D), suspicious glaucoma (Sus_G) and suspicious diabetic retinopathy (Sus_R). For the purposes of CNN re-training, five distinct storage folders were organized (Table 2).
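The resizing step can be illustrated with a minimal nearest-neighbour sketch in Python. The interpolation method used by Convertidor_227_final.m is not stated in the text, so nearest-neighbour sampling is an assumption chosen for brevity, and the function name is hypothetical.

```python
def resize_nearest(image, width=227, height=227):
    """Resize a list-of-rows image with nearest-neighbour sampling.

    Assumption: nearest-neighbour is used only to keep the sketch short;
    the original function may use a different interpolation.
    """
    src_h, src_w = len(image), len(image[0])
    return [
        [image[(y * src_h) // height][(x * src_w) // width] for x in range(width)]
        for y in range(height)
    ]
```

Each output pixel copies the source pixel whose position corresponds proportionally, so any input size maps to the fixed 227×227 grid expected by AlexNet.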

Image processing
To develop the predictive software for eye disease detection, transfer learning is utilized to retrain the AlexNet CNN. The pre-trained AlexNet network is loaded alongside the different databases (LAG, APTOS, HRF, ODIR, and sjchoi86-HRF) that contain the images of the pathologies to be classified, specifically glaucoma and retinopathy. Information from Refs. 33-40 is employed to develop our algorithm.
To initiate dataset training, image storage folders as outlined in Table 2 are created. Images are stored in a primary folder with corresponding subfolders Non_D, Sus_G, and Sus_R, based on the original classification assigned within their respective databases. The primary database is loaded as an "imds" variable, and the data contained within the subfolders are segmented into training and validation sets. A conventional data division approach is applied, allocating 70% of the images for training and 30% for validation using the "splitEachLabel(imds,0.7,'randomized')" function.[40] This method randomly splits the data in the image datastore "imds" into two new datastores.
MATLAB allocates 70% of the images from each label (or subfolder) in "imds" for training and the remaining 30% for validation, with the selection done in a randomized manner. This keeps the class proportions of the training and validation sets close to those of the overall dataset, which supports the generalizability of the model trained on this data. A representation of the data split generated by MATLAB is presented for the maximum data volume of 9,680 observations (see Figure 3).
AlexNet was applied to binary classification, distinguishing retinal fundus images as Non_D vs. Sus_G (NetTransfer I & II), and to multi-class classification, differentiating among the Sus_G, Sus_R and Non_D categories (NetTransfer III, IV & V). Each image storage folder, as detailed in Table 2, underwent training with the corresponding NetTransfer model number.
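The per-label randomized split described above can be mimicked in Python as shown below. This is an analogue written for illustration, not MATLAB's implementation of splitEachLabel: the function name, the dictionary input and the rounding rule for each label's training count are assumptions.

```python
import random


def split_each_label(files_by_label, train_fraction=0.7, seed=None):
    """Randomly split the files of every label into training and validation sets.

    files_by_label: dict mapping a label (e.g. "Non_D") to a list of file names.
    The rounding of each label's training count is an assumption of this sketch.
    """
    rng = random.Random(seed)
    train, validation = {}, {}
    for label, files in files_by_label.items():
        shuffled = list(files)          # copy so the caller's list is untouched
        rng.shuffle(shuffled)           # randomized selection within the label
        n_train = round(train_fraction * len(shuffled))
        train[label] = shuffled[:n_train]
        validation[label] = shuffled[n_train:]
    return train, validation
```

Splitting inside each label, rather than over the pooled images, is what keeps the 70/30 proportion identical for Non_D, Sus_G and Sus_R even when the classes differ greatly in size.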
The subsequent pseudocode outlines the procedure for both binary and multi-class classification tasks in AlexNet.

Training algorithm for transfer learning
Input -> Retinal fundus images (X, Y); Y = {y | y ∈ {Non-disease, Suspicious-Glaucoma, Suspicious-Diabetic-Retinopathy}}
Output -> Re-trained model that classifies the retinal fundus images into the respective Y
Import the pre-trained AlexNet network with its corresponding weights.
Replace the last three layers of the network:
- Fully connected layer (set the 'WeightLearnRateFactor' to 20 and the 'BiasLearnRateFactor' to 20, and set its output size to the number of elements of Y).

Training-progress settings
MiniBatchSize -> The number of elements in the group of inputs used for each iteration.
MaxEpochs -> The maximum number of times that the network uses all the input elements.
InitialLearnRate -> The learning rate is a tuning parameter that determines the step size at each iteration while moving toward a minimum of the loss function.
Shuffle -> The action of randomly mixing the elements of the databases.
ValidationData -> The group of images from the dataset that the network uses to validate how well it is classifying.
ValidationFrequency -> The number of iterations that the system performs between validation passes.
A "ValidationFrequency" of 3 was chosen due to the relatively low epoch count of the model (six in this instance), and an "InitialLearnRate" of 0.0001 was selected as a conservative value to facilitate gradual adjustments to the weights of the model. Figure 4 summarizes the architecture of all the new networks designed with the transfer learning technique.
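To make the interplay between the "InitialLearnRate" of 0.0001 and the learn-rate factors of 20 concrete, the short Python sketch below computes the effective step size of a layer. It relies on the documented MATLAB convention that a layer's learning rate is the global rate multiplied by the layer's factor; the function name and the factor of 1 for the transferred layers are assumptions made for the example.

```python
def layer_learning_rate(initial_learn_rate, learn_rate_factor=1.0):
    """Effective learning rate of a layer: global rate times the layer's factor."""
    return initial_learn_rate * learn_rate_factor


INITIAL_LEARN_RATE = 1e-4   # value reported in the text
NEW_LAYER_FACTOR = 20       # WeightLearnRateFactor / BiasLearnRateFactor from the pseudocode

# Replaced fully connected layer: learns 20 times faster than the global rate.
new_layer_rate = layer_learning_rate(INITIAL_LEARN_RATE, NEW_LAYER_FACTOR)

# Transferred layers, assuming they keep a factor of 1.
transferred_rate = layer_learning_rate(INITIAL_LEARN_RATE)

if __name__ == "__main__":
    print(f"new fully connected layer: {new_layer_rate:.4f}")
    print(f"transferred layers:        {transferred_rate:.4f}")
```

The design intent is that the transferred layers change slowly and keep their pre-trained features, while the newly added fully connected layer, which starts from fresh weights, adapts more quickly to the retinal classes.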

Benchmarking of models
The performance comparison between the AlexNet, GoogLeNet, and ResNet50 architectures was conducted through multi-class classification of the categories Sus_G, Sus_R, and Non_D. Given that data storage 5 represents the most complete dataset, it was chosen as the input data storage for the training of three new models (netALEXNET, netRESNET50, and netGOOGLENET). To accelerate the model design of these networks, the "deepNetworkDesigner" function was utilized; this application supports network architecture design and transfer learning through a user-friendly interface (refer to software availability).
In this context, the construction and training of the models were carried out under the same data loading and handling conditions as those used for NetTransfer I-V, with adjustments made only to the mini-batch size (MBS), the number of epochs (EPOCHS), and the ValidationFrequency. For ResNet50, loading the model exceeded our computational capabilities, necessitating a reduction in MBS, which in turn increased the number of iterations per epoch (IPE) (see Table 4).
Consequently, the decision was made to extend the epoch count to 30 for all three models, given the necessity to examine the behavior of AlexNet, ResNet50, and GoogLeNet across a broader range of iterations. This adjustment aligns with a more coherent approach to studying CNNs for classification purposes. Additionally, the validation frequency was set to match the IPE, so that a validation is conducted at the end of each epoch. This strategy aims to be more conservative in computational cost, consequently reducing training time.
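The relationship between MBS and IPE mentioned above can be sketched in Python. The mini-batch sizes below are hypothetical, since the values from Table 4 are not reproduced in the text; the training-set size of 6,776 is roughly 70% of the 9,680 observations, and the sketch assumes that an incomplete final mini-batch in each epoch is discarded.

```python
def iterations_per_epoch(n_train, mini_batch_size):
    """Complete mini-batches per epoch (an incomplete final batch is assumed dropped)."""
    return n_train // mini_batch_size


def total_iterations(n_train, mini_batch_size, epochs):
    """Iterations over the whole training run."""
    return iterations_per_epoch(n_train, mini_batch_size) * epochs


N_TRAIN = 6776          # about 70% of 9,680 observations (approximate, split is per label)
EPOCHS = 30             # epoch count used for the benchmarked models

# Hypothetical mini-batch sizes, chosen only to show the trend.
for mbs in (64, 16):
    ipe = iterations_per_epoch(N_TRAIN, mbs)
    print(f"MBS={mbs}: IPE={ipe}, total iterations={total_iterations(N_TRAIN, mbs, EPOCHS)}")
```

Shrinking the mini-batch lowers memory demand but multiplies the number of iterations. Setting the validation frequency equal to the IPE then gives exactly one validation pass per epoch, that is, 30 passes over the run.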
Upon completion of the training phase, algorithms were developed to generate Grad-CAM diagrams for each architecture by loading the trained models netALEXNET, netRESNET50, and netGOOGLENET (refer to software availability). The Grad-CAM analysis for multi-class classification of the categories Sus_G, Sus_R, and Non_D was conducted for each model.
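For readers unfamiliar with Grad-CAM, the core computation can be sketched in a few lines of Python. The sketch assumes that the feature maps of the last convolutional layer and the gradients of the class score with respect to them are already available as nested lists; it does not reproduce the authors' MATLAB scripts, and the final normalisation to [0, 1] is a display convenience added here.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from feature maps and class-score gradients.

    activations, gradients: lists of K channels, each an H x W list of lists.
    """
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weights: global average of the gradients over all spatial positions.
    weights = [sum(sum(row) for row in grad) / (height * width) for grad in gradients]

    # Weighted sum of the feature maps followed by ReLU.
    heatmap = [
        [
            max(0.0, sum(w * act[y][x] for w, act in zip(weights, activations)))
            for x in range(width)
        ]
        for y in range(height)
    ]

    # Normalise to [0, 1] for display (assumption of this sketch).
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap
```

The ReLU keeps only the regions that push the class score up, which is why the resulting heatmaps highlight the areas a network relied on for a given category.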

Results
The application of the transfer learning technique culminated in the development of five retrained AlexNet networks, hereafter referred to as NetTransfer networks. The confusion matrices for these NetTransfer networks are depicted in Figure 5, encompassing precision, recall, false positive (FP), false negative (FN), and accuracy values (highlighted in a yellow box). The matrices are structured such that the rows represent known values, while the columns indicate predicted values.
The NetTransfer I network was based only on the glaucoma and non-disease image cases in the LAG database (Table 2); training with this dataset led to a validation accuracy of 94.3%. In addition, Non_D detection achieved a recall of 95.5% (4.5% FN) and a precision of 95.6% (4.4% FP).
The NetTransfer II network was based on the glaucoma and non-disease image cases in the LAG database and the sjchoi86-HRF database (Table 2); training with these datasets led to a validation accuracy of 91.8%. In addition, Non_D detection achieved a recall of 95.9% (4.1% FN) and a precision of 91.8% (8.2% FP).
The NetTransfer III network was based on the glaucoma, diabetic retinopathy and non-disease image cases in the LAG database, the sjchoi86-HRF database and the HRF database (Table 2); training with these datasets led to a validation accuracy of 89.7%. In addition, Non_D detection achieved a recall of 97.0% (3.0% FN) and a precision of 88.9% (11.1% FP).
The NetTransfer IV network was based on the glaucoma, diabetic retinopathy and non-disease image cases in the LAG database, the sjchoi86-HRF database, the HRF database and the APTOS database (Table 2); training with these datasets led to a validation accuracy of 93.1%. In addition, Non_D detection achieved a recall of 93.2% (6.8% FN) and a precision of 93.5% (6.5% FP).
The NetTransfer V network was based on the glaucoma, diabetic retinopathy and non-disease image cases in the LAG database, the sjchoi86-HRF database, the HRF database, the APTOS database and the ODIR database (Table 2); training with these datasets led to a validation accuracy of 92.1%. In addition, Non_D detection achieved a recall of 96.8% (3.2% FN) and a precision of 92.0% (8.0% FP).
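The way these percentages relate to a confusion matrix can be sketched in Python. In the values reported above, FN is the complement of recall and FP is the complement of precision (for example, 95.5% recall with 4.5% FN), and the sketch follows that convention. The counts in the example matrix are hypothetical and are not taken from Figure 5.

```python
def class_metrics(confusion, labels, target):
    """Recall, precision and their complements for one class.

    confusion[i][j]: number of images of known class i predicted as class j
    (rows are known values, columns are predicted values).
    """
    k = labels.index(target)
    true_positive = confusion[k][k]
    known_total = sum(confusion[k])                     # row sum
    predicted_total = sum(row[k] for row in confusion)  # column sum
    recall = true_positive / known_total
    precision = true_positive / predicted_total
    correct = sum(confusion[i][i] for i in range(len(labels)))
    accuracy = correct / sum(sum(row) for row in confusion)
    return {
        "recall": recall,
        "fn": 1 - recall,        # share of known cases that were missed
        "precision": precision,
        "fp": 1 - precision,     # share of predictions that were wrong
        "accuracy": accuracy,
    }


# Hypothetical two-class example (counts invented for illustration).
LABELS = ["Non_D", "Sus_G"]
CONFUSION = [
    [90, 10],   # known Non_D: 90 predicted Non_D, 10 predicted Sus_G
    [5, 95],    # known Sus_G: 5 predicted Non_D, 95 predicted Sus_G
]

if __name__ == "__main__":
    for name, value in class_metrics(CONFUSION, LABELS, "Non_D").items():
        print(f"{name}: {100 * value:.1f}%")
```

The same function applies unchanged to the three-class matrices, since the row and column sums are taken over however many classes the matrix contains.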
Similarly, the study includes an analysis of the transfer learning performance of the AlexNet network in comparison with other significant architectures, specifically ResNet50 and GoogLeNet. The models to which transfer learning was applied have been designated as netAlexNet, netResNet50, and netGoogLeNet (see Figure 6).
The netAlexNet network was based on glaucoma, diabetic retinopathy, and non-disease images from the LAG database, sjchoi86-HRF database, HRF database, APTOS database, and ODIR database (Table 4). Training with these observations returned a validation accuracy of 93.24%. Furthermore, detection of Non_D cases achieved a recall of 97.4% (2.6% FN) and a precision of 92.5% (7.5% FP). ResNet50 achieved a maximum validation accuracy of 93.8%, with recall and precision for Non_D detection at 96.8% and 93.8%, respectively. The GoogLeNet architecture attained a maximum validation accuracy of 90.4%, with its Non_D detection showing recall and precision rates of 94.8% and 90.9%, respectively.
Additionally, training graphs for the benchmarked network models are provided for enhanced understanding in a live script (refer to software availability). These graphs detail the performance evolution. The training duration was 15 minutes and 40 seconds for the netAlexNet network, 622 minutes and 17 seconds for netResNet50, and 264 minutes and 32 seconds for netGoogLeNet. Parameters such as EPOCHS, IPE, Validation Frequency, and Learning Rate are described for each model.
Employing the netAlexNet model in conjunction with the netResNet50 and netGoogLeNet neural networks, nine Grad-CAM heatmaps were generated to illuminate subtle differences in feature prioritization across the networks. This Grad-CAM analysis covered the multi-class conditions evaluated, namely Non_D, Sus_G, and Sus_R (see Figure 8).

Discussion
Several works have addressed glaucoma detection from fundus photographs by calculating the cup-to-disk ratio (CDR). For example, Carrillo and coworkers [41] developed an automatic detection method and a novel method for cup segmentation, with a success rate of 88.5%. In another work, Anum Abdul and colleagues [42] provided an algorithm that combines CDR with hybrid textural and intensity features. Those features were used for classification in an autonomous system, which improved on the results of previous studies that used only CDR; thanks to this hybrid approach, they reached an accuracy of 92%. Although the CDR characteristic was not utilized here, the AlexNet methodology demonstrates comparable accuracy levels with NetTransfer V (92.1%) and netAlexNet (93.2%), matching the performance of the previously cited methods without requiring CDR calculation.
In other, more rigorous studies, such as the work of Xiangyu Chen,[43] a deep CNN was developed with a total of six layers: four convolutional layers and two fully connected layers. The reported prediction scores ranged from 71% to 83% on real images. On the other hand, Hanruo Liu and colleagues [23] built a deep learning system using a total of 241,032 images from 68,013 patients. In that work, every image was subjected to a multi-layer grading system in which the graders ranged from students to senior glaucoma specialists; with this approach they obtained good levels of sensitivity and specificity (82.20% and 70.40%, respectively). Compared to other CNN systems, the AlexNet-based detection systems (NetTransfer V & netAlexNet) exhibited results that are comparable and, in some cases, arguably superior in terms of accuracy, sensitivity, and specificity for detecting glaucoma and diabetic retinopathy. Another related work, from Almeida and colleagues,[44] uses image processing in MATLAB to improve the accuracy of glaucoma tests by extracting the most pertinent qualities of the images. It obtained promising results, with accuracy, specificity, and sensitivity greater than 90%, which provides an excellent starting point for assessing glaucoma diagnosis through AI. Although the system developed by Almeida and colleagues appears more specialized for glaucoma detection, NetTransfer V and netAlexNet offer the added benefit of simultaneously detecting multiple pathologies, including DR, which was another condition integrated into the detection framework.
Another study demonstrating superior results in glaucoma detection is that of Shinde,[45] where the application of two distinct architectures enabled perfect prediction (100%) on their images. However, as noted earlier, the versatility of the AlexNet network provides a comparative advantage by facilitating the differentiation and classification of multiple diseases. In contrast, a monolithic classification system may identify the absence of one disease (glaucoma) but overlook the presence of other pathologies, such as DR.
Because NetTransfer V and netAlexNet are also able to detect DR, it is pertinent to compare them with other deep learning systems developed for DR detection. In the study conducted by Rishab Gargeya and Theodore Leng,[18] a data-driven deep learning algorithm was developed and evaluated as a novel diagnostic tool for automated DR detection. It proved to be a highly effective, low-cost computer-aided model that led to correct DR diagnoses without depending on clinicians to examine and grade images manually.
A different study, by Shanthi and Sabeenian,[19] used a modified AlexNet CNN for the detection of DR, training the network on a large dataset. Additionally, Amnia Salma and colleagues [17] developed a similar system, but they used GoogLeNet instead of AlexNet. While all of these systems follow principles similar to those of the proposed NetTransfer V and netAlexNet systems, it is important to remark that the pre-trained networks achieved higher accuracies, sensitivities and specificities than the previously mentioned systems, mostly because a larger number of datasets was used. The decision was made to extend the application of AlexNet to the simultaneous classification of multiple diseases. Table 5 provides a summary and comparison of the detection capabilities of the AlexNet pathology detection systems (NetTransfer V & netAlexNet) with those of all the previously mentioned research, in addition to other significant studies not previously discussed.
In the benchmarking of AlexNet against other recognized architectures, specifically ResNet50 and GoogLeNet, the confusion matrices affirm that the netResNet50 model exhibits superior overall performance compared to the other two models (refer to Figure 6). This superiority is partly attributed to flawless performance in the Sus_R category and enhanced performance in the Non_D category. The netResNet50 model is closely followed by its counterpart netAlexNet and, to a lesser extent, by netGoogLeNet. However, the presentation of the training process through performance/validation graphs indicates a shorter training duration for netAlexNet in comparison to netResNet50 and netGoogLeNet (Figure 7). This is largely attributed to the smaller number of layers in AlexNet, as opposed to GoogLeNet and ResNet50, the latter of which even necessitated a reduction in the MBS at the expense of increasing the IPE. Furthermore, within the recorded performance graphs, netResNet50 exhibited signs of overfitting, as evidenced by training performance exceeding validation performance in consecutive epochs. Meanwhile, netGoogLeNet demonstrated performance similar to netAlexNet in the performance/validation process and, despite a lower accuracy, did not show signs of overfitting.
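The difference in training duration can be put into proportion with a small worked calculation in Python, using the durations reported in the Results. The helper names are hypothetical and serve only this illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a duration given as minutes and seconds into seconds."""
    return minutes * 60 + seconds


# Training durations reported in the Results section.
DURATIONS = {
    "netAlexNet": to_seconds(15, 40),
    "netResNet50": to_seconds(622, 17),
    "netGoogLeNet": to_seconds(264, 32),
}


def relative_durations(durations, baseline):
    """Duration of every model expressed as a multiple of the baseline model."""
    return {name: value / durations[baseline] for name, value in durations.items()}


if __name__ == "__main__":
    for name, ratio in relative_durations(DURATIONS, "netAlexNet").items():
        print(f"{name}: {ratio:.1f}x the netAlexNet training time")
```

With these figures, netResNet50 took roughly 40 times and netGoogLeNet roughly 17 times as long to train as netAlexNet on the same hardware, while the validation accuracies of the three models differ by only a few percentage points.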
The second row of images offers a detailed visualization of the Grad-CAM outputs for the three networks applied to glaucomatous retinal images (see Figure 8). Initial examination of these images highlights a shared emphasis on the optic disk (OD) and optic nerve by all networks. However, a deeper analysis reveals significant differences in the degree and specificity of feature focus. ResNet50, recognized for its extensive reach, displays an extensive region of interest that goes beyond the OD boundaries, covering adjacent retinal areas. While this wide-ranging observation might seem beneficial, it adds complexity by including features outside the OD, which could complicate the classification task. On the other hand, GoogLeNet, known for its broad and encompassing analysis, indicates the widest area of interest, covering the entire OD and optic cup (OC), as well as the surrounding periphery. This extensive observation potentially facilitates a thorough evaluation of areas prone to pathological changes. However, its challenge lies in the indiscriminate encompassment of adjacent areas, possibly including anomalies not related to the disease under investigation, thus affecting diagnostic precision. Conversely, the Grad-CAM outputs from the netAlexNet model, despite focusing on a more confined area, concentrate on crucial aspects such as the OD-OC ratio.
In the examination of DR images depicted in the third row of images (see Figure 8), a noticeable shift from the OD-focused analysis seen in the glaucoma heatmaps is apparent. The GoogLeNet heatmap highlights a significant focus on the upper regions of the eye, likely due to increased vascularity. However, this focus might overlook potential manifestations of the disease in the lower regions and the OD, leading to possible misrecognition, especially in cases where DR pathology is primarily present in these areas. In contrast, the heatmap from ResNet50, while covering a wider area, demonstrates a less precise focus, capturing various ocular regions. This broad coverage aims to identify a wide range of retinal blood vessels and neurons but may result in a compromise between breadth and specificity, potentially affecting discernment capabilities.
Examination of eyes without disease reveals a focus on regions around ocular blood vessels and the OD in all three Grad-CAMs. The AlexNet and ResNet50 networks extensively outline these areas. In contrast, GoogLeNet markedly neglects the OD, which is crucial for diagnosing both glaucoma and diabetic retinopathy, potentially leading to false negatives by missing clinically significant features and thus impacting recognition precision. This variation highlights the necessity for thorough feature assessment, particularly when minor anomalies are diagnostically critical.
Additionally, for the implementation of the AlexNet architecture in an open-source language, the use of TensorFlow is endorsed; it is a free, open-source machine learning platform based on the Python language and mainly developed by Google.[47] Among its available tools, Keras is a deep learning application programming interface (API) developed for Python and built on TensorFlow, with which a user can build an equivalent of the proposed AlexNet model. The recommended model is the sequential model of Keras, which allows a user to define the model as a series of convolutional layers with max pooling.[48]

Conclusions
In the presented research, training a CNN with MATLAB software and its AlexNet tool allowed the effective recognition of two eye diseases (glaucoma and DR) from retinal fundus images. Additionally, the use of open access databases supports the replicability and reproducibility of the present study: the APTOS, HRF and sjchoi86-HRF databases are immediately accessible, while LAG and ODIR are available upon request. Combining the different databases (LAG, APTOS, HRF, ODIR, sjchoi86-HRF) proved to be effective in improving the prediction percentages of the different neural network trainings.
In general, the most common eye conditions present through a series of symptoms, such as blurred vision, spots, glare, eye fatigue and dry eyes, among others. In contrast, glaucoma damages the optic nerve and generally does not present any symptoms until the person suffering from it perceives a decrease in vision in the final stages of the disease. It is therefore necessary to create tools that allow an effective detection of this type of condition, for example CNN systems as a highly reliable alternative for the automation of processes. The study does not aim to replace state-of-the-art technologies in the recognition of retinal pathology, nor to compete with identification systems that represent a new paradigm in the recording and analysis of retinal fundus images. Instead, it offers an initial approach for enthusiasts interested in accessible recognition techniques through CNN models. Additionally, the research expanded its detection objective by incorporating a benchmark of models, complemented by a Grad-CAM analysis, through multi-class classification of the categories Sus_G, Sus_R and Non_D.
Future improvements to this algorithm could include the creation of a more user-friendly graphical interface for users who are not experts in programming. In this way, the detection tasks would be based on the selection of options and not on the coding of algorithms. On the other hand, as previously mentioned, it is possible to replicate the AlexNet CNN in Python by using existing tools such as TensorFlow and the Keras API. Therefore, a subsequent study will concentrate on implementing the recognition system in the open-source language, to endorse the use of non-proprietary software and increase reproducibility.
AlexNet has remained a significant neural network over time, owing to the simplicity of its architecture, which allows operation without the need for extensive computational resources. The observation provided led to the realization that benchmarking models was essential to better understand the performance of the study relative to other significant architectures. Consequently, comparisons were made not only with the ResNet50 architecture but also with GoogLeNet, thereby enriching the study with a broader variety of architectural analyses.

Matlab provides a different deep learning toolbox for each version. Write in detail which version and which toolbox the authors used.
This observation was fully addressed by adding a new section titled "Operational Resources." This section describes not only the software used, including the version of MATLAB employed for the study, but also the hardware utilized. Appreciation is expressed for highlighting this theoretical component, which had previously been overlooked in the study.

Provide Grad-CAM heatmaps.
Given the incorporation of model benchmarking in the updated version, the opportunity was taken to conduct a Grad-CAM analysis using the three trained models: netAlexNet, netResNet50, and netGoogLeNet. The Grad-CAM analysis was performed for each model across the three identification conditions considered: non-disease, suspicious glaucoma, and suspicious diabetic retinopathy. Appreciation is extended for the request of this addition, as it enhances the comprehensiveness of the work through the interpretative insight provided by the Grad-CAM heatmaps for each model and evaluated condition.
The development of predictive systems in the field of ophthalmology has been on the rise for several years with the use of Convolutional Neural Networks (CNNs) and other types of systems, many of which emulate the structures of existing networks for prediction purposes. Within this context, the presented work does not signify a paradigm shift in the state of the art, as it aligns with concepts previously implemented. Nonetheless, the study demonstrates significant utility and importance, highlighting that transfer learning via AlexNet yields highly promising results, potentially surpassing some detection methods, networks built from scratch, or studies that employed pre-established network structures. This approach advocates for the adoption of transfer learning and a streamlined methodology that does not require complex data pre-processing, thereby simplifying classification tasks and aligning with existing predictive systems.
6. Many important papers in this field are omitted in the manuscript. Please discuss several significant previous works and important products of fundus photography reading devices for diabetic retinopathy.
In response to this observation, a select number of contemporary and pertinent studies within the field, related to predictive systems for these pathologies, have been incorporated. It is noted that the aim of this work is not to compete with or replace established systems, nor is it intended to be perceived as comparable to specialized retinal fundus image technologies. Rather, it is approached as an initial step towards the development of a system capable of ensuring efficient and accessible identification tasks, with considerations for future enhancements.
Competing Interests: No competing interests were disclosed.

Creed Jones
Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA
The article presents a CNN implementation using MATLAB to determine the presence of glaucoma using labeled retinal images. The result and its conclusions are very interesting, and the open and freely verifiable nature of the work is great.
Detailed Comments
1. The abstract should clarify the "two classes" and "three classes" referred to.
2. The authors refer to a "red color filter" that is added to grayscale images, which is not necessary if RGB images are available. This is very confusing and needs both justification and explanation.
3. The use of 6 epochs needs to be justified.
4. The presentation of the confusion matrices is very clear and good.
5. It's not clear whether external knowledge of diabetes diagnoses was used (or could be used) in distinguishing DR from glaucoma.
6. Clearly calling out the available databases is very helpful for replicating/extending the results.
7. Grammar throughout is good but needs improvement. Look at use of "The" at the start of sentences. Also, is there a misspelling in the title of the confusion matrices; should "PREDICTEC" be "PREDICTED"?

Are sufficient details of methods and analysis provided to allow replication by others? Partly
If applicable, is the statistical analysis and its interpretation appropriate?

4. The presentation of the confusion matrices is very clear and good.
The design of the matrices, like that of all figures within the study, is original. Furthermore, enhancements have been implemented to ensure the images are more comprehensible and visually clear.
5. It's not clear whether external knowledge of diabetes diagnoses was used (or could be used) in distinguishing DR from glaucoma.
External knowledge from medical experts on glaucoma and diabetic retinopathy is leveraged for generating extensive, accurately labelled databases. This foundation enables the data to undergo analysis, providing a basis of truth for the development of classification models by non-medical professionals. Moreover, while physicians rely on physiological criteria and specific medical evaluations for diagnosing retinal diseases using images, Convolutional Neural Networks (CNNs) identify patterns within a set of images that may or may not correlate with characteristic features of retinal diseases. Consequently, the application of Grad-CAM becomes essential to ascertain the focal points of a trained model during the discriminative process of recognition. This observation is appreciated, as it has facilitated the enhancement of previously provided information in the article and the correction of potential inaccuracies.
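
To make the role of Grad-CAM more concrete, the following is a minimal sketch of the standard Grad-CAM computation on toy data: each feature map of a convolutional layer is weighted by the spatial average of the class-score gradient with respect to that map, the weighted maps are summed, and negative values are clipped to zero. The function name `grad_cam` and the 2 x 2 example values are hypothetical; in practice the activations and gradients would come from a trained model such as netAlexNet, and this is not the code used in the study.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from nested lists.

    activations[k][i][j]: value of feature map k at position (i, j).
    gradients[k][i][j]:   gradient of the class score w.r.t. that value.
    """
    rows = len(activations[0])
    cols = len(activations[0][0])
    heatmap = [[0.0] * cols for _ in range(rows)]
    for fmap, grad in zip(activations, gradients):
        # Channel weight: global average of the gradients over the map.
        weight = sum(sum(row) for row in grad) / (rows * cols)
        for i in range(rows):
            for j in range(cols):
                heatmap[i][j] += weight * fmap[i][j]
    # ReLU: keep only regions that contribute positively to the class score.
    return [[max(0.0, value) for value in row] for row in heatmap]

# Toy example with two 2 x 2 feature maps (illustrative values only).
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # channel supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel opposes the class
]
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

The resulting map is then typically upsampled to the input image size and overlaid on the fundus image, which is what allows the highlighted regions to be compared with clinically relevant structures.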

6. Clearly calling out the available databases is very helpful for replicating/extending the results.
The rationale for providing a detailed description and citation of the databases utilized was to ensure access to information for individuals interested in conducting related studies on retinal fundus images. Appreciation is extended for acknowledging the thoroughness of the descriptions provided.
7. Grammar throughout is good but needs improvement. Look at use of "The" at the start of sentences. Also, is there a misspelling in the title of the confusion matrices; should "PREDICTEC" be "PREDICTED"?
Consequently, a decision was made to enhance the overall clarity of the text, including the revision of sections to ensure the narrative is presented in the third person. It is anticipated that these modifications will contribute to a more assertive tone throughout the document.

Figure 2. Proposed system for glaucoma and diabetic retinopathy detection using AlexNet.

Figure 4. Proposed neural network architecture for eye diseases detection based on AlexNet.
during training and validation phases, alongside the training configurations and the duration of model training (see Figure 7).

4. Please describe the partitioning process for the training and test sets in detail.
The image processing section has been enhanced to address previous ambiguities and provide a more comprehensive understanding. Furthermore, a figure illustrating the data partitioning volume for segregating images into validation and training datasets has been added. A mathematical framework for model training parameters, contingent on the number of observations in the training and validation sets, has been established. This framework includes detailed descriptions of Mini Batch Size, Iterations Per Epoch, Training Set (TS), and Validation Set (VS), thereby making the information in the study more relevant and concise through the requested corrections.
5. Considering recent advances of deep learning in ophthalmology domains, the contribution of this study is too limited. Please review recent advances and show the strengths of this study.

Reviewer Report 06 February 2023
https://doi.org/10.5256/f1000research.134260.r161036
© 2023 Jones C. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table. Different deep learning systems for optical pathology detection.

Table 2. Quantity of pre-processed images used from each database for the storage folders.
… real time how the training is going. Verbose: Verbose mode is an option that provides additional details as to what the computer is doing and what drivers and software it is loading during startup.
The MiniBatchSize (MBS) parameter specifies the number of observations processed and used to update the weights of the model in each iteration. By setting the MBS to 10, the observations in the Training Set (TS) are divided by this number to calculate the Iterations Per Epoch (IPE). This division ensures each observation is utilized once per epoch, reducing biases in the training process. Thus, every iteration involves processing a mini-batch of data, executing a forward pass through the network, calculating the error, and adjusting the weights. Models employed in the transfer learning technique have previously undergone training on extensive and generalized datasets, such as ImageNet for CNNs,24 emphasizing fine-tuning over learning from scratch. Consequently, "MaxEpochs" is set to 6 for model evaluation, given that transfer learning typically requires "tuning" the weights of the pre-trained model to suit a new specific task rather than acquiring all features once again. This fine-tuning process demands fewer modifications to the weights, achievable within a limited number of epochs. Table 3 provides a comprehensive overview of the underlying mathematics for each model, detailing calculations related to database size and the corresponding derived training parameters.
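
The relationship between TS, MBS, IPE and MaxEpochs described above can be sketched as a short calculation. The function name `training_schedule` and the training-set size of 1,234 images are hypothetical and are not taken from Table 3. The sketch also assumes that IPE is the integer part of TS / MBS, so that any observations left over after the last full mini-batch are not used in that epoch.

```python
def training_schedule(num_training_obs, mini_batch_size=10, max_epochs=6):
    """Derive iteration counts from the training-set size (illustrative only)."""
    # Assumption: only complete mini-batches count as iterations.
    iterations_per_epoch = num_training_obs // mini_batch_size
    return {
        "iterations_per_epoch": iterations_per_epoch,
        "total_iterations": iterations_per_epoch * max_epochs,
        "leftover_per_epoch": num_training_obs % mini_batch_size,
    }

# Hypothetical training set of 1,234 images with MBS = 10 and MaxEpochs = 6.
schedule = training_schedule(1234)
print(schedule)
```

When TS is an exact multiple of MBS, the leftover is zero and every observation is processed exactly once per epoch, which matches the description above.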

Table 3. Mathematical framework for model training parameters based on database size.

Table 5. Comparison between related studies.

Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.
MATLAB codes and scripts related to image processing, pre-processing & Training versions of the AlexNet Convolutional Neural Network (NetTransfers I-V).