Software Tool Article

DERMOSAN: Development of an interactive software tool in Streamlit for dermatological diagnosis assisted by convolutional neural networks (CNNs)

[version 1; peer review: awaiting peer review]
PUBLISHED 31 Dec 2025

Abstract

Background

Artificial intelligence (AI)-assisted dermatological diagnosis has become increasingly important due to its ability to support clinical decision-making and reduce diagnostic variability among specialists. This paper presents DERMOSAN, an interactive software tool developed in Streamlit, based on a convolutional neural network (CNN), designed for the automatic classification of dermatological diseases using clinical images.

Methods

A public Kaggle dataset with 27,153 dermatological images distributed across ten clinical categories was used. The model was built on the ResNet152 architecture with transfer learning, implemented in TensorFlow and Keras. The images were divided into training (80%), validation (10%), and test (10%) subsets, applying class weighting to mitigate data imbalance. The tool integrates a Streamlit-based graphical interface that enables real-time image upload, local processing, and probabilistic diagnosis visualization.

Results

The model achieved 95% accuracy in training, 83% in validation, and 92.3% in testing. The confusion matrix showed better performance in the most representative classes. The developed interface allows for automatic diagnosis along with confidence scores and complementary clinical suggestions, facilitating quick and reproducible visual interpretation.

Conclusions

DERMOSAN is a software development effort aimed at integrating deep learning models into accessible clinical environments. Although the system has not yet undergone clinical validation, it represents a step toward the creation of reproducible, open-source tools for AI-assisted diagnosis.

Keywords

dermatological diagnosis, artificial intelligence, convolutional neural networks, deep learning, ResNet152, Streamlit, software development, medical image classification

Introduction

Over the past decade, artificial intelligence (AI) has significantly transformed the field of healthcare due to its ability to analyze large volumes of clinical data and assist medical professionals in decision-making.1 In dermatology, this transformation has been particularly relevant, given that the diagnosis of skin diseases depends largely on visual observation and the experience of the specialist, which can lead to diagnostic variability and delays in timely treatment.2 In this context, Convolutional Neural Networks (CNNs) have established themselves as one of the most promising tools for the automated classification of medical images.3

At the same time, artificial intelligence has also been successfully applied in other domains through the development of expert systems for assisted decision-making. For example, Huayta-Gómez and Pacheco4 implemented an expert system for vocational guidance, structured around a six-model approach—organizational, task and agent, knowledge, communication, design, and implementation—that optimized the process of diagnosis and self-knowledge among students. This type of methodological approach demonstrates the versatility of artificial intelligence in the design of interactive and reproducible solutions, principles that also guide the development of the present study in the context of assisted dermatological diagnosis.

Several studies have explored the application of deep learning models to support dermatological diagnosis. For example, Kadhim and Kamil5 developed a machine learning-based system for the classification of dermatological images, optimizing the clinical analysis process. Similarly, Lesaunier et al.6 and Zhang et al.7 showed that the incorporation of information technology systems in areas such as interventional radiology improves efficiency and reduces errors through the application of Lean principles.8 These advances demonstrate the potential of AI to transform medical diagnosis, although they also highlight limitations related to the accessibility and reproducibility of the tools developed.

CNNs, in particular, reproduce the human visual learning process through convolutional layers that allow complex patterns in images to be identified.9 Their application has spread to specialties such as radiology, digital pathology, and dermatology, achieving results comparable to and even superior to those of dermatologists in specific classification tasks.10 Work conducted in Asia11 demonstrated the effectiveness of deep architectures—including CNNs, RNNs, GANs, and LSTMs—in improving real-time diagnostic accuracy. As shown in Figure 1, these types of networks have transformed the diagnostic process by complementing clinical evaluation with deep learning–based decision support systems. However, most of these developments are concentrated in environments with high technological availability and large databases, which limits their adoption in developing countries.


Figure 1. Traditional dermatological diagnosis vs. CNN-assisted diagnosis.

Source: Own elaboration.

Note: Comparison between traditional dermatological diagnosis and diagnosis assisted by convolutional neural networks (CNN).

In Latin America, the adoption of AI-based tools has advanced gradually. Tentori et al.12 reported the development of telemedicine and computer-assisted diagnosis initiatives in Brazil and Mexico, showing promising results despite persistent challenges related to technological infrastructure and access to standardized clinical data. In Peru, studies by Ponce et al.13 and Sarria et al.14 highlighted advances in medical automation, yet revealed the lack of local solutions focused on dermatological image analysis using deep learning models. This gap emphasizes the need for accessible and reproducible tools designed for academic research and clinical integration of artificial intelligence.

In response to this need, DERMOSAN, an interactive software tool in Streamlit for CNN-assisted dermatological diagnosis, was developed. Its purpose is to offer an intuitive platform, currently in the development phase, that allows dermatological images to be uploaded and probabilistic diagnoses to be obtained in real time. Unlike previous solutions, DERMOSAN integrates an advanced CNN architecture (ResNet152) with an interactive open-source interface, facilitating its use for academic, research, and demonstration purposes, promoting reproducibility and support for medical specialists.

Methods

In this section, we describe the study design, the origin of the data, the technical implementation of the model, and the operation of the assisted dermatological diagnosis tool.

Study design

This work corresponds to an applied study with a quantitative and methodological-observational approach, aimed at developing and validating an automated diagnostic tool.

The dataset was stratified by class into three subsets: training (80%), validation (10%), and testing (10%). To ensure reproducibility, a fixed random seed (SEED = 42) was used in all operations involving randomness (such as random image sampling and internal shuffling). This allows the partitioning and random behaviors to be consistent when running the same code with the same data and environment.
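The stratified 80/10/10 split with a fixed seed can be sketched as follows. This is an illustrative reconstruction using scikit-learn, not the authors' published code; `paths` and `labels` are hypothetical placeholders for the image file list and its class labels.

```python
# Illustrative sketch of a stratified 80/10/10 split with SEED = 42.
# `paths` and `labels` are invented stand-ins for the real dataset.
from sklearn.model_selection import train_test_split

SEED = 42
paths = [f"img_{i}.jpg" for i in range(100)]
labels = [i % 10 for i in range(100)]          # ten classes, balanced here

# First carve off 80% for training, stratifying by class.
train_p, rest_p, train_y, rest_y = train_test_split(
    paths, labels, test_size=0.20, stratify=labels, random_state=SEED)

# Split the remaining 20% evenly into validation and test (10% + 10%).
val_p, test_p, val_y, test_y = train_test_split(
    rest_p, rest_y, test_size=0.50, stratify=rest_y, random_state=SEED)

print(len(train_p), len(val_p), len(test_p))   # 80 10 10
```

Because `random_state` is fixed, rerunning the split on the same data yields identical partitions, which is the reproducibility property described above.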

During training, the validation set was used to adjust hyperparameters and control overfitting; the independent test set was used exclusively for the final evaluation of the model’s performance and generalization.

Data source and class distribution

The public dataset was obtained from Kaggle and comprises 27,153 dermatological images distributed across ten clinical categories. Table 1 summarizes the number of images per class and subset (training, validation, and test), revealing an imbalance among some categories. As illustrated in Figure 2, this imbalance becomes more evident in the bar chart, where the Melanocytic Nevi (NV) class accounts for the largest proportion of images, highlighting the non-uniform distribution among classes.

Table 1. Distribution of images by class and subset (train/validation/test).

Class | Train | Validation | Test | Total
Eczema | 1341 | 167 | 169 | 1677
Melanoma | 2512 | 314 | 314 | 3140
Atopic Dermatitis | 1005 | 125 | 127 | 1257
Basal Cell Carcinoma (BCC) | 2658 | 332 | 333 | 3323
Melanocytic Nevi (NV) | 6376 | 797 | 797 | 7970
Benign Keratosis-like Lesions (BKL) | 1663 | 207 | 209 | 2079
Psoriasis/Lichen Planus & related diseases | 1644 | 205 | 206 | 2055
Seborrheic Keratoses & other benign tumors | 1477 | 184 | 186 | 1847
Tinea/Candidiasis & other fungal infections | 1361 | 170 | 171 | 1702
Warts/Molluscum & other viral infections | 1682 | 210 | 211 | 2103
Total | 21719 | 2711 | 2723 | 27153

Figure 2. Bar chart of class distribution.

Source: Own elaboration.

Note: The Melanocytic Nevi (NV) class accounts for the largest proportion of images, indicating an imbalance.

Due to the imbalance between classes, the class_weight parameter was used during training to adjust the relative importance of each class in the error calculation. As a result, classes with fewer examples received a greater penalty when making errors, which encourages the model to pay more attention to learning those classes and not skew toward the dominant classes.
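The 'balanced' weighting described above can be illustrated with scikit-learn's helper; the toy label distribution below is invented for the example, not taken from the real dataset.

```python
# Sketch of 'balanced' class weighting: each class weight is
# n_samples / (n_classes * class_count), so rare classes weigh more.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

labels = np.array([0] * 800 + [1] * 100 + [2] * 100)   # imbalanced toy labels

weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(labels), y=labels)
class_weight = dict(enumerate(weights))
print(class_weight)
# Classes 1 and 2 receive 8x the weight of the dominant class 0, so
# misclassifying them is penalized more heavily during training.
```

The resulting dictionary is what Keras expects in `model.fit(..., class_weight=class_weight)`.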

Implementation

Technologies and library integration

The diagnostic tool was developed in Python 3.10, using the TensorFlow 2.15.0 and Keras 3.0 frameworks to implement the ResNet152 model using the transfer learning technique.

In addition, several complementary libraries were incorporated to strengthen the overall functionality of the system. Scikit-learn was used to calculate performance metrics and generate the confusion matrix, while Plotly, Matplotlib, and Seaborn were used for interactive visualization of loss curves, accuracy, and clinical graphs.

NumPy and Pandas enabled the manipulation and structured analysis of data, ensuring efficient statistical processing, while OpenCV and Pillow were used for the preprocessing and validation of dermatological images.

Finally, Streamlit 1.32.0 was implemented as the development environment for the graphical interface, facilitating direct interaction between the user and the predictive model, as well as immediate and reproducible visualization of diagnostic results.

Methodology in development phases

To structure the DERMOSAN development process, a phased methodological approach was adopted, inspired by the model proposed by Ramos-Miller and Pacheco,15 who implemented a five-phase methodology—comprising analysis, planning, implementation, review, and deployment—in the development of an educational web-based inventory control system. This sequential approach enhances the traceability of requirements and enables a systematic evaluation of the software throughout its development cycle.

  • Phase 1: Data preprocessing and preparation

    Images (JPEG, PNG, WEBP) are read, decoded, resized to 224×224 px, and normalized using preprocess_input from the ResNet model. Class mapping and organization into directories by class are generated. A global seed (SEED = 42) was set to ensure reproducibility in sampling and shuffling. In addition, the resulting dataset is shuffled internally and prefetched for data reading optimization.

  • Phase 2: Stratified division and calculation of class weights

    The images are partitioned into training (80%), validation (10%), and testing (10%) in a stratified manner by class. class_weight values are then calculated using the ‘balanced’ method, so that less frequent classes receive a higher weight during training, mitigating bias.

  • Phase 3: Model design and configuration

    ResNet152 is adopted with pre-trained weights from ImageNet. All layers except the last 50 are frozen for fine-tuning. Dense layers with ReLU activations, dropout = 0.35, and a final softmax layer are added. The model is compiled using the Adam optimizer and sparse categorical cross-entropy loss. ReduceLROnPlateau (factor = 0.2, patience = 2, min_lr = 1e-6) and ModelCheckpoint (saves weights at the end of each epoch) callbacks are configured.

  • Phase 4: Training and validation

    Training is run for a total of 9 epochs (or resumes from checkpoint if one exists). In each epoch, val_loss is evaluated to adjust the learning rate and save the best performing model. The optimal model is exported as best_resnet152.h5.

  • Phase 5: Integration with the interface and local deployment

    The final trained model is incorporated into an application developed with Streamlit. The application allows the user to upload images, validate them (e.g., format, resolution), process them through inference, and view the diagnosis, confidence level, and estimated risk. The system is internally modularized into components (image validation, inference, visualization) to facilitate maintenance and extensibility.
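The Phase 3 configuration might be sketched as follows. This is a reconstruction from the description above, not the authors' code: the 256-unit dense head is an assumption (the text specifies ReLU dense layers and dropout = 0.35 but not the layer widths), and `weights=None` is used here only to keep the sketch lightweight, whereas the tool itself loads ImageNet weights.

```python
# Minimal sketch of the Phase 3 model configuration described above.
# ASSUMPTIONS: head width (256) is invented; weights=None avoids the
# ImageNet download (the real tool uses weights="imagenet").
from tensorflow.keras import layers, models, callbacks
from tensorflow.keras.applications import ResNet152

base = ResNet152(weights=None, include_top=False,
                 input_shape=(224, 224, 3), pooling="avg")
for layer in base.layers[:-50]:      # freeze all but the last 50 layers
    layer.trainable = False

x = layers.Dense(256, activation="relu")(base.output)
x = layers.Dropout(0.35)(x)
out = layers.Dense(10, activation="softmax")(x)  # ten clinical classes

model = models.Model(base.input, out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

cbs = [
    callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.2,
                                patience=2, min_lr=1e-6),
    # Default save_best_only=False saves weights at the end of each epoch,
    # matching the Phase 3 description.
    callbacks.ModelCheckpoint("best_resnet152.h5", monitor="val_loss"),
]
```

Training as in Phase 4 would then be `model.fit(train_ds, validation_data=val_ds, epochs=9, class_weight=class_weight, callbacks=cbs)`.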

Graphical interface and processing flow

The application, developed with Streamlit, features an interactive interface that allows dermatological images in JPG, PNG, or WEBP format to be uploaded for automated analysis. Once the image is uploaded, the system runs an internal validation process that evaluates aspects such as sharpness, lighting, and resolution, ensuring that the images are suitable for assisted diagnosis.

The validated image is then processed by the pre-trained ResNet152 model, which generates an automatic classification with the main diagnosis, the confidence level, and an estimate of the associated clinical risk. The results are presented in a visual panel that combines quantitative and graphical information, facilitating the interpretation of predictions by the user.

Figure 3 schematically illustrates the tool’s processing flow, from image loading to final diagnosis generation.


Figure 3. General flow of the diagnostic tool processing.

Source: Own elaboration.

Note: The diagram illustrates the initial stages of the system, from image loading by the user to the generation of the automated diagnosis.

This application is configured as an interactive and reproducible clinical support tool that contributes to the automated analysis of dermatological images without replacing the diagnostic work of the specialist.

Operation

Minimum system requirements

The application was designed to run locally on personal computers, without the need for an internet connection or cloud infrastructure. The minimum requirements for its operation are as follows:

  • Operating system: Windows 10 or Ubuntu 20.04 or higher.

  • Processor (CPU): Intel Core i5 or equivalent.

  • RAM: Minimum 8 GB.

  • Graphics processing unit (GPU): Optional, although an NVIDIA T4 (16 GB VRAM) is recommended to accelerate model inference.

  • Software dependencies: Python ≥ 3.10, TensorFlow, Keras, Streamlit, NumPy, Pandas, and scikit-learn.

To run the application, simply open a terminal in the project directory and execute the following command:

    streamlit run app.py

Once the environment is initialized, the interface can be viewed from a modern web browser with JavaScript enabled, by accessing the local address indicated by Streamlit (by default, http://localhost:8501).

Evaluation metrics

The main metric considered was overall accuracy, calculated on the test set by comparing actual labels with model predictions. In addition, the confusion matrix was analyzed, which allowed us to identify error patterns between categories, as well as the loss and accuracy curves recorded during the training and validation phases.
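The evaluation described above (overall accuracy plus a confusion matrix from true versus predicted labels) can be sketched with scikit-learn; the label vectors here are illustrative placeholders, not the real test set.

```python
# Sketch of the evaluation: overall accuracy and the confusion matrix.
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 0, 1, 1, 2, 2, 2, 1]   # toy ground-truth labels
y_pred = [0, 0, 1, 2, 2, 2, 2, 1]   # toy model predictions

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)
print(f"accuracy = {acc:.3f}")   # 7 of 8 correct -> 0.875
print(cm)                        # rows: true class, columns: predicted
```

Off-diagonal entries of the matrix expose exactly the inter-class error patterns discussed in the Results section.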

Ethical considerations

The study used only a public, previously anonymized dataset of dermatological images (Kaggle), in compliance with the repository’s terms of use.

As it did not contain any identifiable patient information, approval by an institutional ethics committee was not required. The system was developed for academic purposes and is a diagnostic support tool, not a substitute for the medical judgment of a specialist.

Results

The model developed achieved an accuracy of 95% in training, 83% in validation, and 92.3% in testing, reflecting the technical performance achieved in the development stage of the DERMOSAN software. These values indicate solid model performance, with slight differences between training and validation attributable to class imbalance.

Figure 4 shows the confusion matrix obtained in the test set. Superior performance is observed in classes with a greater number of images, such as Melanocytic Nevi (NV) and Basal Cell Carcinoma (BCC), while minority categories (Eczema, Atopic Dermatitis) show more classification errors.


Figure 4. Confusion matrix of the ResNet152 model on the test set.

Source: Own elaboration.

Note: The model performs better in classes with a larger number of images, while minority categories show more classification errors.

Figure 5 shows the evolution of loss during training and validation. A progressive reduction is observed in the training set, while in validation the values remain relatively high, indicating the presence of overfitting.


Figure 5. Evolution of loss during training and validation.

Source: Own elaboration.

Note: Loss decreases steadily during training, while remaining high during validation, indicating overfitting.

Figure 6 shows the accuracy curve. The model achieved over 95% accuracy in training, while in validation it stabilized at around 83%. This difference is related to the difficulty of adequately generalizing in classes with lower representativeness.


Figure 6. Evolution of accuracy in training and validation.

Source: Own elaboration.

Note: The model achieved >95% in training and ~83% in validation. The gap reflects generalization limitations derived from class imbalance.

Figure 7 shows the graphical interface developed in Streamlit, which allows individual images to be uploaded and the model’s suggested diagnosis to be obtained in real time. The application displays the main diagnosis with its confidence level, as well as the most likely differential diagnoses.


Figure 7. Screenshot of the interactive graphical interface.

Source: Own elaboration.

Note: The tool allows the user to upload an image, displays the most likely diagnosis with its confidence level, and presents differential diagnoses.

Finally, the overall performance of the model on the test set is shown in Figure 8, where a total accuracy of 92.3% was achieved when evaluating the 2,723 test images. This result confirms the model’s ability to generalize on unseen data, maintaining a consistent relationship with the training and validation metrics.


Figure 8. Overall performance of the model on the test set.

Source: Own elaboration.

Note: The model achieved an accuracy of 92.3% on the test set (2723 images), demonstrating good generalization ability.

Although additional metrics such as precision, recall, or F1-score were not calculated, their inclusion is recommended in future work to evaluate the balance of the model, especially in minority classes. During training, it was observed that the differences in performance between training and validation remained consistent from the early epochs, suggesting that the model learned quickly and that class imbalance limits its generalization ability in less represented classes.
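The per-class metrics recommended above (precision, recall, F1-score) are exactly what exposes minority-class weaknesses that overall accuracy hides. A sketch with invented labels:

```python
# Sketch: overall accuracy can look strong while minority-class recall
# collapses. Labels are illustrative, not from the real test set.
from sklearn.metrics import classification_report

y_true = [0] * 90 + [1] * 10          # class 1 is the minority class
y_pred = [0] * 90 + [1] * 4 + [0] * 6  # 6 of 10 minority samples missed

# Accuracy is 94%, yet recall for class 1 is only 0.40.
print(classification_report(y_true, y_pred, digits=2))
```

Reporting this table per class would make the imbalance effects on Eczema and Atopic Dermatitis directly quantifiable.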

Use cases

To evaluate the practical functioning of the developed system, controlled tests were performed in the local version of the Streamlit environment. These tests simulated real clinical use scenarios, with the aim of verifying the model’s ability to process dermatological images, generate diagnoses, and issue interpretive recommendations.

Case 1. Dermatological image upload and quality validation

In Case 1, the user accesses the graphical interface and selects a dermatological image in JPG, PNG, or WEBP format. The platform displays a preview and performs an automatic quality check, evaluating parameters such as sharpness, lighting, and minimum resolution requirements. If the image does not meet the established criteria, the system notifies the user to replace it before analysis. Once validated, the interface confirms that the image is suitable for diagnosis ( Figure 9).


Figure 9. Image loading in the system interface and automatic image quality verification.

Source: Own elaboration.
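The kind of automatic quality check described in Case 1 could be approximated as follows. The tool's actual criteria and thresholds are not published, so everything below (a minimum side of 224 px, a Laplacian-variance sharpness floor, a brightness band) is an assumption for illustration, implemented with plain NumPy.

```python
# Hypothetical sketch of an image quality gate (sharpness, lighting,
# resolution). Thresholds are invented; the real tool's are unpublished.
import numpy as np

def validate_image(gray, min_side=224, min_sharpness=50.0,
                   brightness_range=(30, 225)):
    """Return (ok, reasons) for a grayscale uint8 image array."""
    reasons = []
    h, w = gray.shape
    if min(h, w) < min_side:
        reasons.append("resolution too low")
    # Sharpness: variance of a discrete Laplacian (4*center - 4 neighbors).
    g = gray.astype(np.float64)
    lap = (-4 * g[1:-1, 1:-1] + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    if lap.var() < min_sharpness:
        reasons.append("image too blurry")
    # Lighting: mean brightness must fall inside the accepted band.
    if not (brightness_range[0] <= g.mean() <= brightness_range[1]):
        reasons.append("poor lighting")
    return (len(reasons) == 0, reasons)

# A flat gray image fails the sharpness check; seeded noise passes it.
flat = np.full((300, 300), 128, dtype=np.uint8)
noisy = np.random.default_rng(0).integers(0, 256, (300, 300), dtype=np.uint8)
print(validate_image(flat))    # (False, ['image too blurry'])
print(validate_image(noisy))   # (True, [])
```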

Case 2. Evaluation and classification by the trained model

In Case 2, the validated image is sent to the pre-trained ResNet152 model with transfer learning, which processes the sample locally using the weights stored in best_resnet152.h5. Based on this inference, the system classifies the skin lesion into one of ten available clinical categories and displays the diagnosis with its confidence level.

For example, one image was classified as Benign Keratosis-like Lesions (BKL) with a confidence of 98.1%, while other categories such as Basal Cell Carcinoma (BCC) had lower probabilities. The result is displayed in a bar chart accompanied by interpretive text that facilitates understanding of the analysis ( Figure 10).


Figure 10. Evaluation process, image classification by the model, and visualization of the prediction confidence level.

Source: Own elaboration.
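The mapping from model output to a ranked diagnosis with confidence levels can be sketched as follows. The logits are invented for illustration; in the real tool the probabilities come from the softmax output of the model stored in best_resnet152.h5, and the shortened class names below are approximations of the dataset categories.

```python
# Sketch: turning raw scores into a ranked, human-readable diagnosis.
# The logits are invented; BKL is made dominant to mirror the Case 2 example.
import numpy as np

CLASSES = [
    "Eczema", "Melanoma", "Atopic Dermatitis", "Basal Cell Carcinoma (BCC)",
    "Melanocytic Nevi (NV)", "Benign Keratosis-like Lesions (BKL)",
    "Psoriasis/Lichen Planus", "Seborrheic Keratoses",
    "Tinea/Candidiasis", "Warts/Molluscum",
]

def top_k(probs, k=3):
    """Return the k most likely (class, probability) pairs, best first."""
    order = np.argsort(probs)[::-1][:k]
    return [(CLASSES[i], float(probs[i])) for i in order]

logits = np.array([0.1, 0.3, 0.0, 2.0, 0.5, 6.0, 0.2, 1.0, 0.1, 0.4])
probs = np.exp(logits - logits.max())   # numerically stable softmax
probs /= probs.sum()

for name, p in top_k(probs):
    print(f"{name}: {p:.1%}")
```

The top-3 list is what feeds the diagnostic comparison panel shown in Case 3.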

Case 3. Interpretive recommendation

In Case 3, the system expands the result with an advanced clinical analysis, which includes risk assessment, diagnostic comparison with the three most likely classes, and a confidence distribution graph ( Figure 11). Subsequently, the user is presented with a detailed clinical panel that includes a description of the lesion, the risk level, medical recommendations, and a cautionary note clarifying that the diagnosis must be confirmed by a dermatologist. This stage reinforces the tool’s function as a clinical diagnosis support system, not as a substitute for professional judgment ( Figure 12).


Figure 11. Advanced clinical analysis panel with risk assessment and diagnostic comparison.

Source: Own elaboration.


Figure 12. Detailed clinical information and medical recommendations generated by the system.

Source: Own elaboration.

Together, the cases described represent the entire workflow of the system, from image loading and validation to clinical interpretation of the final result. This workflow demonstrates the tool’s ability to assist in dermatological analysis in a reproducible, understandable, and medically oriented manner, consolidating its potential application in clinical environments for AI-assisted diagnosis.

Discussion

The implementation of the CNN-based model (ResNet152 with transfer learning) demonstrated a notable improvement in the automatic classification of dermatological conditions, achieving accuracies of 95% for training, 83% for validation, and 92.3% for testing. These results indicate that deep learning models can effectively complement dermatological diagnosis by reducing the variability associated with human clinical judgment. The obtained performance aligns with that reported by Liu et al.,16 who developed a multi-classifier architecture for skin lesion recognition using multi-scale attention mechanisms, achieving an accuracy of 88.2% on the HAM10000 dataset. Similarly, Musthafa et al.17 achieved 97.78% using an optimized CNN, while Shetty et al.18 reported 95.18% with a cross-validation strategy, reinforcing the effectiveness of deep convolutional architectures combined with data augmentation techniques.

The proposed model performed better in classes with higher representativeness, such as Melanocytic Nevi and Basal Cell Carcinoma, which is consistent with the findings of Winkler et al.,19 who demonstrated that CNNs can achieve diagnostic sensitivities comparable to those of dermatologists. This trend has been repeatedly observed in dermatology, where CNN-based models have increased the diagnostic accuracy of specialists, supporting their value as clinical decision-support tools. However, minority classes such as Eczema and Atopic Dermatitis showed higher error rates, confirming the negative impact of data imbalance on model generalization—an issue also highlighted in the systematic reviews of Brinker et al.20

The use of the class_weight parameter partially mitigated the imbalance effects, improving training stability and reducing overfitting. This approach is consistent with that of Shetty et al.,18 who emphasized the importance of rebalancing strategies to enhance model robustness. In addition, integrating a Streamlit-based graphical interface enabled the model to be deployed in an interactive, accessible, and reproducible environment, facilitating real-time image analysis and the visualization of results with interpretable confidence levels. This development aligns with the recommendations of Esteva et al.,21 who underscored the need to adapt CNN-based diagnostic systems to everyday clinical workflows.

The visual outcomes obtained through the confusion matrix and training curves indicated stable and predictable performance, with deviations primarily attributable to the dataset composition. These findings are consistent with those of Liu et al.,16 who demonstrated that biases in data distribution directly affect validation metrics. Overall, the proposed tool demonstrates the feasibility of combining deep learning and accessible visual environments to support dermatological diagnosis.

The main limitations of this study include class imbalance, dependence on a single public dataset, the absence of complementary evaluation metrics such as precision, recall, and F1-score, and the lack of direct clinical validation with real cases.

For future work, we propose expanding the dataset with local samples, performing multicenter external validations, and exploring explainable AI (XAI) approaches to improve model transparency and clinician trust in real-world applications.

Conclusions

This study confirmed the effectiveness of convolutional neural networks (CNNs)—specifically the ResNet152 architecture with transfer learning—as a support tool for the automated diagnosis of dermatological conditions. The model achieved accuracies of 95% in training, 83% in validation, and 92.3% in testing, demonstrating strong classification performance and reproducibility under controlled conditions.

The integration of a Streamlit-based interface enabled the model to be deployed in a practical and interactive environment, facilitating its use by professionals and its potential adaptation to clinical workflows. This development demonstrates the feasibility of linking deep learning models with accessible visual applications, enhancing the interpretability of results and supporting medical decision-making.

Nevertheless, the study has limitations related to class imbalance and reliance on a single public dataset, which may affect the generalization capability of the model in broader clinical contexts. Future research should address these limitations by incorporating more representative datasets, conducting multicenter external validations, and incorporating additional evaluation metrics such as precision, recall, and F1-score for a more comprehensive assessment of model performance.

Overall, this research represents an initial step toward the practical integration of artificial intelligence in dermatology, underscoring the potential of CNN-based tools to complement clinical diagnosis. Under professional supervision, such systems could significantly contribute to improving diagnostic accuracy and efficiency in dermatological care.

Software availability

Software available from: ---

Source code available from: https://github.com/tavoofg/dermosan-dermatology-cnn

Archived source code at time of publication: https://doi.org/10.5281/zenodo.1728110223

License: MIT (https://opensource.org/license/MIT)

How to cite this article
Fernández-Gutiérrez G, Diego-Calagua A, Leandro-Mendoza A and Pacheco A. DERMOSAN: Development of an interactive software tool in Streamlit for dermatological diagnosis assisted by convolutional neural networks (CNNs) [version 1; peer review: awaiting peer review]. F1000Research 2025, 14:1482 (https://doi.org/10.12688/f1000research.172054.1)