Research Article

Implementation of Chernobyl optimization algorithm based feature selection approach to predict software defects

[version 1; peer review: 2 approved with reservations, 1 not approved]
PUBLISHED 29 Jul 2024

Abstract

Background

Software defects can have catastrophic consequences. Therefore, fixing these defects is crucial for the evolution of software. Software Defect Prediction (SDP) enables developers to identify and resolve faults in the early stages of the software development process. However, SDP faces many challenges, including the large number of attributes in the datasets, which can degrade the predictive performance of a defect prediction model. Feature selection (FS), a compelling instrument for overcoming high dimensionality, selects only the relevant and best features while carefully discarding the others. Over the years, several meta-heuristic algorithms, such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Differential Evolution (DE), and Ant Colony Optimization (ACO), have been used to develop defect prediction models. However, these models suffer from several drawbacks, such as high cost, entrapment in local optima, low convergence rates, and extensive parameter tuning. To overcome these shortcomings, this study aims to develop an innovative FS technique, namely Feature Selection using the Chernobyl Optimization Algorithm (FSCOA), to uncover the most informative features that can produce a precise prediction model while minimizing errors.

Methods

The proposed FSCOA approach mimicked the process of nuclear radiation attacking humans after an explosion. It was combined with four widely used classifiers, namely Decision Tree (DT), K-Nearest Neighbor (KNN), Naive Bayes (NB), and Quadratic Discriminant Analysis (QDA), to determine the finest attributes from the SDP datasets. Furthermore, the accuracy of the recommended FSCOA method was compared with that of existing FS techniques, such as FSDE, FSPSO, FSACO, and FSGA. The statistical merit of the proposed measure was verified using the Friedman and Holm tests.

Results

The experimental findings showed that the proposed FSCOA approach yielded the best accuracy in most cases and achieved an average rank of 1.75, followed by the other studied FS approaches. Furthermore, the Holm test showed that the p-value was lower than or equivalent to the value of α/(A-i), except for the FSCOA versus FSGA and FSCOA versus FSACO comparisons.

Conclusion

The experimental findings showed that the proposed FSCOA procedure outperformed the alternative FS techniques with higher accuracy in almost all cases while selecting the optimal features.

Keywords

Software Defect Prediction; Feature Selection; Wrapper approach; Chernobyl Disaster Optimizer; Optimization

Introduction

In today's scenario, humankind needs good-quality and reliable software to help perform daily tasks without spending excessive time and effort. Owing to this immense demand for exceptional and dependable software, conducting a rigorous investigation of software under development is crucial. However, the complexity of software increases with every passing day, making the overall software development work very challenging.1,2 A fault in software can significantly damage its quality and reliability, leading to more frequent maintenance activities. This can result in higher operational costs for the software, ultimately leading to user dissatisfaction. A software fault can be characterized as the disparity between the actual and expected behaviors of the software. Software testing empowers developers to identify and repair faults. However, conventional testing approaches are costly and time-consuming. Hence, it is imperative to detect faults in a software module during the early stages of development.3

Software Defect Prediction (SDP) enables developers to expose deficiencies in software components in the early stages of development by employing data analysis and machine learning (ML) approaches. An effective SDP mechanism can lead to the systematic and profitable advancement of high-quality and reliable software products without defects.4 Researchers have suggested several ML-based SDP approaches5-8 for effectively predicting defects. These methods analyze past data from different stages of development, such as testing data and debugging records, to derive any pattern or trend that can detect potential defects. The most widely employed ML methods in SDP are DT,9 SVM,10 neural networks,11 logistic regression,12 and NB.13 However, these approaches face several challenges, high dimensionality being one of them.

Feature Selection (FS)14 is a potent mechanism that can be employed to overcome the issue of high dimensionality. FS allows developers to select only relevant features and carefully discard insignificant ones. In SDP, FS is a vital step that allows developers to choose the best set of features, which can significantly enhance the predictive accuracy of a defect prediction model. The application of FS approaches is essential when dealing with datasets with high dimensionality. Several FS approaches have been applied in SDP. FS techniques are broadly classified into three categories: filter techniques,15 wrapper techniques,16 and embedded techniques.17,18 Filter-based FS techniques are independent of any training strategy and apply statistical properties to identify the best traits. In contrast, wrapper-based FS procedures select the best characteristics based on the classification accuracy of the prediction model. Embedded FS techniques combine feature selection with model training. The existing literature shows that researchers have mainly applied evolution-based algorithms19 and swarm-based algorithms20 for FS purposes.

This exploration aims to boost the classification accuracy of a defect prediction model while minimizing errors. For this purpose, this study considers some of the widely used meta-heuristic algorithms, namely the Genetic Algorithm (GA),21 Particle Swarm Optimization (PSO),22,23 Differential Evolution (DE),24 and Ant Colony Optimization (ACO).25 Although GA has been a proven FS approach,26 it is costly because it computes the optimal features using genetic operators such as selection, crossover, and mutation over a set of generations. The PSO-based FS approach27 aims to find the optimal traits by emulating particle movement while probing a search space with several dimensions. The algorithm adjusts the position and velocity of each particle by considering individual and group knowledge. However, the PSO-based FS approach sometimes falls into a local optima trap in addition to having a lower convergence rate. DE-based FS techniques28 compute optimal characteristics by employing operators such as mutation, crossover, and selection on a population of potential solutions over several iterations. However, these approaches require considerable parameter tuning, making them a tedious choice for researchers. ACO-based FS methods29 determine the best characteristics in a search space by imitating the foraging behavior of ants. However, these methods suffer from a slow convergence speed and low accuracy, especially on large datasets.

The limitations of the aforementioned FS approaches motivated us to propose a novel FS approach (FSCOA) inspired by the Chernobyl Disaster Optimizer (CDO).30 The primary objective of the proposed FSCOA approach is to uncover the most informative features to produce a precise prediction model. It mimics the process of nuclear radiation, which involves the propagation of alpha, beta, and gamma fragments attacking humans after an explosion. These radiations fly at a very high speed from a high-pressure point (the point of explosion) to a low-pressure point (the location of the individual). The proposed algorithm starts with an initial population of candidate solutions. Furthermore, it computes the gradient descent factor (GDF) for the alpha, beta, and gamma fragments when they attack humans. Finally, the optimal solution is achieved by calculating the average of the GDF values over several iterations. The proposed algorithm has advantages such as its ability to deal with convoluted, high-dimensional datasets without being trapped in local optima, which can be an issue in alternative FS procedures. The primary contributions of this study are as follows.

  • (i) To develop a novel FSCOA approach by applying the CDO, a metaheuristic algorithm

  • (ii) To evaluate the performance of the proposed FSCOA-based fault prediction model with four different classification algorithms (NB, QDA, DT, and KNN) on 12 benchmark NASA software defect datasets.

  • (iii) To compare the performance of the proposed FSCOA approach with several FS approaches, such as FSGA, FSPSO, FSDE, and FSACO.

  • (iv) To validate the statistical implications of the proposed FSCOA approach using Friedman and Holm tests.

The experimental outcomes show that the proposed FSCOA performed better than the other examined FS approaches in most situations, making it the best-performing FS technique for selecting the best array of features. The remainder of this paper is organized as follows. Section 2 discusses the existing literature on FS approaches. Section 3 elaborates on the proposed FSCOA approach and the detailed methodology used in this study. Section 4 presents the empirical findings and interpretations. Section 5 outlines the statistical analysis. Finally, Section 6 presents the conclusions and the scope for prospective work.

Related works

Defect prediction in software modules plays a critical role in creating high-quality and reliable software. SDP permits developers to detect and debug defects in software modules during the early stages of the software development process. Unfortunately, conventional SDP processes face several threats, one of which is the curse of dimensionality. The curse of dimensionality indicates the presence of a large number of attributes in a dataset. Many of these attributes do not contribute any meaningful information and hence are treated as noise. Feature selection is a potent tool for tackling the challenge of the curse of dimensionality. FS allows developers to establish the best possible set of traits that can enhance the predictive accuracy of the model by discarding irrelevant traits. However, it is imperative to observe that conventional FS procedures are not only expensive but also time-consuming.31 Recently, the application of ML to SDP has gained considerable traction, and several ML-based SDP approaches have been proposed. This section describes some of these studies.

Das et al.32 proposed a novel FS technique called FSGJO based on the Golden Jackal Optimization (GJO) algorithm. The proposed FSGJO technique was employed with four classifiers, namely, KNN, DT, NB, and QDA, using 12 SDP datasets taken from the PROMISE repository. The authors compared the efficacy of the recommended FSGJO technique with alternative FS techniques, namely, FSDE, FSPSO, FSACO, and FSGA. Based on their experimental findings, the authors observed that the proposed FSGJO technique enhanced the predictive performance of the model. It was also noted that the proposed FSGJO method was superior to the other studied FS techniques in selecting the optimal set of characteristics. However, the authors mentioned that the proposed FSGJO technique requires its parameters to be tuned.

Khalid et al.33 inspected numerous existing ML methods and optimized ML procedures on three publicly accessible NASA datasets. The authors applied PSO and ensemble approaches in their work and scrutinized the results. The experimental findings revealed that the SVM and optimized SVM outperformed the other models in terms of accuracy. However, this study was conducted using a limited number of datasets. Again, the experimental findings cannot be generalized owing to the lack of additional optimization algorithms.

Kumar and Das34 applied GA with supervised learning classifiers such as KNN, DT, and NB. Twelve NASA datasets from the PROMISE archive were used. The performance of the proposed model was assessed using accuracy and failure rate as performance metrics. Based on their experimental results, the authors asserted that the suggested FSGA technique improved the behavior of the defect prediction model compared with the scenario in which no FS was performed. However, in this study, the FS approach used only the GA; the effects of alternative optimization methodologies were not investigated.

Thirumoorthy et al.35 suggested a hybrid SDP method based on the TOPSIS and hybrid Rao algorithms (THRO) to uncover the finest set of traits. The authors used three benchmark NASA SDP datasets to implement their proposed THRO-based FS algorithm on SVM and NB classifiers. The impact of the proposed algorithm was assessed against six metaheuristic FS techniques. The authors noted that the proposed THRO-based FS algorithm enhanced the classification performance of the model and outperformed the other studied FS approaches. However, they also noted that this enhanced performance came at the price of increased computational cost.

Batool et al.36 offered a comprehensive and well-organized analysis of the extant literature. They examined numerous pertinent publications that employed DM, ML, and DL, among other techniques, for fault prediction. The endeavor was motivated by the need to find answers to research problems stated in the evaluation that might not have been addressed in the works evaluated or that called for a different viewpoint. The authors claim that DM and ML techniques, such as DT, NB, SVM, NN, ET, and EA, are frequently employed in SDP. Although used less frequently, DL approaches such as CNN, MLP, LSTM, and DNN have also been applied by researchers to predict software errors. The authors emphasized the need for larger datasets and the importance of concentrating on using the same methods with combinations of different datasets.

An SDP architecture based on nested stacking and heterogeneous FS was proposed by Chen et al.37 The two main objectives of this study were to increase SDP accuracy and optimize software testing resource allocation. The method is divided into three steps: feature selection, model creation with a nested-stacking classifier, and evaluation of the predictive behavior of the model. For the experiments, two datasets were used: Kamei and PROMISE. The investigation included both within-project and large-scale cross-project defect prediction. The model's behavior was illustrated using the AUC and F1-score evaluation metrics. The initial results showed that for the two sets of software failure datasets, the proposed framework performed better in terms of classification than the baseline models. However, the authors pointed out that nested stacking is not very effective and that the optimal combination of baseline models was determined through difficult experiments.

Arora and Kaur38 suggested a method that used FS on both origin and destination datasets to assemble a heterogeneous fault prediction (HFP) model and develop an effective forecasting model utilizing supervised training approaches. The authors performed the FS in two phases. They began by selecting features based on their importance and then removed the shared features from the datasets. An integrated approach was used to select the best characteristics, with RFI used for the FS. Following the suggestion made by Gao et al.,39 the authors selected the top 15% of attributes throughout the FS phase. The proposed framework was applied to two open-source projects, MySQL and Linux, with the supervised ML classifiers SVM, NB, RF, AdaBoost, DT, and LR. The behavior of the planned model was graded using the Area Under the ROC Curve (AUC). The authors concluded that logistic regression achieved the most accurate fault prediction with the recommended approach. The AUC data demonstrated that the suggested technique performed better than the existing Cross Project Fault Prediction (CPFP). However, in this study, other commonly used performance criteria, such as accuracy, precision, and recall, were not employed to grade the impact of the proposed approach. Once again, only supervised learning algorithms were used in the study, and no optimization algorithms were applied.

Anand et al.40 conducted a comparative performance assessment of various FS techniques utilized in SDP. Chi-Square (CS), Correlation Coefficient (CC), Fisher's Score, Information Gain (IG), Mean Absolute Difference (MAD), and Variance Threshold (VT) are among the filter-based FS approaches used in that investigation. The wrapper-based FS strategies include the Backward Feature Elimination (BFE), Exhaustive Feature Elimination (EFE), Forward Feature Elimination (FFE), and Recursive Feature Elimination (RFE) methodologies. RFI and LASSO Regularization are among the embedded FS techniques utilized in the study. The recommended model uses six publicly accessible benchmark NASA datasets with the NB, SVM, DT, and KNN classifiers. The authors used the F1-score, recall, accuracy, and precision as performance evaluation criteria. Their experimental results showed that Fisher's score performed more accurately than the other FS techniques. Moreover, compared with the no-FS situation, all FS strategies were found to enhance the model's behavior. A drawback of this study is that it neglected to examine the impact of optimization strategies on FS.

The dynamic re-ranking approach-based WFS technique was introduced by Balogun et al.41 in response to the exorbitant processing expenses of wrapper-based FS (WFS) methods. The recommended technique was constructed using 25 public-domain datasets extracted from the NASA, AEEEM, PROMISE, and ReLink archives using classifiers such as DT and NB. The findings of the experiment illustrated that the recommended method reduced computing time and enhanced model performance when executing FS. A disadvantage is that the suggested method relies on both the FFS and WFS techniques: FFS has variable performance across datasets and classifiers, whereas WFS suffers from stagnation in local optima and high computing costs. Once more, only two supervised classifiers (DT and NB) were examined in this work; SVM and KNN, two other well-known classifiers, were not considered.

Balogun et al.42 proposed an inventive hybrid multifilter wrapper FS arrangement based on rank aggregation to select critical features and address the aforementioned shortcomings. The recommended course of action was implemented in two steps. In the first step, a multifilter FS mechanism based on rank aggregation was used, which combined the separate rank lists from multifilter methods to build an original, dependable, and non-disjoint rank list. This resolves the filter rank choice issue. In the second step, the upgraded wrapper FS approach, which was predicated on dynamic re-ranking, was used to preprocess the accumulated ranked attributes. The competence of the recommended method is illustrated using NB and DT classifiers on benchmark software fault datasets. The tests used accuracy, area under the curve (AUC), and F-measure values as evaluation criteria. The authors used their findings to address the issues of filter rank choice and local optima stagnation in HFS, demonstrating the suggested method's ingenuity in selecting the best characteristics while maintaining or boosting the performance of the forecasting models. They concluded that applying the recommended technique significantly improves the behavior of the model. However, the model was limited to only two classifiers to achieve satisfactory results. Consequently, the possibility of extrapolating the results to alternative classifiers has not been explored.

Alsghaier and Akour43 presented an SDP model by fusing the GA, SVM, and PSO. Three stages were implemented: GA-SVM for GA integration, PSO-SVM for PSO integration, and GAPSO_SVM for the reciprocal iteration-based integration of GA-SVM and PSO-SVM. During the experimentation phase, 24 benchmark SDP datasets (12 NASA MDP and 12 open-source Java applications) were subjected to the proposed model using the SVM classifier. To validate the theoretical model, experiments were conducted using the WEKA Tool and MATLAB 2015. The impact of the developed approach was assessed using evaluation metrics, such as accuracy, recall, precision, F-measure, specificity, error rate, and standard deviation. The results of the experiment showed that combining the GA with SVM and PSO had a beneficial effect on the model and enhanced its performance when applied to both small- and large-scale datasets. However, the precision metric was insufficient to appraise the suggested procedures.

Alsghaier and Akour44 built on their earlier work43 by combining GA, SVM, and the Whale Optimization Algorithm (WOA) to forecast defects. The remainder of the experimental configuration remained the same as in the previous study.43 Through the experimental data, the researchers discovered that the behavior of the defect prediction model improved for both large-scale and small-scale datasets when the GA was integrated with SVM and WOA. For the datasets under study, WA-SVM performed more accurately than GAWA-SVM, and GAWA-SVM produced the worst outcomes. Again, the proposed method outperformed SVM for the NASA MDP and open-source Java projects in terms of SD scores. This illustrates how combining SVM with optimization techniques enhances prediction performance. For the NASA datasets, GA-SVM and GAWA-SVM produced the best outcomes in terms of specificity. This proved that the GA-SVM and GAWA-SVM procedures are appropriate for software defect prediction when enforced on large datasets.

Balogun et al.45 used NASA datasets from the PROMISE archive to thoroughly evaluate FSS algorithms on NB, DT, LR, and KNN. Their findings imply that the studied FS techniques enhanced the performance of the system. Information Gain, one of the FFR techniques, demonstrated the best results. Consistency Feature Subset Selection (CFSS), which is based on the Best First Search in FSS methods, had the greatest impact on the forecasting models. However, performance varied across classifiers and datasets. The authors also found that models constructed using FFR-based techniques are more stable than those constructed using FSS-based approaches. This study focused only on filter-based FS procedures, and the effects of the WFS techniques were not investigated in detail.

All of the previously stated FS approaches, whether supervised or unsupervised, have disadvantages that significantly impact the model's performance, including (i) high cost, (ii) entrapment in local optima, (iii) a low convergence rate, and (iv) the fine-tuning of excessively many parameters. The primary drawback of the previously stated FS techniques is the need to tune the regulating parameters accurately while choosing ideal characteristics. These shortcomings motivated us to propose a novel FS technique (FSCOA) that draws inspiration from the Chernobyl Disaster Optimizer (CDO). The 1986 nuclear reactor core outburst in Chernobyl served as the impetus for the development of the CDO meta-heuristic algorithm. The process of nuclear radiation, in which alpha, beta, and gamma fragments propagate and damage humans following an explosion, is replicated by CDO. From the high-pressure point (the explosion site) to the low-pressure point (the location of the individual), the above-mentioned radiations travel at an extremely rapid speed. The algorithm comprises an initial population of potential solutions. Moreover, it calculates the gradient descent factors (GDF) of the alpha, beta, and gamma fragments during their attacks on humans. Determining the average of these GDF values over a number of iterations yields the best result.

Feature selection using Chernobyl Optimization Algorithm

Feature selection determines the crucial attributes that have the greatest impact on the target variable, which helps increase machine learning model accuracy, reduce computing costs, and reduce the risk of overfitting. The mechanism of selecting the best features consists of three steps: (1) creating a set of subgroups of attributes; (2) assessing and comparing the adequacy of these subgroups to determine which subgroup is the best, or until the abort criteria are met; and (3) computing the outcome using only the best features. Determining which attributes to use for classification is difficult. The 1986 Chernobyl nuclear reactor catastrophe46 is recognized as one of the worst nuclear disasters in modern human history, in terms of both cost and casualties. Inspired by the Chernobyl nuclear reactor core eruption, the Chernobyl Disaster Optimizer (CDO)30 is a meta-heuristic optimization technique. To choose the most appropriate subset of characteristics for classification and thereby address the aforementioned problem, a novel FS approach using the Chernobyl Optimization Algorithm (FSCOA) is proposed. Figure 1 (Ref. 53) shows the blueprint of the proposed FSCOA method.


Figure 1. Blueprint of the suggested FSCOA methodology.

First, the selection of relevant SDP datasets is crucial. Following the selection of the datasets, an in-depth examination was carried out to determine any missing, inconsistent, or categorical data. It became apparent that there were no missing data in the datasets; nevertheless, a few datasets contained categorical data, which were encoded into numerical values. Furthermore, the original feature values were normalized to the range of 0 to 1. Subsequently, an 80:20 split between the training and testing data was created for each normalized dataset. To develop and investigate the model, the two preeminent criteria are the population size and the maximum number of iterations. Higher values improve the performance of the model but also lengthen the computation time. In this study, the population size and maximum number of iterations were set to 30 and 200, respectively. By applying the recommended FSCOA methodology, four supervised learning classifiers (DT, KNN, NB, and QDA) were used to construct the model using the optimal features that were chosen. The best predictive classifier was then determined by comparing the accuracy of the proposed FSCOA approach with that of the other FS models under study.
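To make these preprocessing steps concrete, the following is a minimal sketch (not the authors' code) of the pipeline described above; the file name and label column name are placeholders.

```python
# Illustrative preprocessing sketch. "PC1.csv" and the label column "defective"
# are hypothetical names, not the exact ones used by the authors.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv("PC1.csv")                      # load one NASA/PROMISE dataset

# Encode any categorical columns into numeric codes (the datasets have no missing values)
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].astype("category").cat.codes

X = df.drop(columns=["defective"]).values        # attribute matrix
y = df["defective"].values                       # defect labels

X = MinMaxScaler().fit_transform(X)              # min-max normalization to the range [0, 1]

# 80:20 split between training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

POP_SIZE, MAX_ITER = 30, 200                     # population size and iteration budget used in this study
```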

This study suggests a novel FSCOA technique to select the best subgroup of attributes for classification. The primary intent of the suggested technique is to identify the attribute combination that minimizes the fitness (error) of the model. Figure 2 (Ref. 53) shows a complete flow diagram of the proposed FSCOA technique.


Figure 2. Flow-diagram of recommended FSCOA approach.

Initializing the criteria, such as the population size (M), problem dimension (F), lower bound (LowBound), and upper bound (UppBound), is the first step of the procedure. Subsequently, a random binary population of M fragments with dimension F is generated, where Z = [Z_1, Z_2, Z_3, …, Z_M] denotes the population, Z_i = [Z_i,1, Z_i,2, Z_i,3, …, Z_i,F] is the location of the ith fragment in the F-dimensional feature space, i = 1, 2, 3, …, M is the specimen index, and Z_i,f is the standing of the ith fragment for the fth trait of the population. Several classification techniques, including DT, KNN, NB, and QDA, have been considered for the fitness (error) computation over the randomly selected characteristics. The FS algorithm aims to select the subset of ideal features that minimizes the fitness of the learning algorithm. The error (Err_i^t) is estimated as the disparity between the estimated outcome (EO_i^t) and the actual outcome (AO_i^t). Eq. (1) can be used to describe this phenomenon.

(1)
Err_i^t = AO_i^t - EO_i^t

By dividing the sum of the errors by the total number of instances in the testing data, the fitness (FitValue^t) of the learning algorithm is calculated. This is characterized by Eq. (2).

(2)
FitValue^t = (Σ_{i=1}^{p} Err_i^t) / p

Here, i = 1, 2, …, p, where p represents the number of instances in the test data, and t represents the current iteration.
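A minimal sketch of the random binary population and of the wrapper fitness of Eqs. (1)-(2) is given below, assuming that the per-instance error is a 0/1 misclassification indicator and using KNN as one of the four studied classifiers; names and defaults are illustrative, not the authors' implementation.

```python
# Sketch of the random binary population and the wrapper fitness of Eqs. (1)-(2),
# assuming |AO - EO| is a 0/1 error, so the fitness is the misclassification rate
# of a classifier trained only on the selected features.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def init_population(M, F):
    """Random binary population: M fragments, F features each."""
    return rng.integers(0, 2, size=(M, F))

def fitness(z, X_train, y_train, X_test, y_test):
    """Eq. (2): mean per-instance error of the classifier on the test split."""
    selected = np.flatnonzero(z)
    if selected.size == 0:                           # guard: an empty subset cannot be evaluated
        return 1.0
    clf = KNeighborsClassifier().fit(X_train[:, selected], y_train)
    predictions = clf.predict(X_test[:, selected])
    errors = (predictions != y_test).astype(float)   # Eq. (1), taken as a 0/1 error
    return errors.sum() / len(y_test)                # Eq. (2)
```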

The transfer function depicted in Eq. (3) was employed to transform the initial fragment standings into a binary equivalent.

(3)
TF = 1 / (1 + exp(-10 × (Z_i,f - 0.5)))
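The following short sketch illustrates Eq. (3); the 0.5 threshold used to turn the transfer value into a binary feature mask is an assumption for illustration.

```python
# Sketch of the transfer function in Eq. (3): a sigmoid centred at 0.5 that maps a
# continuous fragment standing Z[i, f] to a value in (0, 1), which is then thresholded
# (here at 0.5, an assumed choice) to obtain a binary feature mask.
import numpy as np

def transfer(Z):
    return 1.0 / (1.0 + np.exp(-10.0 * (Z - 0.5)))   # Eq. (3)

def binarize(Z, threshold=0.5):
    return (transfer(Z) >= threshold).astype(int)
```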

The proposed FSCOA approach employs the CDO algorithm to determine the optimal features for a given dataset. In CDO, different types of emissions are released from the nuclei as a result of the radioactivity caused by nuclear instability. The most prevalent of these emissions are alpha, beta, and gamma fragments. These fragments, which are very dangerous to people, fly from a high-pressure point (the point of explosion) to a low-pressure point (the location of the individual). CDO simulates the effects of this radioactive decay on humans following a nuclear explosion. The primary processes modeled are the nuclear explosion and the attack on humans by the gamma, beta, and alpha fragments. Humans are most likely to be on foot when attacked, and human walking speed can be estimated to be between 0 and 3 miles per hour.49 Based on this, Eq. (4) models a linear reduction of this speed over the iterations.

(4)
WalkSpeed_human = 3 - t × (3 / max_iter)

Alpha fragment

The gradient descent factor (GDFα) of the alpha fragment while threatening humans can be computed using Eq. (5).

(5)
GDF_α = 0.25 × (POS_α(t) - PROP_α × D_α)

Here, POS_α(t) is the prevailing standing of the alpha fragments; PROP_α represents the dispersion of the alpha fragments and can be calculated using Eq. (6); D_α is the discrepancy between the standing of the individual and the standing of the alpha fragments, which can be determined using Eq. (8).

(6)
PROP_α = (π × rad × rad) / (0.25 × Speed_α) - (WalkSpeed_human × rand())

Here, rad is a random value between 0 and 1, Speedα is the speed of alpha fragments that can be in the range of 1–16,000 kmps. This can be normalized using Eq. (7)

(7)
Speed_α = log(rand(1:16000))
(8)
D_α = |Area_α × POS_α(t) - AvgT(t)|

Here, Area_α is the propagation area of the alpha fragments, which can be calculated as π × rad × rad, where rad is a random value between 0 and 1; AvgT is the average of the total standings, which can be determined using Eq. (17).

Beta fragment

Eq. (9) can be used to determine the gradient descent factor (GDFβ) of a beta fragment assaulting a human.

(9)
GDF_β = 0.5 × (POS_β(t) - PROP_β × D_β)

Here, POSβ(t) is the current standing of beta fragments; PROPβ represents the propagation of beta fragments and can be calculated using Eq. (10); Dβ is the discrepancy between the human standing and the beta fragment’s standing, which can be determined using Eq. (12).

(10)
PROP_β = (π × rad × rad) / (0.5 × Speed_β) - (WalkSpeed_human × rand())

Here, rad is a random value between 0 and 1, Speedβ is the speed of beta fragments that can be in the range of 1–270,000 kmps. This can be normalized using Eq. (11)

(11)
Speed_β = log(rand(1:270000))
(12)
D_β = |Area_β × POS_β(t) - AvgT(t)|

Here, Area_β is the propagation area of the beta fragments and can be calculated as π × rad × rad, where rad is a random value between 0 and 1; AvgT is the average of the total standings, which can be computed using Eq. (17).

Gamma fragments

The gradient descent factor (GDFγ) of the gamma fragment while making an assault on humans can be computed using Eq. (13).

(13)
GDF_γ = POS_γ(t) - PROP_γ × D_γ

Here, POSγ(t) is the prevailing standing of gamma fragments, PROPγ represents the dispersion of gamma fragments and can be calculated using Eq. (14); Dγ is the discrepancy between the standing of the human and the standing of gamma fragments, which can be determined using Eq. (16).

(14)
PROP_γ = (π × rad × rad) / Speed_γ - (WalkSpeed_human × rand())

Here, rad is a random value between 0 and 1, Speedγ is the speed of the gamma fragment in the range of 1 to 300,000 kmps. This can be normalized using Eq. (15)

(15)
Speed_γ = log(rand(1:300000))
(16)
D_γ = |Area_γ × POS_γ(t) - AvgT(t)|

Here, Area_γ is the propagation area of the gamma fragments, which can be calculated as π × rad × rad, where rad is a random value between 0 and 1; AvgT is the average of the total standings, which can be determined using Eq. (17).

(17)
AvgT = (GDF_α + GDF_β + GDF_γ) / 3
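The sketch below shows one possible reading of Eqs. (4)-(17) in code, following the groupings reconstructed above. The use of fresh uniform random numbers for rad and rand(), and the reuse of the previous iteration's average standing in the distance terms, are assumptions for illustration rather than the authors' exact implementation.

```python
# One possible reading of Eqs. (4)-(17). Treat this as an illustrative sketch,
# not the authors' code; the term groupings follow the reconstruction above.
import numpy as np

rng = np.random.default_rng(1)

def walk_speed(t, max_iter):
    return 3.0 - t * (3.0 / max_iter)                        # Eq. (4)

def gdf(pos_best, avg_prev, speed_upper, scale, ws):
    """Gradient descent factor for one fragment type (alpha: scale=0.25, beta: 0.5, gamma: 1)."""
    rad = rng.random()                                       # random value in [0, 1)
    speed = np.log(rng.uniform(1, speed_upper))              # Eqs. (7)/(11)/(15)
    prop = (np.pi * rad * rad) / (scale * speed) - ws * rng.random()   # Eqs. (6)/(10)/(14)
    area = np.pi * rad * rad
    d = np.abs(area * pos_best - avg_prev)                   # Eqs. (8)/(12)/(16)
    return scale * (pos_best - prop * d)                     # Eqs. (5)/(9)/(13)

def update_standing(pos_alpha, pos_beta, pos_gamma, avg_prev, t, max_iter):
    ws = walk_speed(t, max_iter)
    gdf_a = gdf(pos_alpha, avg_prev, 16000, 0.25, ws)
    gdf_b = gdf(pos_beta, avg_prev, 270000, 0.5, ws)
    gdf_g = gdf(pos_gamma, avg_prev, 300000, 1.0, ws)
    return (gdf_a + gdf_b + gdf_g) / 3.0                     # Eq. (17): updated standing AvgT
```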

Finally, Algorithm 1 provides a summary of the entire proposed FSCOA process.

Proposed FSCOA approach

Algorithm 1.

  • 1. Initialize Populace Size (M), Dimension (F), Lower Bound (LowBound), UpperBound (UppBound), Maximum Iteration (max_iter)

  • 2. Generate the binary feature subset Zi randomly

  • 3. Initialize the alpha (POSα), beta (POSβ), and gamma (POSγ) standings

  • 4. while (t<max_iter) do{

  • 5.   for i=1: M do

  • 6.    for j = 1 to F do

  • 7.     The values of the initial standing of the fragments are converted into their corresponding binary values using Eq. (3).

  • 8.     Compute the fitness value (FitValue) for the alpha, beta, and gamma fragments using Eq. (2)

  • 9.     if (FitValue < αScore)

  • 10.       αScore = FitValue

  • 11.        Update POSα

  • 12.     endif

  • 13.     if (FitValue > αScore) and (FitValue < βScore)

  • 14. βScore=FitValue

  • 15.        Update POSβ

  • 16.     endif

  • 17.     if (FitValue > αScore) and (FitValue > βScore) and (FitValue < γScore)

  • 18. γScore=FitValue

  • 19.        Update POSγ

  • 20.     endif

  • 21.    end for

  • 22.   end for

  • 23. Compute human walking speed (WalkSpeedhuman) using Eq. (4)

  • 24. Compute the speed of alpha (Speedα), beta (Speedβ), and gamma (Speedγ) fragments using Eq. (7), Eq. (11), and Eq. (15), respectively.

  • 25. for i=1: M do

  • 26.    for j=1: F do

  • 27.     Determine GDFα using Eq. (5)

  • 28.     Determine GDFβ using Eq. (9)

  • 29.     Determine GDFγ using Eq. (13)

  • 30.     Update Zi using average of total standings using Eq. (17)

  • 31.    end for

  • 32. end for

  • 33. t=t+1

  • 34. } //end of while loop

  • 35. Return finest solution, Zi

  • 36. end procedure
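The leader-update bookkeeping in steps 9-20 of Algorithm 1 (keeping the three best fragments found so far as the alpha, beta, and gamma standings) can be sketched as follows; the tuple representation of each leader is an illustrative choice, not the authors' data structure.

```python
# Sketch of the leader-update bookkeeping in steps 9-20 of Algorithm 1: the three
# lowest-fitness fragments seen so far are kept as the alpha, beta, and gamma leaders.
import numpy as np

def update_leaders(fit_value, z, alpha, beta, gamma):
    """alpha/beta/gamma are (score, position) pairs; a lower score is better."""
    a_score, a_pos = alpha
    b_score, b_pos = beta
    g_score, g_pos = gamma
    if fit_value < a_score:
        a_score, a_pos = fit_value, z.copy()
    elif fit_value < b_score:
        b_score, b_pos = fit_value, z.copy()
    elif fit_value < g_score:
        g_score, g_pos = fit_value, z.copy()
    return (a_score, a_pos), (b_score, b_pos), (g_score, g_pos)

# Usage: start each leader at +infinity and update after every fitness evaluation.
alpha = beta = gamma = (np.inf, None)
```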

Result analysis

This section deliberates on the empirical findings of this research. The persuasiveness of the proposed FSCOA approach was graded by employing 12 publicly available benchmark NASA software defect datasets extracted from the PROMISE archive.48 The datasets were KC1, KC3, CM1, JM1, MC1, MC2, MW1, PC1, PC2, PC3, PC4, and PC5. First, an in-depth examination of the datasets was performed to identify missing, inconsistent, and categorical data. It became apparent that there were no missing data in the datasets; nevertheless, a few datasets contained categorical data, which were encoded into numerical values. We also noticed that the datasets comprised continuous data. The datasets were altered using the min-max normalization method49 to overcome this problem; this technique transformed the original feature values to the range of zero to one. Subsequently, an 80:20 split between the training and testing data was created for each normalized dataset. Extensive information regarding the datasets used in this exploration is shown in Table 1.

Table 1. Specifics of the enforced NASA datasets.

Datasets | No. of instances | No. of features | Non-susceptible classes (NSC) | Susceptible classes (SC) | Susceptible (%)
PC1 | 705 | 38 | 644 | 61 | 8.7
PC2 | 745 | 37 | 729 | 16 | 2.1
PC3 | 1077 | 38 | 943 | 134 | 12.4
PC4 | 1287 | 38 | 1110 | 177 | 13.8
PC5 | 1711 | 39 | 1240 | 471 | 27.5
CM1 | 327 | 38 | 285 | 42 | 12.8
JM1 | 7782 | 22 | 6110 | 1672 | 21.5
KC1 | 1183 | 22 | 869 | 314 | 26.5
KC3 | 194 | 40 | 158 | 36 | 18.5
MC1 | 1988 | 39 | 1942 | 46 | 2.3
MC2 | 125 | 40 | 81 | 44 | 35.2
MW1 | 253 | 38 | 226 | 27 | 10.6

The configuration of the computer on which the experiments were administered was as follows: Intel Core i5-6200 CPU with a 2.40 GHz clock rate and 8 GB RAM. The aforementioned techniques were employed in a Python 3 environment using the Jupyter notebook. First, the input dataset was uploaded using Pandas. The datasets were altered using the min-max normalization method.49 Using train_test_split from sklearn.model_selection, each dataset was partitioned into training and testing data at a ratio of 80:20. The population size and the highest number of iterations were the two major criteria for developing and validating the model. The model provides superior outcomes with higher values, but this also increases the computing time. In this investigation, the population size and highest number of iterations were set to 30 and 200, respectively. Four supervised learning classifiers, DT, KNN, NB, and QDA, were used to evaluate five FS approaches: FSDE, FSPSO, FSGA, FSACO, and the suggested FSCOA. The fitness error plots for the suggested FSCOA approach and the other FS strategies with the examined classifiers DT, KNN, NB, and QDA were obtained using matplotlib.pyplot. In this study, accuracy, a frequently applied performance indicator, was used for assessment purposes. Accuracy can be expressed as the proportion of total instances that were correctly classified. It is calculated from the confusion matrix using Eq. (18), as follows:

(18)
Accuracy = (TP + TN) / (TP + TN + FP + FN)

Here, TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively.
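A small sketch of Eq. (18) using scikit-learn's confusion matrix for the binary case is shown below.

```python
# Sketch of Eq. (18) computed from the binary confusion matrix.
from sklearn.metrics import confusion_matrix

def accuracy_from_confusion(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return (tp + tn) / (tp + tn + fp + fn)          # Eq. (18)
```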

The performance of the recommended FSCOA algorithm was evaluated against several FS procedures, namely FSDE, FSPSO, FSGA, and FSACO, in terms of classification accuracy and the number of selected attributes on the 12 datasets studied in this research work. Because of the stochastic character of the previously mentioned techniques, each trial was run ten times with an initial random population to ensure that the performance of each procedure remained consistent. The median accuracy of the proposed FSCOA, along with the other studied FS approaches, is listed in Table 2 (Ref. 54).

Table 2. Accuracy percentage and number of features selected by four classifiers for twelve datasets.

Sl. No. | Dataset | FS Algorithm | KNN | DT | NB | QDA | Attributes selected
1 | KC1 | Without FS | 69.62 | 72.15 | 74.26 | 74.26 | 22
  |     | FSDE | 76.46 | 77.09 | 77.22 | 78.1 | 8.2
  |     | FSPSO | 74.64 | 73.12 | 76.12 | 76.27 | 8.3
  |     | FSGA | 76.47 | 76.03 | 77.22 | 77.93 | 9.4
  |     | FSACO | 77.69 | 76.85 | 77.47 | 77.85 | 4.5
  |     | FSCOA | 77.13 | 75.32 | 77.34 | 78.27 | 8.2
2 | KC3 | Without FS | 74.36 | 76.92 | 66.67 | 76.92 | 40
  |     | FSDE | 80 | 90 | 76.92 | 86.92 | 17.7
  |     | FSPSO | 76.15 | 81.03 | 71.54 | 79.74 | 16.9
  |     | FSGA | 79.23 | 87.69 | 76.15 | 87.69 | 18.8
  |     | FSACO | 86.92 | 85.13 | 79.49 | 86.41 | 8.9
  |     | FSCOA | 82.05 | 86.41 | 80.25 | 86.41 | 12.77
3 | JM1 | Without FS | 73.35 | 69.94 | 78.99 | 75.85 | 22
  |     | FSDE | 77.18 | 78.73 | 79.85 | 79.72 | 4.9
  |     | FSPSO | 75.07 | 72.94 | 79.16 | 79.05 | 8.9
  |     | FSGA | 76.36 | 73.29 | 79.62 | 79.83 | 10.3
  |     | FSACO | 79.26 | 79.49 | 79.89 | 79.79 | 3
  |     | FSCOA | 79.13 | 79.66 | 79.91 | 79.82 | 3.75
4 | CM1 | Without FS | 75.76 | 80.3 | 77.27 | 83.33 | 38
  |     | FSDE | 86.82 | 89.7 | 83.48 | 88.48 | 18.2
  |     | FSPSO | 83.33 | 83.18 | 81.97 | 85 | 14.5
  |     | FSGA | 85.3 | 88.94 | 83.18 | 88.48 | 17.8
  |     | FSACO | 87.42 | 87.27 | 84.55 | 88.18 | 12.1
  |     | FSCOA | 88.33 | 83.64 | 84.39 | 88.79 | 15.4
5 | MC1 | Without FS | 96.48 | 97.74 | 95.73 | 97.49 | 39
  |     | FSDE | 97.74 | 98.57 | 97.71 | 97.74 | 19.2
  |     | FSPSO | 97.56 | 98.49 | 96.31 | 97.49 | 12.4
  |     | FSGA | 97.59 | 98.67 | 97.71 | 97.74 | 19.2
  |     | FSACO | 98.02 | 98.34 | 97.74 | 97.76 | 13.4
  |     | FSCOA | 97.94 | 98.59 | 97.71 | 97.96 | 15.7
6 | MC2 | Without FS | 76 | 68 | 92 | 84 | 40
  |     | FSDE | 87.6 | 90.8 | 95.6 | 96 | 18.4
  |     | FSPSO | 80 | 75.2 | 92.8 | 88.4 | 17.2
  |     | FSGA | 85.2 | 89.6 | 93.2 | 95.2 | 18.4
  |     | FSACO | 89.2 | 86 | 96 | 95.2 | 7.2
  |     | FSCOA | 94.4 | 82.4 | 96 | 96 | 12.9
7 | PC1 | Without FS | 89.36 | 88.65 | 87.23 | 86.52 | 38
  |     | FSDE | 93.76 | 95.04 | 91.21 | 93.26 | 19.1
  |     | FSPSO | 90.85 | 92.06 | 89.65 | 89.72 | 16.2
  |     | FSGA | 93.48 | 95.04 | 90.64 | 93.48 | 18.9
  |     | FSACO | 93.97 | 93.97 | 92.63 | 92.91 | 10.8
  |     | FSCOA | 94.4 | 92.77 | 92.77 | 93.83 | 17.9
8 | PC2 | Without FS | 96.64 | 95.3 | 93.96 | 97.32 | 37
  |     | FSDE | 97.79 | 98.66 | 96.98 | 97.58 | 14.7
  |     | FSPSO | 97.45 | 96.51 | 95.84 | 97.38 | 15.2
  |     | FSGA | 97.65 | 98.32 | 96.31 | 97.48 | 17.1
  |     | FSACO | 97.89 | 97.25 | 97.22 | 98.12 | 10.7
  |     | FSCOA | 97.99 | 97.85 | 97.32 | 98.19 | 12.52
9 | PC3 | Without FS | 82.41 | 78.7 | 68.98 | 62.03 | 38
  |     | FSDE | 86.34 | 86.39 | 86.85 | 86.44 | 16.6
  |     | FSPSO | 84.77 | 82.92 | 80.83 | 83.38 | 13.5
  |     | FSGA | 85.93 | 86.57 | 86.81 | 86.76 | 17.2
  |     | FSACO | 86.71 | 84.44 | 87.08 | 86.82 | 11.6
  |     | FSCOA | 87.04 | 85.83 | 87.18 | 86.9 | 13.8
10 | MW1 | Without FS | 78.43 | 74.51 | 76.47 | 80.39 | 38
   |     | FSDE | 87.25 | 87.84 | 83.53 | 88.63 | 13.5
   |     | FSPSO | 84.71 | 82.75 | 78.82 | 84.31 | 12.9
   |     | FSGA | 85.69 | 87.06 | 82.16 | 86.67 | 17.2
   |     | FSACO | 86.67 | 85.29 | 87.25 | 90.39 | 8.7
   |     | FSCOA | 87.45 | 85.68 | 89.02 | 89.8 | 10.7
11 | PC4 | Without FS | 84.49 | 91.09 | 86.82 | 47.67 | 38
   |     | FSDE | 90.11 | 93.45 | 91.59 | 92.71 | 17.6
   |     | FSPSO | 86.63 | 92.64 | 89.11 | 86.98 | 14.4
   |     | FSGA | 87.6 | 93.53 | 91.74 | 92.49 | 18.6
   |     | FSACO | 91.16 | 92.4 | 91.4 | 91.82 | 14.2
   |     | FSCOA | 91.74 | 92.95 | 92.33 | 92.75 | 15.5
12 | PC5 | Without FS | 67.06 | 72.59 | 70.55 | 69.39 | 39
   |     | FSDE | 76.33 | 77.73 | 71.57 | 72.57 | 18.4
   |     | FSPSO | 71.98 | 73.53 | 70.82 | 70.59 | 15.7
   |     | FSGA | 75.63 | 77.81 | 71.46 | 72.92 | 19.3
   |     | FSACO | 76.85 | 75.16 | 72.19 | 71.11 | 14.8
   |     | FSCOA | 78.63 | 77.23 | 72.45 | 72.71 | 16.4

The table shows the median accuracy of the classifiers applied to the diverse datasets, both with and without feature selection, along with the average number of attributes selected by the respective FS approach. The classifiers were evaluated using a range of datasets and the previously discussed FS techniques. The experimental findings showed that the suggested FSCOA technique outperformed the other studied FS procedures in the majority of instances. For the majority of the datasets, with the exception of KC1, KC3, JM1, and MC1, the suggested FSCOA performed best when combined with KNN. With the exception of KC1, MC1, and CM1, the bulk of the datasets showed that the recommended FSCOA worked best when paired with NB. The majority of the datasets demonstrated that the suggested FSCOA performed best when combined with QDA, with the exception of KC3, MW1, and PC5. With the exception of JM1, the majority of the datasets showed that the previously researched FS approaches outperformed the suggested FSCOA strategy when used in conjunction with DT. Similarly, applying the proposed FSCOA technique to the JM1 dataset with all the analyzed classifiers, except KNN, yielded the highest accuracy. Furthermore, the bulk of the datasets yielded the best accuracy for all examined classifiers, with the exception of DT. It is crucial to remember that the suggested FSCOA technique could only provide the best prediction using the QDA and NB classifiers for the KC1 and KC3 datasets, respectively.

Figures 3 through 6 (Ref. 53) display the fitness error plots for the suggested FSCOA approach and the other FS strategies with the examined classifiers DT, KNN, NB, and QDA, respectively. Error plots for all 12 datasets are included in each graph. The error plots show that, in most cases, the error plot of the suggested FSCOA lies below those of the other FS approaches employed in this investigation. In some cases, the error plot of the suggested FSCOA methodology matches that of the existing FS methods. However, under certain circumstances, the error plot of the suggested FSCOA approach lies above those of the other evaluated FS techniques.


Figure 3. DT fitness error plot.

Figure 3 (Ref. 53) shows that for most datasets, the fitness error plot of the proposed FSCOA approach using the DT classifier is lower than that of the other FS models, with the exception of CM1, MC2, and KC3. However, for the KC3 dataset, the error plot overlapped with that of FSDE after 190 iterations. The error plot for the CM1 dataset lies above those of FSDE and FSACO. Furthermore, the plot for the MC2 dataset lies above that of FSDE.

The fitness error plot of the proposed FSCOA approach with the KNN classifier is lower than that of the other FS models for most datasets, as shown in Figure 4 (Ref. 53), with the exception of KC1, KC3, MC1, and PC2. For the PC2 dataset, the error plot corresponds to those of FSDE and FSACO after 115 iterations, whereas it lies above that of the FSACO model for the KC1, KC3, and MC1 datasets.


Figure 4. KNN fitness error plot.

As shown in Figure 5 (Ref. 53), the fitness error plot of the suggested FSCOA technique with the NB classifier was lower for the majority of datasets than that of the other FS models, with the exception of CM1, KC1, MC1, PC2, MC2, and PC3. After 75 iterations, the error plot for the MC2 dataset matches that of FSACO. For the PC3 dataset, the error plot of the suggested FSCOA method matches that of FSACO after 175 iterations. However, for the CM1, KC1, and MC1 datasets, the error plot was above that of the FSACO model. Moreover, for PC2 and KC1, the plot was above that of FSDE.


Figure 5. NB fitness error plot.

Figure 6 (Ref. 53) illustrates that for most datasets (except CM1, JM1, KC3, MW1, PC2, and PC3), the fitness error plot of the proposed FSCOA approach with the QDA classifier is lower than that of the other FS models. Following 180 iterations, the error plot for the CM1 dataset aligns with that of FSACO. The error plot of the proposed FSCOA algorithm matches that of FSGA after 175 iterations for the JM1 dataset. The error plot lies above that of the FSACO model for the MW1, PC2, and PC3 datasets. The plot is also above those of FSGA and FSDE for the KC3 dataset, and above that of FSGA for the PC3 dataset.


Figure 6. QDA fitness error plot.

The FS algorithms employed in this study use several hyperparameters. The examination was performed with a population size of 30 over 200 iterations. The crossover rate (CR) in FSGA and FSDE was maintained at 0.8 and 0.9, respectively. For FSGA, the mutation rate (MR) was 0.01. In FSDE, the scaling factor (SF) was set to 0.8. In FSPSO, the maximum inertia weight (IWmax) and minimum inertia weight (IWmin) were fixed at 0.9 and 0.4, respectively, and both acceleration factors were set to 2. The fixed values of alpha (α), rho (ρ), and beta (β) in FSACO were 1, 0.2, and 0.1, respectively. For FSCOA, the governing criterion speeds for alpha (Speed_α), beta (Speed_β), and gamma (Speed_γ) were drawn using the rand function, and the radiation propagation radius (rad) was similarly drawn between 0 and 1 using the rand function.
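For reference, the hyperparameter settings described above can be collected in a single configuration dictionary; this is an illustrative summary, not the authors' code.

```python
# Hyperparameter settings described in the text, gathered for reference (illustrative only).
HYPERPARAMS = {
    "common": {"population_size": 30, "max_iterations": 200},
    "FSGA":   {"crossover_rate": 0.8, "mutation_rate": 0.01},
    "FSDE":   {"crossover_rate": 0.9, "scaling_factor": 0.8},
    "FSPSO":  {"inertia_weight_max": 0.9, "inertia_weight_min": 0.4,
               "acceleration_factors": (2, 2)},
    "FSACO":  {"alpha": 1, "rho": 0.2, "beta": 0.1},
    "FSCOA":  {"speed_alpha": "log(rand(1:16000))", "speed_beta": "log(rand(1:270000))",
               "speed_gamma": "log(rand(1:300000))", "rad": "rand() in [0, 1]"},
}
```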

Statistical analysis

This section provides extensive statistical scrutiny of the empirical findings of this work. Statistical analysis50 is a popular method for quantifying, examining, evaluating, and drawing conclusions from data. Statistical tests are broadly classified into two types: parametric and non-parametric. Parametric statistical testing assumes that the data under study conform to a specific probability distribution, most frequently a normal distribution. Several assumptions, such as the independence of observations, homogeneity of variance, and normality, must hold true to employ parametric tests. Ensuring that these assumptions are met is essential when conducting a parametric test, because failing to do so may result in erroneous results and invalid conclusions. Therefore, it is crucial to confirm these hypotheses in advance and, if necessary, to use non-parametric tests. Non-parametric statistical testing does not depend on a specific probability distribution hypothesis for the data under study. Non-parametric tests are more broadly applicable and resilient to assumption violations than parametric tests because they depend on the order or ranking of the data. However, when the presumptions of the parametric tests are satisfied, non-parametric tests might be less powerful. It is essential to choose a statistical test suitable for the research topic and the properties of the data being examined. In this study, the Friedman test,51 a non-parametric rank-based test, was used. Based on the effectiveness of the classification, each model associated with the trial was ranked according to the Friedman test: the highest accuracy receives the smallest (best) rank, and the lowest accuracy receives the largest (worst) rank.

To begin with, Eq. (19) was employed to determine the average rank (AverageRank_Models) of each graded configuration (FSDE, FSPSO, FSGA, FSACO, FSCOA, and Without FS) across the classification models (KNN, DT, NB, and QDA). Table 3 (Ref. 54) presents these findings.

(19)
AverageRank_Models = (Σ Rank_Models) / (Total number of models (A))
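The ranking behind Eq. (19) can be reproduced as follows: for each dataset, the six configurations are ranked per classifier by descending accuracy, with ties sharing a rank as in Table 3, and the four per-classifier ranks are then averaged. The KC1 row of Table 2 is used as the example input.

```python
# Sketch of the per-dataset ranking used in Table 3 (rows: Without FS, FSDE, FSPSO,
# FSGA, FSACO, FSCOA; columns: KNN, DT, NB, QDA), using the KC1 accuracies from Table 2.
import numpy as np
from scipy.stats import rankdata

acc = np.array([
    [69.62, 72.15, 74.26, 74.26],
    [76.46, 77.09, 77.22, 78.10],
    [74.64, 73.12, 76.12, 76.27],
    [76.47, 76.03, 77.22, 77.93],
    [77.69, 76.85, 77.47, 77.85],
    [77.13, 75.32, 77.34, 78.27],
])

# Rank each classifier column by descending accuracy; ties share a rank ("dense" style).
ranks = np.column_stack([rankdata(-acc[:, j], method="dense") for j in range(acc.shape[1])])
average_rank_models = ranks.mean(axis=1)   # Eq. (19): [5.75, 2.5, 4.75, 3.0, 2.0, 2.25]
```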

Table 3. For twelve NASA datasets, the average rank of all FS algorithms (Friedman Rank).

Sl. No. | Dataset | FS Algorithm | KNN | DT | NB | QDA | AverageRank_Models
1 | KC1 | Without FS | 69.62 (6) | 72.15 (6) | 74.26 (5) | 74.26 (6) | 5.75
  |     | FSDE | 76.46 (4) | 77.09 (1) | 77.22 (3) | 78.10 (2) | 2.5
  |     | FSPSO | 74.64 (5) | 73.12 (5) | 76.12 (4) | 76.27 (5) | 4.75
  |     | FSGA | 76.47 (3) | 76.03 (3) | 77.22 (3) | 77.93 (3) | 3
  |     | FSACO | 77.69 (1) | 76.85 (2) | 77.47 (1) | 77.85 (4) | 2
  |     | FSCOA | 77.13 (2) | 75.32 (4) | 77.34 (2) | 78.27 (1) | 2.25
2 | KC3 | Without FS | 74.36 (6) | 76.92 (6) | 66.67 (6) | 76.92 (5) | 5.75
  |     | FSDE | 80 (3) | 90 (1) | 76.92 (3) | 86.92 (2) | 2.25
  |     | FSPSO | 76.15 (5) | 81.03 (5) | 71.54 (5) | 79.74 (4) | 4.75
  |     | FSGA | 79.23 (4) | 87.69 (2) | 76.15 (4) | 87.69 (1) | 2.75
  |     | FSACO | 86.92 (1) | 85.13 (4) | 79.49 (2) | 86.41 (3) | 2.5
  |     | FSCOA | 82.05 (2) | 86.41 (3) | 80.25 (1) | 86.41 (3) | 2.25
3 | JM1 | Without FS | 73.35 (6) | 69.94 (6) | 78.99 (6) | 75.85 (6) | 6
  |     | FSDE | 77.18 (3) | 78.73 (3) | 79.85 (3) | 79.72 (4) | 3.25
  |     | FSPSO | 75.07 (5) | 72.94 (5) | 79.16 (5) | 79.05 (5) | 5
  |     | FSGA | 76.36 (4) | 73.29 (4) | 79.62 (4) | 79.83 (1) | 3.25
  |     | FSACO | 79.26 (1) | 79.49 (2) | 79.89 (2) | 79.79 (3) | 2
  |     | FSCOA | 79.13 (2) | 79.66 (1) | 79.91 (1) | 79.82 (2) | 1.5
4 | CM1 | Without FS | 75.76 (6) | 80.30 (6) | 77.27 (6) | 83.33 (5) | 5.75
  |     | FSDE | 86.82 (3) | 89.70 (1) | 83.48 (3) | 88.48 (2) | 2.25
  |     | FSPSO | 83.33 (5) | 83.18 (5) | 81.97 (5) | 85 (4) | 4.75
  |     | FSGA | 85.30 (4) | 88.94 (2) | 83.18 (4) | 88.48 (2) | 3
  |     | FSACO | 87.42 (2) | 87.27 (3) | 84.55 (1) | 88.18 (3) | 2.25
  |     | FSCOA | 88.33 (1) | 83.64 (4) | 84.39 (2) | 88.79 (1) | 2
5 | MC1 | Without FS | 96.48 (6) | 97.74 (6) | 95.73 (4) | 97.49 (5) | 5.25
  |     | FSDE | 97.74 (3) | 98.57 (3) | 97.71 (2) | 97.74 (3) | 2.75
  |     | FSPSO | 97.56 (5) | 98.49 (4) | 96.31 (3) | 97.49 (4) | 4
  |     | FSGA | 97.59 (4) | 98.67 (1) | 97.71 (2) | 97.74 (3) | 2.5
  |     | FSACO | 98.02 (1) | 98.34 (5) | 97.74 (1) | 97.76 (2) | 2.25
  |     | FSCOA | 97.94 (2) | 98.59 (2) | 97.71 (2) | 97.96 (1) | 1.75
6 | MC2 | Without FS | 76 (6) | 68 (6) | 92 (5) | 84 (4) | 5.25
  |     | FSDE | 87.6 (3) | 90.8 (1) | 95.6 (2) | 96 (1) | 1.75
  |     | FSPSO | 80 (5) | 75.2 (5) | 92.8 (4) | 88.4 (3) | 4.25
  |     | FSGA | 85.2 (4) | 89.6 (2) | 93.2 (3) | 95.2 (2) | 2.75
  |     | FSACO | 89.2 (2) | 86 (3) | 96 (1) | 95.2 (2) | 2
  |     | FSCOA | 94.4 (1) | 82.4 (4) | 96 (1) | 96 (1) | 1.75
7 | PC1 | Without FS | 89.36 (6) | 88.65 (5) | 87.23 (6) | 86.52 (6) | 5.75
  |     | FSDE | 93.76 (3) | 95.04 (1) | 91.21 (3) | 93.26 (3) | 2.5
  |     | FSPSO | 90.85 (5) | 92.06 (4) | 89.65 (5) | 89.72 (5) | 4.75
  |     | FSGA | 93.48 (4) | 95.04 (1) | 90.64 (4) | 93.48 (2) | 2.75
  |     | FSACO | 93.97 (2) | 93.97 (2) | 92.63 (2) | 92.91 (4) | 2.5
  |     | FSCOA | 94.40 (1) | 92.77 (3) | 92.77 (1) | 93.83 (1) | 1.5
8 | PC2 | Without FS | 96.64 (6) | 95.30 (6) | 93.96 (6) | 97.32 (6) | 6
  |     | FSDE | 97.79 (3) | 98.66 (1) | 96.98 (3) | 97.58 (3) | 2.5
  |     | FSPSO | 97.45 (5) | 96.51 (5) | 95.84 (5) | 97.38 (5) | 5
  |     | FSGA | 97.65 (4) | 98.32 (2) | 96.31 (4) | 97.48 (4) | 3.5
  |     | FSACO | 97.89 (2) | 97.25 (4) | 97.22 (2) | 98.12 (2) | 2.5
  |     | FSCOA | 97.99 (1) | 97.85 (3) | 97.32 (1) | 98.19 (1) | 1.5
9 | PC3 | Without FS | 82.41 (6) | 78.70 (6) | 68.98 (6) | 62.03 (6) | 6
  |     | FSDE | 86.34 (3) | 86.39 (2) | 86.85 (3) | 86.44 (4) | 3
  |     | FSPSO | 84.77 (5) | 82.92 (5) | 80.83 (5) | 83.38 (5) | 5
  |     | FSGA | 85.93 (4) | 86.57 (1) | 86.81 (4) | 86.76 (3) | 3
  |     | FSACO | 86.71 (2) | 84.44 (4) | 87.08 (2) | 86.82 (2) | 2.5
  |     | FSCOA | 87.04 (1) | 85.83 (3) | 87.18 (1) | 86.90 (1) | 1.5
10 | MW1 | Without FS | 78.43 (6) | 74.51 (6) | 76.47 (6) | 80.39 (6) | 6
   |     | FSDE | 87.25 (2) | 87.84 (1) | 83.53 (3) | 88.63 (3) | 2.25
   |     | FSPSO | 84.71 (5) | 82.75 (5) | 78.82 (5) | 84.31 (5) | 5
   |     | FSGA | 85.69 (4) | 87.06 (2) | 82.16 (4) | 86.67 (4) | 3.5
   |     | FSACO | 86.67 (3) | 85.29 (4) | 87.25 (2) | 90.39 (1) | 2.5
   |     | FSCOA | 87.45 (1) | 85.68 (3) | 89.02 (1) | 89.80 (2) | 1.75
11 | PC4 | Without FS | 84.49 (6) | 91.09 (6) | 86.82 (6) | 47.67 (6) | 6
   |     | FSDE | 90.11 (3) | 93.45 (2) | 91.59 (3) | 92.71 (2) | 2.5
   |     | FSPSO | 86.63 (5) | 92.64 (4) | 89.11 (5) | 86.98 (5) | 4.75
   |     | FSGA | 87.60 (4) | 93.53 (1) | 91.74 (2) | 92.49 (3) | 2.5
   |     | FSACO | 91.16 (2) | 92.40 (5) | 91.40 (4) | 91.82 (4) | 3.75
   |     | FSCOA | 91.74 (1) | 92.95 (3) | 92.33 (1) | 92.75 (1) | 1.5
12 | PC5 | Without FS | 67.06 (6) | 72.59 (6) | 70.55 (6) | 69.39 (6) | 6
   |     | FSDE | 76.33 (3) | 77.73 (2) | 71.57 (3) | 72.57 (3) | 2.75
   |     | FSPSO | 71.98 (5) | 73.53 (5) | 70.82 (5) | 70.59 (5) | 5
   |     | FSGA | 75.63 (4) | 77.81 (1) | 71.46 (4) | 72.92 (1) | 2.5
   |     | FSACO | 76.85 (2) | 75.16 (4) | 72.19 (2) | 71.11 (4) | 3
   |     | FSCOA | 78.63 (1) | 77.23 (3) | 72.45 (1) | 72.71 (2) | 1.75

Table 4 (Ref. 54) summarizes the average ranks of the various configurations (FSDE, FSPSO, FSGA, FSACO, FSCOA, and Without FS) over all the datasets, computed using Eq. (20).

(20)
AverageRank_Datasets = (Σ AverageRank_Models) / (Total number of datasets (B))

Table 4. AvgRank of all FS configurations.

Sl. No. | Dataset | Without FS | FSDE | FSPSO | FSGA | FSACO | FSCOA
1 | KC1 | 5.75 | 2.5 | 4.75 | 3 | 2 | 2.25
2 | KC3 | 5.75 | 2.25 | 4.75 | 2.75 | 2.5 | 2.25
3 | JM1 | 6 | 3.25 | 5 | 3.25 | 2 | 1.5
4 | CM1 | 5.75 | 2.25 | 4.75 | 3 | 2.25 | 2
5 | MC1 | 5.25 | 2.75 | 4 | 2.5 | 2.25 | 1.75
6 | MC2 | 5.25 | 1.75 | 4.25 | 2.75 | 2 | 1.75
7 | PC1 | 5.75 | 2.5 | 4.75 | 2.75 | 2.5 | 1.5
8 | PC2 | 6 | 2.5 | 5 | 3.5 | 2.5 | 1.5
9 | PC3 | 6 | 3 | 5 | 3 | 2.5 | 1.5
10 | MW1 | 6 | 2.25 | 5 | 3.5 | 2.5 | 1.75
11 | PC4 | 6 | 2.5 | 4.75 | 2.5 | 3.75 | 1.5
12 | PC5 | 6 | 2.75 | 5 | 2.5 | 3 | 1.75
  | Average | 5.8 | 2.52 | 4.75 | 2.92 | 2.48 | 1.75
  | Rank_Datasets | AvgRank6 | AvgRank3 | AvgRank5 | AvgRank4 | AvgRank2 | AvgRank1

The average ranks of all the compared configurations in this study are as follows: {AvgRank1 = 1.75, AvgRank2 = 2.48, AvgRank3 = 2.52, AvgRank4 = 2.92, AvgRank5 = 4.75, AvgRank6 = 5.8}. These average ranks were used to compute the Friedman statistic, referred to as X_F^2, using Eq. (21), which yielded a value of 23.29.

(21)
X_F^2 = (12 × B) / (A × (A + 1)) × [ Σ_{i=1}^{6} (AvgRank(i))^2 - (A × (A + 1)^2) / 4 ]

Twelve datasets (B = 12) and six models (A = 6) were considered in this experiment. The Friedman statistic (F_F) value was then computed from B, A, and X_F^2 using Eq. (22).

(22)
F_F = ((B - 1) × X_F^2) / (B × (A - 1) - X_F^2)
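As a quick arithmetic check, the following sketch recomputes Eqs. (21) and (22) from the average ranks listed above and reproduces the reported values.

```python
# Arithmetic check of Eqs. (21)-(22) with A = 6 configurations and B = 12 datasets.
avg_ranks = [1.75, 2.48, 2.52, 2.92, 4.75, 5.8]
A, B = 6, 12

chi2_f = (12 * B) / (A * (A + 1)) * (sum(r ** 2 for r in avg_ranks) - A * (A + 1) ** 2 / 4)
f_f = (B - 1) * chi2_f / (B * (A - 1) - chi2_f)

print(round(chi2_f, 2), round(f_f, 3))   # approximately 23.29 and 6.978
```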

The value of F_F was estimated to be 6.978. The critical value was determined to be 2.383 by employing degrees of freedom of (A - 1) = 5 and (A - 1) × (B - 1) = 5 × 11 = 55, with α = 0.05 as the significance level. Given that the critical value of 2.383 is smaller than the Friedman statistic (F_F = 6.978), the null hypothesis is rejected and the alternative hypothesis is adopted. This implies that two or more configurations are distinct from one another. The Holm method is usually employed as the post hoc test after the null hypothesis is rejected and the alternative hypothesis is accepted. In the Holm technique, the p-value and z-value are used to assess how well each distinct model performed relative to the other models.52 Eq. (23) was used to obtain the value of z, and the z-value together with the normal distribution table was used to calculate the value of p.

(23)
z = (AvgRank(i) - AvgRank(j)) / sqrt((A × (A + 1)) / (6 × B))

In this experiment, B and A represent the number of datasets and the number of configurations employed in this investigation, respectively, and z is the test statistic. The terms AvgRank(i) and AvgRank(j) represent the average ranks of the ith and jth models, respectively. The p-value, z-value, and α/(A - i) of the compared configurations are summarized in Table 5. For this particular instance, we set the significance level α at 0.05.
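The pairwise comparisons of Eq. (23) can be sketched as follows; using a one-sided p-value from the standard normal distribution is an assumption, and it approximately reproduces the values reported in Table 5.

```python
# Sketch of the Holm post-hoc comparisons in Eq. (23): z for each pairwise comparison
# against FSCOA, with an (assumed) one-sided p-value from the standard normal distribution.
from math import sqrt
from scipy.stats import norm

A, B = 6, 12
avg_rank = {"FSCOA": 1.75, "FSACO": 2.48, "FSDE": 2.52, "FSGA": 2.92,
            "FSPSO": 4.75, "Without FS": 5.8}

denom = sqrt(A * (A + 1) / (6 * B))
for name, r in avg_rank.items():
    if name == "FSCOA":
        continue
    z = (r - avg_rank["FSCOA"]) / denom          # Eq. (23)
    p = norm.sf(z)                               # one-sided p-value
    print(f"FSCOA vs {name}: z = {z:.3f}, p = {p:.5f}")
```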

Table 5 (Ref. 54) illustrates that in most cases, the p-value is lower than or equivalent to the value of α/(A - i), with the exception of the FSCOA versus FSGA and FSCOA versus FSACO comparisons. This indicates that the FSCOA model is statistically significant and attains superior performance when compared with the other configurations, excluding the FSGA and FSACO models. For the latter two comparisons, there was no statistically significant difference in performance.

Table 5. Holm procedure.

Sl. No. | Model used in FS | z-value | p-value | α/(A-i)
1 | FSCOA: Without FS | 5.307 | 0.00001 | 0.01
2 | FSCOA: FSGA | 1.533 | 0.06 | 0.0125
3 | FSCOA: FSDE | 1.009 | 0.156 | 0.0166
4 | FSCOA: FSPSO | 3.931 | 0.000042 | 0.025
5 | FSCOA: FSACO | 0.956 | 0.16 | 0.05

Conclusion

This study proposed a novel FS approach called FSCOA, which applies a meta-heuristic procedure called CDO. The proposed FSCOA technique mimics the nuclear reactor core explosion to determine the best set of attributes by carefully discarding irrelevant or insignificant ones. This study investigated the impact of the proposed FSCOA approach on 12 publicly available NASA datasets by employing four widely used classifiers (DT, KNN, NB, and QDA). One elementary purpose was to compare the predictive behavior of the proposed FSCOA approach with that of other existing FS techniques, namely, FSDE, FSPSO, FSACO, and FSGA. The Friedman test was used to test the statistical validity of the proposed FSCOA method. The test outcome showed that at least two models were significantly different, leading to the rejection of the null hypothesis. This necessitated the use of the Holm test. The experimental findings suggested that the proposed FSCOA approach demonstrated higher performance in selecting the optimal set of features compared with the studied FS procedures. However, the behavior of the proposed FSCOA approach may vary across different datasets and classifiers. In the future, we aim to expand the scope of this research by employing real-world project datasets. We also look forward to investigating the efficiency of the suggested FSCOA approach by increasing the number and variety of classifiers, especially ensemble classifiers, and employing more optimization algorithms for feature selection.
