Research Article

Trends in the Psychometric Characteristics of NECO Mathematics Senior School Certificate Examination Over a Period of Five Years (2020-2024) among Osun State Candidates, Nigeria

[version 1; peer review: 1 approved with reservations]
Previously titled: "Trends in the Psychometric Characteristics of NECO Mathematics Senior School Certificate Examination Over a Period of Five Years (2020-2024) among Osun State’s Candidates"
PUBLISHED 27 May 2026

Abstract

The study examined the psychometric characteristics of the National Examinations Council (NECO) Senior School Certificate Examination (SSCE) Mathematics test in Osun State, Nigeria, from 2020 to 2024. A random sample comprising 10% of the total of 211,753 candidates was selected for the study. The examination item responses were used to examine three characteristics: item difficulty, item discrimination, and test reliability. The researchers used descriptive statistics, one-way ANOVA, and Scheffe post hoc tests to analyse the data. The results showed that item difficulty remained largely stable over the years, except in the most recent examination year, which exhibited a marked change. Item discrimination indices varied significantly across the five years, although overall item discrimination remained within acceptable limits. The KR-20 reliability coefficients were high throughout the study, indicating that the test maintained strong internal consistency across administrations. The study found that the NECO SSCE Mathematics examination is highly reliable but requires ongoing psychometric assessment to maintain standards of reliability, fairness, and validity over time.

Keywords

SSCE, NECO, Mathematics, Secondary Schools, Examination.

Introduction

Large-scale national public exams play a crucial strategic role in a country’s education system, particularly when certification, school transfers, or access to additional educational resources depend on exam outcomes. In Nigeria, the National Examinations Council (NECO) Senior School Certificate Examination (SSCE) Mathematics is considered a high-stakes assessment, used to obtain secondary school completion certificates, secure admission to tertiary institutions, and signal competence to the labour market. Consequently, the interpretation of NECO Mathematics scores and the credibility of decisions that depend on these scores across examination sessions are largely contingent on the examination’s psychometric quality. From the psychometric perspective, the characteristics of large-scale assessments most commonly evaluated are item difficulty, item discrimination, test reliability, and item bias. Item difficulty refers to the proportion of candidates who answer an item correctly. In contrast, item discrimination refers to the extent to which an item can distinguish between individuals of high and low ability. Reliability assesses the consistency of scores, whereas bias analysis examines whether test items function equivalently across subgroups (e.g., gender, school type, or location). All these indices collectively provide empirical evidence for the validity, fairness, and technical soundness of any examination.1,2

There has been growing concern among educational stakeholders in Nigeria over the past decade, driven by inconsistent student performance in public examinations, particularly in Mathematics. These inconsistencies could indicate disparities in instructional quality, curriculum coverage, or learner readiness, but they could also be due to inconsistencies in item quality and in the test-construction process. Empirical studies conducted in the past decade have shown that public examination items in Nigeria sometimes exhibit uneven difficulty distributions, weak discrimination parameters, and occasional differential item functioning (DIF), making scores from one year to the next difficult to compare.3,4

In contemporary measurement research, trend analysis plays a vital role alongside single-year statistical analyses. The psychometric characteristics of an examination may change over time, and tracking them may be necessary for assessment purposes. In the last decade, this has involved determining whether the essential characteristics of an examination remain fixed or exhibit systematic drift in difficulty, reliability, or bias. The literature also contains substantive discussions of high-stakes examinations such as the NECO SSCE, which underscore the importance of such monitoring in nurturing public trust and shaping government actions, particularly in international educational achievement comparisons.5,6

In Osun State, where Mathematics performance has remained a key policy concern, a systematic examination of psychometric trends provides valuable evidence for educational planning, test-development reforms, and accountability. Understanding how item characteristics have evolved from 2020 to 2024 can inform NECO’s item-writing practices, guide teacher preparation strategies, and support policymakers in interpreting examination outcomes more cautiously. Consequently, this study investigates trends in the psychometric characteristics of NECO SSCE Mathematics over five years, with a focus on candidates in Osun State, Nigeria.

Psychometric qualities are the attributes of test items and test forms that provide empirically supported evidence of the quality, credibility, and fairness of measurement in educational assessments. The evaluation of psychometric properties is fundamental to large-scale public examinations and to ensuring that test scores adequately and reliably reflect the true abilities of examinees in the area of focus. This evaluation is a key component in the development, quality control, and scoring of high-stakes examinations such as national benchmark exams. One of the most widely studied psychometric indices is item difficulty, which measures the proportion of examinees who answer a given item correctly. A difficulty index can help explore whether test items are well aligned with the examination population and curriculum expectations. An item that is too easy or too difficult contributes little to measurement and may distort score distributions and undermine test validity. A well-constructed public examination generally contains items at three levels of difficulty (easy, moderate, and difficult) to ensure optimal precision of measurement across the ability continuum.7,8

Item discrimination, closely related to item difficulty, is the extent to which an item differentiates between examinees with high and low overall performance. Discrimination indices indicate the measurement quality of an item with respect to its contribution to reliability and overall test validity. When the discrimination coefficient for a given item is high, the item provides a strong signal for the rank ordering of candidates around an ability point. Items with low discrimination, however, may introduce noise and make it harder to infer the ability level associated with a response.9,10

Reliability is another crucial psychometric property that denotes the consistency and stability of test scores across items, forms, or administrations. In the case of public examinations, internal consistency indices such as KR-20 or Cronbach’s alpha are widely applied to estimate the degree to which the test items cohere in measuring the target construct. Adequate reliability is a prerequisite for valid interpretation of scores, particularly for consequential decisions, such as certification and admission, that attach social consequences to an individual’s performance.11,12

Over the past 10 years, studies have indicated that the psychometric properties of public examinations in Nigeria vary from year to year. Previous studies have analysed the NECO and WAEC Mathematics exams, noting that although the overall difficulty levels are sometimes similar across the two organizations, the discrimination indices vary significantly across test versions and years, indicating potential issues with item quality and calibration.13 Similar findings were reported in a content-specific analysis of the NECO examinations,14 with some items exhibiting weak discrimination and low psychometric capacity despite not being particularly difficult.

These findings highlight the importance of conducting regular psychometric evaluations and monitoring item parameters in public examinations. Tracking methodologies make it possible to detect fluctuations in psychometric quality; without them, the comparability of results across examination cohorts breaks down, and a large-scale examination system that provides so much information would lose its credibility. Awareness and assessment of psychometric characteristics within and across examination years therefore remain a key concern for test producers, policy developers, and educational measurement specialists.

The National Examinations Council (NECO) was established as an alternative national examining body in Nigeria, tasked with conducting credible, valid, and reliable public examinations. Questions about the quality and comparability of NECO examinations, particularly in high-stakes subjects such as Mathematics and the English Language, have attracted sustained scholarly attention since the council’s inception. As a result, a considerable body of empirical research seeks to ascertain and critique the psychometric characteristics of NECO test items under the frameworks afforded by Classical Test Theory (CTT) and Item Response Theory (IRT). Several studies conducted in the last decade have examined NECO examination items across subject areas, with a major focus on item difficulty, discrimination, dimensionality, and model–data fit. Using IRT-based approaches, researchers reported that some NECO multiple-choice test forms did not fully meet the unidimensionality assumption and exhibited local item dependence and misfitting items in certain administrations. In fact, a psychometric study of NECO English Language items3 found discrepancies in item parameter estimates and instances of poor item fit, pointing to weaknesses in item calibration and pretesting procedures and underscoring the need for continuous psychometric scrutiny of NECO examinations to ensure measurement precision and construct validity.

Empirical assessments in Mathematics have produced mixed psychometric results across years. A comparison of Mathematics items from NECO and the West African Examination Council (WAEC) indicates that the two tests exhibit similar difficulty across administrations. NECO Mathematics items, however, display higher within-test variability in discrimination than WAEC Mathematics items. This variability raises concerns about the uniformity of measurement standards and the stability of score interpretation over time.1

Beyond item quality, research is increasingly focusing on fairness and bias in NECO examinations. Differential Item Functioning (DIF) analyses have demonstrated15 that some of the mathematics items in the NECO examinations functioned differently across subgroups defined by gender, school type (public versus private), and location (urban versus rural), even after controlling for candidates’ overall ability. This differential functioning poses a threat to score equating and may systematically favour or disadvantage particular individuals or groups, thereby undermining decisions based on examination results.16,17

Research comparing differential item functioning (DIF) indices frequently links deviations from unidimensionality to the presence of item bias. For instance, when a mathematics exam assesses both mathematical reasoning and language skills simultaneously, or when other test-taking techniques come into play, item parameters become less predictable and differences among subgroups become more apparent. It has therefore been argued that the very idea of fairness becomes incoherent when test validity and the choice of measurement model are not carefully considered alongside the many empirical conditions that affect item parameters.18,6

Evidence from the literature indicates that the NECO examinations are a noteworthy asset to national assessment and certification; however, psychometric challenges persist. The evidence demands routine item analysis, longitudinal monitoring of item parameters, and thoroughgoing bias validation to be integrated as cardinal components of NECO quality assurance activities. It is imperative that these issues be addressed up front, particularly in mathematics examinations, where problems of psychometric quality can distort the interpretation of students’ competence and may yield ill-informed decisions regarding educational policies.

Research objectives

The main goal of this study is to examine how the psychometric properties of Mathematics in the NECO SSCE have evolved from 2020 to 2024 for candidates in Osun State, utilizing Classical Test Theory as the analytical framework. The specific objectives of the study are to:

  • i. Evaluate trends in item difficulty indices of NECO SSCE Mathematics multiple-choice items from 2020 to 2024 among candidates in Osun State.

  • ii. Examine the trends in item discrimination indices of NECO SSCE Mathematics items across the five examination years.

  • iii. Evaluate the reliability of the NECO SSCE Mathematics tests administered across the five years.

  • iv. Compare yearly variations in psychometric characteristics (item difficulty, item discrimination, distractor efficiency, and reliability) of NECO SSCE Mathematics examinations from 2020 to 2024.

Research questions

  • i. What is the trend observed in the difficulty indices of NECO SSCE Mathematics multiple-choice items from 2020 to 2024 among Osun State candidates?

  • ii. How do the NECO SSCE Mathematics items’ discrimination indices vary across the five examination years (2020–2024)?

  • iii. What is the extent to which the reliability coefficients of the NECO SSCE Mathematics examinations remain consistent across the years 2020 to 2024?

Hypotheses

  • i. Item difficulty of the NECO SSCE Mathematics examinations varies significantly across the years 2020 to 2024.

  • ii. Item discrimination of the NECO SSCE Mathematics examinations does not vary significantly across the years 2020 to 2024.

Methodology

The study employed a descriptive quantitative design, using NECO Mathematics examinations from 2020 to 2024 as the test data and student responses from mathematics candidates in Osun State schools. The study population comprised all individuals from Osun State who enrolled in and participated in the NECO SSCE Mathematics examination between 2020 and 2024. Data from the examination board indicate that 66,256 candidates registered in 2020, followed by 34,434 in 2021, 34,682 in 2022, 35,118 in 2023, and 41,263 in 2024, for a total of 211,753 candidates over the five years. These individuals came from public and private institutions and represented a range of ability levels and learning settings in the state of Osun. A representative sample needed for an in-depth psychometric evaluation was selected, taking into account the large population and the long-term aspect of the research. Proportional random sampling was performed, drawing a 10% portion of the total population; 21,175 individuals were sampled. The sampling method was designed to ensure that each test year was accurately represented in the study sample, in proportion to its prevalence in the overall population. As a result, the study maintained the population’s characteristics to facilitate comparison and enabled more robust trend comparisons. Using a proportional allocation, the target totals for each year are set at 6626 for 2020, 3443 for 2021, 3468 for 2022, 3512 for 2023, and 4126 for 2024. Randomly selecting samples from exam records in each year ensured that every candidate had an equal, independent probability of inclusion in the study. The primary instrument used for data collection in this study was an electronic spreadsheet of OMR data containing candidates’ item-level responses for the years 2020 to 2024. 
This study carefully and systematically analysed data from 2020 to 2024 to achieve the research goals and ensure a thorough evaluation of the psychometric properties of the NECO SSCE Mathematics examination. The Classical Test Theory (CTT) model was employed to provide robust evidence regarding the quality of the test items and overall assessment.
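
The proportional allocation described above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name `proportional_allocation` and the round-half-up rule are assumptions, since the source does not state how fractional sample sizes were rounded. The yearly registration counts are those reported in the Methodology.

```python
# Registered candidates per year, as reported in the Methodology section.
registered = {2020: 66256, 2021: 34434, 2022: 34682, 2023: 35118, 2024: 41263}


def proportional_allocation(counts, percent=10):
    """Return each year's sample size as `percent` of its count.

    Rounding is half-up and done in integer arithmetic (an assumption;
    the source does not describe its rounding rule).
    """
    return {year: (n * percent + 50) // 100 for year, n in counts.items()}


allocation = proportional_allocation(registered)
total_population = sum(registered.values())
total_sample = sum(allocation.values())

for year, size in allocation.items():
    print(year, size)
print("population:", total_population, "sample:", total_sample)
```

Under these assumptions the yearly targets and the overall sample size agree with the figures quoted in the text (6626, 3443, 3468, 3512, 4126; 21,175 in total).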

Results

Research Question 1: What is the trend observed in the difficulty indices of NECO SSCE Mathematics multiple-choice items from 2020 to 2024 among Osun State candidates?

The proportion of candidates who answered each item correctly was used to calculate the item’s difficulty index (p-value) for each examination year. The mean value obtained across all items for each year was computed. The results are presented in Table 1.
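
As a minimal sketch of this calculation, assuming responses have already been scored as 0 (incorrect) or 1 (correct), the difficulty index of each item and the yearly mean can be computed as follows. The function names and the toy response matrix are hypothetical and are not taken from the study's data.

```python
def item_difficulty(responses):
    """Return the proportion of candidates answering each item correctly.

    `responses` is a list of candidate rows; each row is a list of 0/1 scores.
    """
    n_candidates = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_candidates for j in range(n_items)]


def mean_difficulty(responses):
    """Average the item difficulty indices over all items of one test form."""
    p_values = item_difficulty(responses)
    return sum(p_values) / len(p_values)


# Toy data: 4 candidates x 3 items (illustrative, not real NECO responses).
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
p_values = item_difficulty(toy)
p_bar = mean_difficulty(toy)
print(p_values, round(p_bar, 2))
```

Higher values mean easier items, so the first two toy items (answered correctly by three of four candidates) are easier than the third.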

Table 1. NECO mathematics average difficulty indices trend (2020–2024).

Year    Mean item difficulty (p̄)    Interpretation
2020    0.75                         Very Easy
2021    0.70                         Moderately Easy
2022    0.74                         Moderately Easy
2023    0.80                         Very Easy
2024    0.65                         Moderately Easy

Table 1 presents the average item difficulty indices for the NECO Senior School Certificate Examination (SSCE) Mathematics test from 2020 to 2024. The mean item difficulty index (p̄) is the average proportion of test takers who answered items correctly; higher values indicate easier items, whereas lower values indicate harder items. According to the results, the 2020 NECO Mathematics exam had a high mean difficulty index (p̄ = 0.75), implying that the questions were, on the whole, accessible and straightforward for Osun State candidates, most of whom answered most items correctly. The mean difficulty index fell to 0.70 in 2021, indicating a moderately easy test whose items remained within an appropriate range for large-scale testing. In 2022, the index rose slightly to 0.74, indicating that the items were easier than those of 2021. The 2023 examination recorded a further increase to 0.80, the highest value in the period, indicating that the mathematics items were again largely easy for candidates. The 2024 examination showed a substantial decrease to 0.65, the lowest value in the period, indicating that this test demanded more of candidates than in previous years while remaining moderately easy. The five-year period therefore shows a nonlinear progression, with mean difficulty alternating from year to year rather than moving in one direction. Although every test form stayed in the easy to moderately easy range, these year-to-year irregularities in average difficulty complicate the comparison of scores across examination years. They point to the need for systematic item pretesting and difficulty balancing to establish consistent standards for the NECO Mathematics exams.

Research Question 2: How do the NECO SSCE Mathematics items’ discrimination indices vary across the five examination years (2020–2024)?

Table 2 displays the average item discrimination indices for the NECO SSCE Mathematics exam from 2020 to 2024. The mean discrimination index, calculated using the point-biserial correlation coefficient (rpbis), reflects the extent to which test items distinguish between higher-performing and lower-performing students. Higher discrimination values indicate higher-quality items, which help to rank candidates correctly. The 2020 and 2021 exams produced identical mean discrimination indices of 0.27, indicating moderate discriminating power. The test items from these two years differentiated between candidates with higher and lower abilities, although their effectiveness fell short of the 0.30 benchmark that defines highly effective test items. The 2022 mean discrimination index rose to 0.29, a small improvement that approached the standard for high-quality multiple-choice items. The mean discrimination index decreased to 0.25 in 2023, indicating reduced discriminatory power compared with earlier years, because a greater number of test items failed to distinguish between high- and low-performing students. The 2024 examination showed a substantial increase in the mean discrimination index, which reached 0.36 and indicated good to very good discriminating power. The 2024 test items were more effective than those of previous years at differentiating candidates by ability level. Over the five-year span, the item discrimination indices oscillated rather than developing steadily, with a clear improvement in 2024 after moderate discrimination in previous years. The rise in 2024 suggests that either item construction standards improved or items were better matched to candidates’ actual skill levels. These annual fluctuations indicate that NECO Mathematics examinations require regular item evaluation and quality assurance procedures to maintain consistent examination performance across years.
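
A point-biserial discrimination index of the kind described above is the Pearson correlation between a 0/1 item score and a criterion score. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the data are toy values, and the `corrected` option (which removes the item from the total before correlating) is offered only as a common variant, since the source does not say whether an uncorrected or corrected total was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def point_biserial(responses, corrected=False):
    """Return one discrimination index per item from a 0/1 response matrix."""
    totals = [sum(row) for row in responses]
    indices = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # Optionally exclude the item itself from the criterion score.
        criterion = [t - i for t, i in zip(totals, item)] if corrected else totals
        indices.append(pearson(item, criterion))
    return indices


# Toy data: 4 candidates x 4 items (illustrative only).
toy = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]
r_pbis = point_biserial(toy)
print([round(r, 3) for r in r_pbis])
```

In the toy data the first item is answered correctly mainly by high scorers and yields a positive index, while the last item is answered correctly mainly by low scorers and yields a negative index, the pattern that usually prompts a check of the item or its answer key.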

Table 2. NECO mathematics average discrimination indices trend (2020–2024).

Year    Mean discrimination (rpbis)    Interpretation
2020    0.27                           Moderate index
2021    0.27                           Moderate index
2022    0.29                           Good
2023    0.25                           Weak index
2024    0.36                           Very good index

Research Question 3: What is the extent to which the reliability coefficients of the NECO SSCE Mathematics examinations remain consistent across the years 2020 to 2024?

Table 3 presents the KR-20 reliability coefficients for the NECO Senior School Certificate Examination (SSCE) Mathematics tests over five successive years of administration, from 2020 to 2024. The results indicate that reliability was maintained throughout the five years. The KR-20 coefficients were similarly high in every year: 0.90 in 2020, 0.88 in 2021, 0.87 in 2022, 0.89 in 2023, and 0.90 in 2024. In all cases, values exceeded the minimum acceptable reliability of 0.70 and were at or above 0.87, suggesting high internal consistency. The mild fluctuations observed across years were negligible, indicating that items functioned consistently and homogeneously. With KR-20 returning to 0.90 in 2024, the same level as in 2020, the results support the inference that NECO’s test construction and quality assurance processes were consistent over the period.
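
For readers unfamiliar with the coefficient, KR-20 for a test of k dichotomous items is k/(k − 1) × (1 − Σpq / σ²), where p is each item's proportion correct, q = 1 − p, and σ² is the variance of total scores. The sketch below is a minimal illustration with toy data; the function name is hypothetical, and the use of the population variance (dividing by n) is an assumption, as the source does not specify the variance formula it used.

```python
def kr20(responses):
    """Kuder-Richardson 20 for a 0/1 response matrix (rows are candidates)."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    # Population variance of total scores (assumed convention).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance; KR-20 is undefined")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Toy data: 4 candidates x 3 items forming a perfectly ordered response pattern.
toy = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
reliability = kr20(toy)
print(reliability)
```

With only three items the toy coefficient is modest even for a perfectly ordered pattern; a 60-item form has far more room to reach values near 0.90.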

Table 3. NECO mathematics reliability coefficients (KR-20) trend (2020–2024).

Year    KR-20 reliability
2020    0.90
2021    0.88
2022    0.87
2023    0.89
2024    0.90

Hypothesis 1:

Item difficulty of the NECO SSCE Mathematics examinations varies significantly across the years 2020 to 2024.

The item difficulty indices for NECO SSCE Mathematics items from 2020 to 2024 are presented in Table 4.

Table 4. Descriptive statistics of item difficulty of NECO SSCE mathematics between 2020–2024.

Year     N      Mean (x̄)    SD        Min     Max
2020     60     .75          .22773    .00     1.00
2021     60     .70          .23711    .00     .91
2022     60     .74          .14051    .09     .88
2023     60     .80          .16507    .04     .94
2024     60     .65          .19615    .17     .90
Total    300    .73          .20221    .00     1.00

Each examination year contributed 60 multiple-choice items, yielding 300 items across the five years. The item difficulty indices ranged from 0.00 to 1.00 over the five years, with higher values indicating easier items. The average item difficulty across the five years was 0.73 (SD = 0.20), indicating that the NECO SSCE Mathematics items were, on average, moderately easy for candidates: a considerable proportion of students answered most test questions correctly during the period examined. The difference in item difficulty over the five years was then assessed using a one-way analysis of variance. The result is presented in Table 5.

Table 5. One-Way ANOVA showing the difference in item difficulty of NECO SSCE mathematics between 2020–2024.

Source            Sum of Squares    df     Mean Square    F        Sig.
Between Groups    .806              4      .202           5.206    .000
Within Groups     11.419            295    .039
Total             12.226            299

Table 5 presents the results of a one-way Analysis of Variance (ANOVA), which tested whether the NECO SSCE Mathematics item difficulty means differed significantly across examination years from 2020 to 2024. The computed F ratio (F(4, 295) = 5.206) was statistically significant at the .05 level, which implies that the average item difficulty indices differed significantly across the five years of examination. In other words, the difficulty levels of NECO SSCE Mathematics test items were not constant from 2020 to 2024; at least one year’s average item difficulty was statistically significantly different from that of the others. Thus, a Scheffe test was conducted to determine where the difference lies. The results are presented in Table 6.
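
The F ratio in a one-way ANOVA is the between-groups mean square divided by the within-groups mean square. As a hedged illustration, the sketch below recomputes the ratio from the rounded sums of squares and degrees of freedom printed in Table 5, and also shows how the same quantities would be derived from raw item-level values. The function names and the small example groups are hypothetical; because the table entries are rounded, the recomputed ratio can only be expected to agree approximately with the reported value.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """F = (SS_between / df_between) / (SS_within / df_within)."""
    return (ss_between / df_between) / (ss_within / df_within)


def one_way_anova(groups):
    """Return (ss_between, df_between, ss_within, df_within, F) for lists of values."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return ss_between, df_between, ss_within, df_within, f_ratio(
        ss_between, df_between, ss_within, df_within
    )


# Recompute F from the rounded entries of Table 5.
f_table5 = f_ratio(0.806, 4, 11.419, 295)
print(round(f_table5, 2))

# Hypothetical raw data: three small groups, only to exercise the function.
example = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(example)
```

The within-groups degrees of freedom follow from the design: 300 items in 5 yearly groups give 300 − 5 = 295.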

Table 6. Scheffe Multiple comparison of item difficulty of NECO SSCE mathematics between 2020–2024.

(I) Year    (J) Year    Mean Difference (I-J)    Std. Error    Sig.
2020        2021        .05799                   .03592        .626
2020        2022        .01342                   .03592        .998
2020        2023        −.05094                  .03592        .734
2020        2024        .10131                   .03592        .096
2021        2020        −.05799                  .03592        .626
2021        2022        −.04457                  .03592        .819
2021        2023        −.10893                  .03592        .059
2021        2024        .04332                   .03592        .834
2022        2020        −.01342                  .03592        .998
2022        2021        .04457                   .03592        .819
2022        2023        −.06436                  .03592        .524
2022        2024        .08789                   .03592        .203
2023        2020        .05094                   .03592        .734
2023        2021        .10893                   .03592        .059
2023        2022        .06436                   .03592        .524
2023        2024        .15225*                  .03592        .002
2024        2020        −.10131                  .03592        .096
2024        2021        −.04332                  .03592        .834
2024        2022        −.08789                  .03592        .203
2024        2023        −.15225*                 .03592        .002

Table 6 presents the Scheffe post hoc analysis of item difficulty indices for the NECO SSCE Mathematics examination for the years 2020–2024. In the vast majority of pairwise comparisons across the exam years, p-values did not reach the 0.05 significance level, indicating that item difficulty was generally consistent over those years. However, a statistically significant difference was observed between the 2023 and 2024 examinations, with a mean difference in item difficulty of 0.15225 (p = 0.002). Because the difficulty index is the proportion correct, this positive mean difference indicates that the 2023 items were easier than the 2024 items (equivalently, the 2024 items were harder than the 2023 items).
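
The standard error shared by every row of Table 6 and the Scheffe decision for a given pair can be checked by hand. In the sketch below, the within-groups mean square is taken from the Table 5 entries (11.419 / 295) and each year has 60 items. The function names are hypothetical, and the critical value of about 2.40 for F(4, 295) at the .05 level is an approximate table value supplied as an assumption, not a figure from the source.

```python
from math import sqrt


def pairwise_se(ms_within, n_i, n_j):
    """Standard error of the difference between two group means."""
    return sqrt(ms_within * (1 / n_i + 1 / n_j))


def scheffe_statistic(mean_diff, se, k):
    """Scheffe F statistic for one pairwise contrast among k groups."""
    return (mean_diff / se) ** 2 / (k - 1)


ms_within = 11.419 / 295          # from the rounded entries of Table 5
se = pairwise_se(ms_within, 60, 60)

F_CRIT = 2.40                      # approximate F(.05; 4, 295); an assumption

f_2023_2024 = scheffe_statistic(0.15225, se, k=5)
f_2021_2023 = scheffe_statistic(-0.10893, se, k=5)

print(round(se, 5))
print(f_2023_2024 > F_CRIT, f_2021_2023 > F_CRIT)
```

Under these assumptions the recomputed standard error rounds to the .03592 shown in Table 6, the 2023 versus 2024 contrast exceeds the critical value, and the 2021 versus 2023 contrast (p = .059 in the table) falls just short of it.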

Hypothesis 2:

Item discrimination of the NECO SSCE Mathematics examinations does not vary significantly across the years 2020 to 2024.

Table 7 presents descriptive statistics of item discrimination indices for the NECO SSCE Mathematics examination for each of the five years from 2020 to 2024. Each year’s examination comprised 60 multiple-choice items, for a total of 300 items analyzed over the five years. The average item discrimination indices ranged from 0.2546 (2023) to 0.3559 (2024). Specifically, 2020 and 2021 had average item discrimination values of 0.2723 and 0.2744, respectively, indicating moderate discrimination, whereas 2022 had a mean item discrimination of 0.2861, indicating a slight improvement in discrimination quality. In 2023, with an average discrimination of 0.2546, items showed the weakest differentiating power of the period. The highest mean discrimination value was recorded in 2024, at 0.3559, indicating a substantial improvement in item quality and in items’ ability to differentiate among candidates of varying ability levels.

Table 7. Descriptive statistics of item discrimination of NECO SSCE mathematics between 2020–2024.

Year     N      Mean (x̄)    SD        Min      Max
2020     60     .2723        .10728    −.05     .40
2021     60     .2744        .10239    −.05     .42
2022     60     .2861        .07985    .01      .42
2023     60     .2546        .06861    −.06     .36
2024     60     .3559        .22995    −.05     .74
Total    300    .2887        .13489    −.06     .74

According to Table 8, the one-way ANOVA results indicate significant differences in the average discrimination indices for NECO SSCE Mathematics examinations taken between 2020 and 2024. The F-test yielded a significant F-statistic (F(4, 295) = 5.375; p < 0.05). A significant ANOVA result indicates that item discrimination in at least one year differs from that in the others.

Table 8. One-Way ANOVA showing the difference in Item discrimination indices of NECO SSCE mathematics between 2020–2024.

Source            Sum of squares    df     Mean square    F        Sig.
Between Groups    .370              4      .092           5.375    .000
Within Groups     5.071             295    .017
Total             5.441             299

Following the detection of a statistically significant source, a Scheffe pairwise comparison was conducted to identify the specific years with notable differences. The results are presented in Table 9.

Table 9. Scheffe multiple comparison of item discrimination of NECO SSCE mathematics between 2020–2024.

(I) Year    (J) Year    Mean difference (I-J)    Std. Error    Sig.
2020        2021        −.00202                  .02394        1.000
2020        2022        −.01380                  .02394        .988
2020        2023        .01772                   .02394        .968
2020        2024        −.08358*                 .02394        .017
2021        2020        .00202                   .02394        1.000
2021        2022        −.01178                  .02394        .993
2021        2023        .01975                   .02394        .954
2021        2024        −.08156*                 .02394        .022
2022        2020        .01380                   .02394        .988
2022        2021        .01178                   .02394        .993
2022        2023        .03152                   .02394        .784
2022        2024        −.06978                  .02394        .078
2023        2020        −.01772                  .02394        .968
2023        2021        −.01975                  .02394        .954
2023        2022        −.03152                  .02394        .784
2023        2024        −.10130*                 .02394        .002
2024        2020        .08358*                  .02394        .017
2024        2021        .08156*                  .02394        .022
2024        2022        .06978                   .02394        .078
2024        2023        .10130*                  .02394        .002

Table 9 shows that most year-to-year pairwise comparisons were not significant, as indicated by p-values >0.05. Specifically, comparisons between 2020 and 2021, 2020 and 2022, 2020 and 2023, 2021 and 2022, 2021 and 2023, and 2022 and 2023 revealed no significant differences, indicating that item discrimination was stable over those years. The comparison between 2022 and 2024 also fell short of significance (p = .078). However, significant differences were observed between the 2024 examination year and most of the previous years: between 2020 and 2024 (Mean Difference = −0.08358, p = .017), 2021 and 2024 (Mean Difference = −0.08156, p = .022), and 2023 and 2024 (Mean Difference = −0.10130, p = .002). The negative mean differences, obtained when each earlier year was contrasted with 2024, indicate that items in the 2024 examination discriminated significantly better than those in 2020, 2021, and 2023.

Discussion

The study evaluated changes in the psychometric properties of the NECO Senior School Certificate Examination (SSCE) Mathematics over the five-year period from 2020 to 2024 among candidates in Osun State.

The item difficulty analysis showed that NECO SSCE Mathematics questions had an average difficulty that remained within acceptable CTT limits (p ≈ 0.30–0.80). The descriptive results showed that item difficulty remained stable between 2020 and 2023 and then changed significantly in 2024, which the ANOVA and Scheffé post hoc analysis confirmed. The NECO examination maintained a standard difficulty level throughout most years, but the 2024 test showed a clear break from this established pattern. Variations in item difficulty across years of testing are common in large-scale assessments because they reflect changes in educational programs, test developers’ understanding of learning objectives, and the setting of assessment standards.19 The significant difference involving the 2024 items implies a possible recalibration of examination standards, which, while not inherently problematic, underscores the importance of systematic equating and longitudinal monitoring to ensure comparability of scores across years.20,21

The study found notable variation in item discrimination across the five examination years. Overall, the NECO Mathematics items distinguished adequately between candidates of differing ability, because their mean discrimination indices met the acceptable minimum of 0.20. The indices varied from year to year and peaked in 2024, which had the highest average discrimination value. This pattern suggests that greater priority has recently been given to item quality through improved item writing, item evaluation, and item review. High-stakes assessments require high discrimination indices because such indices improve the interpretation of test scores.9 However, negative discrimination values occurred in multiple years: approximately 10 percent of items did not function properly because of ambiguous items, incorrect answer keys, or item content that did not match the test objectives. Developing assessment systems therefore need regular post-examination item evaluation, as previous studies of public examination systems have reported similar results.22,14
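The screening of items against the 0.20 minimum can be sketched as follows. This passage does not state which discrimination statistic was used, so the sketch assumes the common upper-lower group index (top and bottom 27% of candidates by total score); the function names and the response data are hypothetical.

```python
def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower for one item."""
    # Rank candidates by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]
    p_upper = sum(row[item] for row in upper) / group_size
    p_lower = sum(row[item] for row in lower) / group_size
    return p_upper - p_lower

def classify(d, minimum=0.20):
    """Label an item as negative, weak (below the minimum), or acceptable."""
    if d < 0:
        return "negative"
    return "acceptable" if d >= minimum else "weak"

# Hypothetical scored responses: 10 candidates x 5 items (1 = correct).
scores = [
    [1, 1, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
]
indices = [discrimination_index(scores, j) for j in range(5)]
labels = [classify(d) for d in indices]
print(list(zip(indices, labels)))
```

In this toy example the second item is answered correctly by both groups and so does not discriminate, while the last item is answered correctly only by low scorers, the pattern that typically prompts a check of the answer key.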

The KR-20 reliability coefficients obtained for the five examination years ranged from 0.87 to 0.90, confirming that all test administrations achieved high internal consistency. Direct comparison of the coefficients revealed only minimal year-to-year differences, which remained below 0.02, and the overall range between the lowest and highest coefficients was 0.03. Because these variations are psychometrically negligible, the NECO SSCE Mathematics examination maintained consistent measurement precision throughout the period. The high reliability coefficients reflect sound test construction practices, adequate test length, and items that assess the core construct of the test. This stability is consistent with the expectation that the reliability of large-scale assessments should remain stable across administrations.
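For readers less familiar with KR-20, the coefficient can be computed from dichotomously scored responses as in the sketch below. The function name and the response matrix are hypothetical, and the population variance is assumed for the total scores so that it is consistent with the item variances p(1 − p).

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 = k / (k - 1) * (1 - sum(p_j * q_j) / variance of total scores).

    responses: list of candidate rows, each a list of 0/1 item scores.
    """
    k = len(responses[0])
    n = len(responses)
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    # Population variance of candidates' total scores.
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

# Hypothetical scored responses: 5 candidates x 4 items (1 = correct).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(scores)
print(round(reliability, 2))
```

Applying such a function to each year's response matrix would give one coefficient per administration, which can then be compared across years in the way described above.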

Implications for examination quality assurance

The results demonstrate that the NECO SSCE Mathematics examination has maintained strong reliability across administrations and that its items are of acceptable quality. At the same time, test development is a dynamic process that requires ongoing psychometric evaluation by experts to maintain valid results in high-stakes certification and selection examinations. Previous studies19,21 suggest that regular item analysis, alongside structured feedback loops for item writers and moderators, would help sustain improvements in discrimination quality while ensuring that changes in difficulty do not compromise fairness or comparability across cohorts. Such practices are critical for strengthening public confidence in examination outcomes and supporting evidence-based assessment reforms in Nigeria.

Conclusion

The study investigated the psychometric evolution of the NECO Senior School Certificate Examination (SSCE) Mathematics test from 2020 to 2024 for candidates in Osun State, using Classical Test Theory. The results show that items in this high-stakes public examination were of broadly consistent quality, although their performance varied across years. The study concludes that the NECO SSCE Mathematics test has strong psychometric properties, particularly reliability. Continuous, systematic monitoring of item difficulty and discrimination is nevertheless required to ensure fair testing and consistent evaluation of academic performance across years.

Recommendations

Based on the findings of this study, the following recommendations are made:

  • i. Routine Post-Examination Item Analysis: NECO should institutionalize comprehensive post-examination item analysis after each examination cycle to identify poorly functioning items, particularly those with low or negative discrimination indices, for revision or elimination.

  • ii. Strengthening Item Writer Training: Regular capacity-building workshops should be organized for item writers and moderators, with emphasis on writing items that achieve optimal difficulty and high discrimination in line with Classical Test Theory guidelines.

  • iii. Monitoring Longitudinal Item Trends: NECO should adopt a structured framework for monitoring longitudinal trends in psychometric indices to ensure consistency of examination standards across years and prevent unintended shifts in difficulty.

  • iv. Use of Statistical Evidence in Test Review: Decisions regarding item retention, modification, or replacement should be guided by empirical psychometric evidence rather than solely by expert judgment.

  • v. Expansion to Advanced Psychometric Models: Future evaluations of NECO examinations should complement Classical Test Theory with Item Response Theory analyses to provide deeper insights into item functioning and candidate ability estimation.

  • vi. Policy Support for Examination Quality Assurance: Educational policymakers should support the integration of psychometric research findings into national examination quality assurance policies to enhance public confidence in examination results.

Ethical approval

Ethical approval was not required for this study as it involved secondary analysis of anonymized examination data with no direct involvement of human participants.

How to cite this article
Alaba Adeyemi A, Beatrice Oluwakemi B and Odunayo Ibukun O. Trends in the Psychometric Characteristics of NECO Mathematics Senior School Certificate Examination Over a Period of Five Years (2020-2024) among Osun State Candidates, Nigeria [version 1; peer review: 1 approved with reservations]. F1000Research 2026, 15:818 (https://doi.org/10.12688/f1000research.182391.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 03 Jun 2026
Omale Onuh, Joseph Sarwuan Tarka University, Makurdi, Nigeria 
Approved with Reservations
Despite these strengths, the manuscript requires substantial revision before it can be considered suitable for indexing. Several methodological, statistical, conceptual, and presentation issues need to be addressed. First, there is a mismatch between the stated objectives and the analyses conducted. ...
HOW TO CITE THIS REPORT
Onuh O. Reviewer Report For: Trends in the Psychometric Characteristics of NECO Mathematics Senior School Certificate Examination Over a Period of Five Years (2020-2024) among Osun State Candidates, Nigeria [version 1; peer review: 1 approved with reservations]. F1000Research 2026, 15:818 (https://doi.org/10.5256/f1000research.201328.r489957)