Keywords
sample size, rapid review, study design, educational research
This article is included in the Datta Meghe Institute of Higher Education and Research collection.
A rapid review of the published literature was conducted with the goal of evaluating sample size determination in educational research. The sample size, represented by the letter “n,” is a key factor in this research because it specifies the number of participants who represent the target population. Although various studies defining the processes for calculating sample sizes have been published in the literature, there is still much uncertainty. It is vital to understand that there is no single all-encompassing method for determining sample sizes across study designs. Instead, different study designs call for different approaches to determining sample numbers.
Information was retrieved from the databases in accordance with updated PRISMA recommendations. Keywords were used to retrieve the relevant articles from two databases (Google Scholar and PubMed). The articles were selected through thorough scrutiny and application of inclusion and exclusion criteria.
Seven articles were selected from the 9282 retrieved articles. The enrolled studies were compared in relation to their methods, objectives, and outcomes.
The evaluation of the seven studies as a whole concluded that testing any novel approach essentially requires 24.24 participants (rounded to 25) in each group. The median sample size for simulation-based educational research was 30. Further research is required to establish a proper sample size determination based on a single universal formula for all types of designs.
I have made corrections suggested by the reviewer related to the title, objectives, and explanation related to the search strategy, clearly stating the research question and insights.
The term “sample size” describes the number of subjects or observations that make up a study; ‘n’ is typically used to represent this number. The size of a sample affects two statistical properties: 1) the accuracy of estimates and 2) the study's ability to draw inferences.1
Clinical research studies can be categorized as surveys, experiments, observational studies, and other types. Many different factors are involved in excellent research planning. The first step is to define the practical issue. Choosing the relevant participants and controls, as well as the experimental or observational units, is the second stage.
The inclusion and exclusion criteria must be carefully defined and should account for any potential variables that could affect the measurements and units being observed. The study design must be precise, and the procedures must follow the best technique currently available. Based on these considerations, the study's sample size needs to be appropriate for its goals and potential variability. The sample must be “large enough” for an effect of the expected scientific importance to reach statistical significance. At the same time, it is crucial that the study sample not be “too big,” in which case even a statistically significant effect of minor scientific import could be found.2 Sample size also has economic significance: resources may be wasted in an insufficient study because it may not yield valuable results, whereas an excessively large study consumes more resources than required. The sample size of a study involving human or animal subjects is also a crucial ethical concern, because a poorly planned experiment exposes participants to potentially hazardous procedures without contributing new information.3,4 Therefore, calculating the power and sample size is crucial in the design of clinical research. Countless studies printed in national and international journals have been found to disclose sample size estimations incorrectly or to use smaller samples than necessary, which reduced their power.1,2
There is still much confusion despite the fact that numerous studies clarifying the methods of sample size computation have been published in the existing literature. It is crucial to realize that there is no single universal formula for calculating the sample sizes for all study designs. Instead, different study designs require different methods to calculate sample sizes.3,4 The present study was conducted with the objective of providing insights, addressing the challenges, and offering recommendations regarding sample size determination in educational research.
To conduct this rapid review, the Preferred Reporting Items for Systematic reviews and Meta-Analyses search extension (PRISMA-S) criteria were used to guide the search.5 The search strategy aimed to identify relevant articles addressing sample size determination in educational research. The researchers searched databases including PubMed, Embase, and the Cochrane Library; additionally, hand-searching of reference lists and citation tracking was conducted to identify further relevant articles. Search terms related to sample size (e.g., sample size calculation, sample size determination) and educational research (educational studies, pedagogic studies) were used to retrieve data from the databases. To perform the search effectively and refine the results, the search strings were constructed using Boolean operators (AND, OR). The inclusion criteria comprised free full-text, unlocked articles, pertinent terminology and information, and English-language publication. The exclusion criteria covered abstracts, locked articles and journals, data lacking relevance, and languages other than English. Two authors screened the titles and data of the extracted articles to determine eligibility. Data from the extracted articles were documented in a standardized format in a Microsoft Excel sheet. The included articles were assessed using appropriate criteria, such as methodological rigor, relevance to the research question, and transparency of reporting. Finally, all the data were synthesized for common themes, relevance, challenges, and recommendations related to sample size assessment. The principal investigator carried out the entire review planning process, which was authorized by the other authors. The presentation of the entire search is shown in Figure 1.
Seven studies were selected from the 9282 articles retrieved from Google Scholar and PubMed, with the application of stringent inclusion and exclusion criteria. All the information related to the articles is shown in Table 1.
Number of article | Author name | Article type | Objectives | Method | Conclusion |
---|---|---|---|---|---|
1 | McConnell et al.6 | Editorial | The purpose of this editorial was to discuss sample size calculation in the context of a medical research intervention. | To teach nursing and anaesthetic colleagues about programmed intermittent epidural bolus analgesia, the author created a scenario in which they planned to accomplish their goal of estimating the required sample size. To this end, they developed a questionnaire and weekly tests to evaluate their coworkers' understanding of the novel method and the efficacy of the intervention. | The formula produced n = 24.24, rounded up to 25 in each group, for a total sample size of 50 students. It is extremely important to use effect size when estimating the sample size. |
2 | Staffa et al.7 | Review | The purpose of the study, which was conducted by paediatric surgeons, was to disseminate a method for selecting a sample size to identify an effect that would have therapeutic significance through the interpretation and validation of the findings. | Using various instances, the authors applied a five-step technique to validate the sample size and statistical power analyses: defining the primary outcome of interest and the expected effect size and power; identifying the relevant statistic and statistical test to be taken into account; conducting the necessary calculations to obtain the required sample size using software or a reference table; and making a formal power and sample size declaration for the publication, grant application, or project proposal. | The suitable statistical test to employ for sample size calculation depends on the type of data, the clinical hypothesis, and its applications. |
3 | Dreyhaupt et al.8 | Review | The study was performed to describe the implementation and general principles of cluster randomization, and to outline the general aspects of using cluster randomization in prospective two-arm comparative educational research. | The study compared individual randomization with the cluster randomization technique in educational research to evaluate the reduction of systematic bias. It also demonstrated the general principles, implementation, and aspects of cluster randomization in a prospective two-arm study. | Studies that involve cluster randomization require a considerably larger sample size and more complex methods for calculation. |
4 | Cook et al.9 | Systematic review | The study was conducted to determine study power across a range of effect sizes by re-analysing a meta-analysis of simulation-based education. | The authors re-analysed 897 studies and the results of simulation-based education to determine study power across a range of effect sizes. | The median sample size for the 627 no-intervention comparisons was found to be 25, whereas the median sample size for comparisons between different simulation groups was found to be 30. |
5 | Agnihotram 201810 | Review | This article focuses on the determination of the minimal sample size for a variety of objectives, providing a quick overview of the statistical methods employed in various research study phases. | The author discussed the various steps for estimating the sample size. | The study found that the sample size formula was based on the primary research purpose, conclusions, variables, statistical analysis planned, number of groups, and sampling technique. |
6 | Ferreira et al.11 | Review | By using objective methodologies as the standard, the study intended to validate a priori hypotheses and sample sizes for evaluating the intensity and duration of physical activity in a paediatric population. | Data from electronic databases were searched; physical activity intensity was measured by questionnaire and duration was measured by accelerometer. | The study indicated weak to moderate agreement between subjective and objective approaches for determining the intensity and duration of physical activity. Additionally, assessments of the stability of method-to-method agreement were provided by sample sizes of 50 to 99 subjects. |
7 | Guo et al.12 | Review | The goal of the study was to determine the sample size for two independent groups with equal and unequal unknown variances when power and differential cost were both taken into account. | In this study, the Welch approximate test was applied to derive various sample size allocation ratios by minimizing the total cost or, equivalently, maximizing statistical power; two types of hypotheses, superiority and equivalence of two means, were used for sample size planning. | The sample size formula proposed in this study should be used whenever a cost factor is involved and the population variances are unknown and unequal. |
Research in health science education is expanding. Emerging educational research relies on relevant conceptual frameworks, reliable research techniques, and important discoveries.13,14 Prior reviews have shown that many educational research articles employ small sample sizes, and that researchers rarely take into account the expected effect size, plan the sample size in advance, or describe the actual precision when evaluating the results.9,15,16
According to the definition of statistical power, it is “the likelihood that the null hypothesis will be rejected in the sample if the observed effect in the population is equal to the effect size”.17 In other words, power is the probability that a study will uncover a real, statistically significant effect. Studies with higher power are preferable because lower-power studies may miss potentially important connections. A power of 90% is ideal, and 80% is typically considered the minimum. Power is affected by the sample size (the number of observations), the effect size (the magnitude of the effect), and the risk of a type I error (the likelihood of recognising a “significant” difference when there is none, represented by alpha).9,18
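The dependence of power on sample size, effect size, and alpha can be made concrete with the standard normal approximation for a two-sided, two-group comparison of means. The sketch below is illustrative only; the function name and example numbers are assumptions, not taken from the reviewed studies:

```python
from statistics import NormalDist

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Uses the normal approximation power = Phi(d * sqrt(n/2) - z_{1-alpha/2}),
    where d is the standardized effect size (Cohen's d).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * (n_per_group / 2) ** 0.5 - z_crit)

# With 25 participants per group and a large effect (d = 0.8),
# power comes out at roughly 0.81, just above the conventional 80% floor.
print(round(two_sample_power(0.8, 25), 2))
```

Shrinking d, n, or alpha while holding the others fixed lowers the result, which is exactly the trade-off described above.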
The study used a convenience sample of primary research in simulation-based education to examine sample size determination in education. First, most research in the sample only had the power to find effects of moderate to large magnitude (SMD ≥ 0.8), while other studies only had the power to find effects of immensely large magnitude (≥ 2 standard deviations). Most of the negative studies, that is, those that did not find a statistically relevant difference, had very broad confidence intervals (CI), signifying the possibility of large and likely important differences. This second discovery is connected to the first point: in these trials, the lack of a statistically relevant outcome did not establish the superiority or equivalence of the interventions under study.9
In one study, the author aimed to present sample size calculations in the context of medical educational interventions and focused on computing sample sizes to compare distinct groups where the result was a continuous (interval or ratio) dependent variable, such as in interventional designs. The criteria for forecasting the sample group, such as the relevance factor, preferred statistical significance, predicted difference in score, and approximate evaluation variation (which may be estimated from previous studies), were discussed in order to determine the number of participants required to assess the effects of an intervention on a specific outcome or the association between variables.6,19 Interventions in education frequently concentrate on changing latent conceptions, which are theoretical and cannot be readily seen or quantified. This causes the validated scales to vary, changing how the outcome measures are calculated. The educational researcher advocated the use of effect size in determining the sample size. The study design often affects the relationship between larger effect sizes and smaller sample sizes. Effect sizes are accordingly categorized as “small,” “medium,” and “large” for values of 0.20, 0.50, and 0.80, respectively.20 Finally, the meta-analysis revealed that the required sample size for each group was 24.24.21,22
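The per-group figure quoted above can be approximated with the standard normal-approximation formula n = 2(z₁₋α/₂ + z_power)²/d². This is a sketch, not the authors' exact computation: with Cohen's large effect (d = 0.8), 5% significance, and 80% power it yields about 24.5 per group (rounded up to 25), close to but not exactly the 24.24 reported, whose precise value depends on the means and standard deviations the authors assumed.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-arm sample size for comparing two means (normal approximation):
    n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2.
    """
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2

raw = n_per_group(0.8)   # about 24.5 before rounding
print(ceil(raw))         # 25 per group, i.e. 50 participants in total
```

Because n scales with 1/d², halving the expected effect size roughly quadruples the required sample.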
Further, the authors also discussed errors to avoid. The first is treating sample size estimation as simply small, medium, or large, which fails to account for the accuracy of the evaluation tool and the sample characteristics.23 Second, unless necessary, researchers should avoid creating new institute-specific assessment instruments, because these must be validated for accuracy and reliability before use in interventional studies.24 Third, the prospective dropout and attrition rates must be considered.25 Finally, researchers should avoid equating the effect size with its true significance and should employ a confidence interval that conveys the precision of the sample and effect sizes.6
The objective of the report was to establish the optimal number of subjects for a study during the planning stage, with sufficient patients to resolve the most clinically important questions, using statistical power calculations. It evaluated the sample size that must be randomised to each arm in order to achieve the standard 80% or 90% power to find a clinically meaningful effect in randomised controlled trials, which frequently use parallel group designs. The need for a control arm, statistical comparability, structural equality, and resemblance of management conditions and observations are among the themes that the author elaborated on as being essential for educational research investigations. If an educational research study exhibits these traits and the test arm's success is significantly greater than that of the control arm, the difference cannot be the result of coincidence. Cluster randomization is usually performed for non-therapeutic interventions such as prevention programs, healthcare programs, and training programs. A cluster may contain from two to thousands of individuals, and educational research may also involve different cluster sizes.8
Minimizing or reducing contamination bias is the fundamental reason for performing cluster-randomized studies. Observations inside clusters are typically more comparable to one another than observations from distinct clusters, creating a unique data structure known as statistical dependency. The effective sample size of a cluster-randomized study is less than the actual sample size (i.e., the number of enrolled students), which has an impact on the sample size computation. Consequently, it is inappropriate to use typical methods that presume the statistical independence of all observations to determine the sample size for cluster-randomized investigations.8
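The gap between enrolled and effective sample size is usually quantified with the design effect, DEFF = 1 + (m − 1)·ICC, where m is the cluster size and ICC the intracluster correlation. The numbers below are hypothetical, chosen only to illustrate the inflation:

```python
def design_effect(cluster_size, icc):
    """Variance inflation due to cluster randomization: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_n(enrolled_n, cluster_size, icc):
    """Effective sample size: enrolled subjects divided by the design effect."""
    return enrolled_n / design_effect(cluster_size, icc)

# Hypothetical trial: 10 classes of 30 students each, ICC = 0.05.
# DEFF = 1 + 29 * 0.05 = 2.45, so the 300 enrolled students carry the
# information of only ~122 independent observations; conversely, a sample
# size planned for individual randomization must be inflated by ~2.45.
print(round(effective_n(300, 30, 0.05)))
```

Even a small ICC produces a large inflation when clusters are big, which is why cluster-randomized educational studies need markedly larger samples.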
The purpose of the study, which was conducted by paediatric surgeons, was to disseminate a method for selecting a sample size to identify an effect that would have therapeutic significance through the interpretation and validation of the findings. Using a five-step approach, the minimum sample size necessary to ensure sufficient power and accurate interpretation of the study's findings can be calculated.7 The achievable sample size for assessing a significant effect on the basis of research or primary data must be justified using a power calculation. The research sample size should have adequate statistical power to identify clinically meaningful effects in scientific investigations.7,26 The sample size of the prior control group determined the statistical power. To compare the two groups effectively, comparisons must be made with a historical control group that is comparable to the research group, for which data on assessed confounders are available. The suggested five-step approach can be used with any type of data or study design, although power and sample size primers do not provide examples for every possible research circumstance. The fundamental objective of the primers was to compare two treatment groups. However, due to multiplicity and multiple testing, there is a higher risk of false-positive results (Type I error) when comparing more than two groups.27,28
Guo et al. used two different types of hypotheses, taking into account sample size planning factors such as superiority/non-inferiority and equivalence of two means. When population variances are unknown, no exact sample size can be found through the traditional sample size formula, and the resulting sample size must be large enough to meet the required level of significance and the probability of a correct decision (power). The cost constraint depends on the two experimental goals for a given level of α and power 1−β, i.e., the allocation with minimal total cost, and the allocation ratios are a function of the unit cost ratio and the standard deviations.12
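A classical result consistent with this cost argument (though not necessarily the exact formula Guo et al. derive for the Welch test) is that minimizing total cost at fixed power leads to the allocation ratio n₁/n₂ = (σ₁/σ₂)·√(c₂/c₁). The function and numbers below are an illustrative sketch under that assumption:

```python
def allocation_ratio(sd1, sd2, cost1, cost2):
    """Cost-optimal allocation n1/n2 = (sd1/sd2) * sqrt(cost2/cost1):
    sample the more variable and cheaper-to-recruit group more heavily."""
    return (sd1 / sd2) * (cost2 / cost1) ** 0.5

# Hypothetical: group 1 is twice as variable and costs half as much per
# subject, so it should receive about 2 * sqrt(2) subjects per group-2 subject.
print(round(allocation_ratio(2.0, 1.0, 1.0, 2.0), 2))
```

When variances and unit costs are equal the ratio collapses to 1, recovering the familiar equal-allocation design.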
Historically, three methods have been employed to determine the sample size. The first is an interval strategy, where the confidence level is high (e.g., 95%) and the sampling error between the true parameter and its estimate is kept to a preset modest amount, say 3 percent. Since there is no hypothesis testing involved in this method, no threshold of significance is required. The second is a hypothesis-related approach in which both the null and alternative hypotheses must be precisely specified beforehand to detect a significant difference between the parameters under study while simultaneously meeting the required level of significance (Type I error rate) and the desired power (probability of correctly accepting the specified alternative). The third strategy uses an “indifference zone,” where populations that perform better than the others are placed in a zone where they are more likely to be chosen correctly.29
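The interval strategy described first can be sketched for a proportion: with confidence level 1 − α and margin of error e, n = z²·p(1 − p)/e², using the conservative p = 0.5 when no prior estimate exists. This example reuses the 95%/3% figures quoted above; it is a generic illustration, not a calculation from the cited reference:

```python
from math import ceil
from statistics import NormalDist

def n_for_proportion(margin, confidence=0.95, p=0.5):
    """Precision-based sample size for a proportion: z^2 * p * (1-p) / margin^2.
    p = 0.5 is the conservative default, since it maximizes p * (1 - p)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

# 95% confidence, 3% sampling error, no prior estimate of the proportion:
print(n_for_proportion(0.03))   # 1068 subjects
```

Note how the requirement grows with the square of the desired precision, with no significance threshold involved anywhere in the calculation.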
“A priori” literally translates from Latin as “what comes before”; a priori hypotheses are a fundamental part of the scientific method since they are created based on assumptions.30 From these assumptions, three hypotheses were inferred. With reference to objective methodologies, the aim of this systematic review was to offer proof for a priori hypotheses and sample sizes for evaluating the intensity and duration of physical activity in a pediatric population. The results of the systematic review suggest that the degree of agreement between subjective and objective measures for determining the intensity and duration of physical exercise should be assumed to be modest to moderate.11
Currently, there are no data to support an a priori assumption regarding how well the different methods of assessment agree. To select a sample size, attain precision, or have sufficient power to reject a false null hypothesis, a robust a priori hypothesis is necessary. Cost and feasibility, which are frequently the true drivers of the sample size, cannot be disregarded by researchers. Nonetheless, typical power calculations yield specific sample sizes only by making precise assumptions. This study’s results indicate that, for assessing nearly all physical activity intensity and duration parameters, a sample size of 50–99 subjects offers stable agreement between subjective and objective approaches. The degree of uniformity displayed in each (often non-representative) sample studied, the accuracy of the subjective method created for a target sample, and the inadequacy of the correlation coefficient for detecting agreement issues are all potential explanations for stable agreement in this sample size interval. Additionally, studies with small samples showed higher levels of variability in the range of findings, perhaps as a result of their inferior design compared to studies with larger samples.31
The “vibration of effects” diminishes the reliability of the agreement measures in samples with fewer than 50 respondents. For studies with samples of 100 or more persons, the study suggests that the decreased reliability of the agreement measures is driven primarily by researchers overlooking the exaggerated effects that arise when findings from small-sample trials are carried forward.32 Methodologically superior systematic evaluations addressing the agreement between subjective and objective measures for assessing physical activity have frequently found low methodological quality in the included studies.32–35
The COSMIN checklist, which was employed in the cited study, identified the absence of an a priori hypothesis and a small sample size (n = 50) as the primary factors affecting the methodological standard of the retrieved studies. These factors were followed by a lack of data regarding missing subjects and the way in which missing data were handled. The authors argued that questionnaires, diaries, and/or logs that received low ratings in methodological quality evaluations are ineffective tools for gathering subjective data.32
The sample size depended on the degree of heterogeneity when the analysis was performed by multiple investigators and teams. Moreover, studies with limited data showed higher levels of variability in the range of findings, perhaps as a result of their inferior design compared to studies with larger samples.33,34
A statistician is essential to determine the number of subjects and analyse the final results of the entire investigation. To perform a suitable, well-defined study that produces rational and trustworthy implications that can be applied to the sample population, it is crucial for the investigator to understand the fundamentals of analytical methods. Clinicians can use statistics to extract crucial information from empirical data, which improves patient care. Statistical notions must be considered from the initial planning stage to the final reporting phase. In general, there are two sorts of sample size estimation problems: the sample size for (a) an estimation study and (b) a hypothesis-testing, or comparison, study.10
When performing an estimation study, the researcher is interested in estimating the quantity of one or more parameters, such as the mean haemoglobin level or arthritis prevalence. In studies that test hypotheses, researchers are interested in comparing population characteristics at one or more time points, or characteristics of two or more populations; for instance, they might compare the prevalence of arthritis between two populations before and after the administration of an intervention. A researcher should select a larger number of people if they want the estimation in their study to be more precise, because as the required accuracy grows (i.e., the margin of error shrinks), the minimum sample size necessary increases. Likewise, estimating a parameter at a confidence level above 95% requires a sample size greater than that needed at the 95% level. In studies testing hypotheses, the computation of the sample size aims to obtain the desired power for detecting a difference that is therapeutically or experimentally significant at a predetermined significance level.35
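The precision/sample-size trade-off for an estimation study can be sketched for a mean: n = (z·σ/E)², where E is the acceptable margin of error. The haemoglobin numbers below are hypothetical, used only to show how halving the margin roughly quadruples the requirement:

```python
from math import ceil
from statistics import NormalDist

def n_to_estimate_mean(sd, margin, confidence=0.95):
    """Sample size to estimate a population mean to within +/- margin:
    n = (z * sd / margin)**2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd / margin) ** 2)

# Hypothetical: SD of haemoglobin = 1.5 g/dL, desired margin = 0.3 g/dL.
print(n_to_estimate_mean(1.5, 0.3))    # 97 subjects
# Halving the margin to 0.15 g/dL roughly quadruples the requirement:
print(n_to_estimate_mean(1.5, 0.15))   # 385 subjects
```

Raising the confidence level (e.g. to 99%) enlarges z and hence n, mirroring the point made in the text.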
Statistically, there are various methods, tests, and formulas for estimating the sample size required to perform research and other relevant studies. However, research establishing a single appropriate number for performing any study is still lacking, beyond conventions such as a pilot study enrolling 12 participants in each group for a particular trial.36
This review highlights the diverse approaches and procedures used in many studies to determine the appropriate sample sizes for educational research. It also emphasizes the difficulties in defining the main outcome, choosing the right statistical tests, and taking effect size and statistical power into consideration. This article fosters a critical comprehension of the results and their relevance in various research contexts. Along with the recommendations, key considerations include defining research objectives, selecting appropriate study designs, and ensuring adequate statistical power. Through adherence to PRISMA-S guidelines for rapid reviews, the article emphasizes the importance of transparency and rigor in the review process. This commitment to methodological rigor enhances the credibility and trustworthiness of the insights presented.
The review suggests that the sample size should be considered as early as possible in the research phase to gather more insightful background that will fundamentally have a stronger influence on pedagogic application. All types of research investigations require the determination of sample size, and selecting the appropriate formula is essential. A suitable sample size formula should be chosen according to the study's main goal, outcome variable, study plan, intended statistical analysis, study groups, and sampling procedure. The sample population needed for a study is determined by a variety of variables, including the feasibility of the study, its power, the accuracy of the calculated value, its analytical relevance and confidence level, its ability to detect a clinically significant difference, and other factors, such as financial support, workforce, subject availability, and time. Studies involving cluster randomization require a larger sample size and a complex method for calculations. The sample size for testing any new method basically required 24.24 (rounded to 25) members in each group. The median sample size for simulation-based educational research was 30. Further research is needed on the appropriate sample size and on a universal single formula covering every study design.
All data underlying the results are available as part of the article and no additional source data are required.