Keywords
Hofstee, Angoff, Assessment, Standard setting, Android, MARS
This article is included in the Software and Hardware Engineering gateway.
- The focus of the paper, and that it does not discuss the weaknesses of the Hofstee method, has been emphasised.
- Use of Angoff: as this is a debated issue, not central to the paper, the sentence has been removed.
- The caption of Figure 1 and the paragraph preceding Figure 3 have been corrected and amended to clarify that the final parameters used are the means or medians of all the judges’ parameters, and not an individual judge’s parameters.
- The sentence “An ‘Options’ screen allows…user’s needs” has been removed, as it is mostly a repetition of the preceding sentence.
- The comparison of accuracy and speed has been altered to reflect a comparison to manual methods only. In addition, the two-decimal-place precision of the app was verified with the AmBrSoft website (http://www.ambrsoft.com/MathCalc/Line/TwoLinesIntersection/TwoLinesIntersection.htm).
In medical education assessment, determining the student pass/fail mark is a contentious issue.1 A range of methods can be used to determine this point and are covered in several other papers.2–4 In summary, however, most methods fall into three categories: norm-referenced (determined by the performance of the student group), criterion-referenced (pre-determined as an absolute cut-off point) and compromise methods (a compromise between the previous two methods is found).4
The Hofstee method4–6 is a compromise method that follows four steps, and uses four variables or parameters (explained in more detail below) to determine the cut-off point. While there are weaknesses with the method, and they have been discussed elsewhere,6 this paper is focused on describing the method, and then describing an app that applies the method.
Step 1: Evaluation by judges
In Step 1, judges who are qualified to assess the test make an independent judgement about the values of the following four parameters:
• cmin: The minimum cut-off score (i.e. the score that the judge feels would be the lowest possible score that would be considered as a pass/fail score).
• cmax: The maximum cut-off score (i.e. the score that the judge feels would be the highest possible score that would be considered as a pass/fail score).
• fmin: The lowest percentage of students that the judge feels should fail this test.
• fmax: The highest percentage of students that the judge feels should fail this test.
The four parameters are often indicated with different abbreviations; in this paper, we use cmin, cmax, fmin and fmax, as used elsewhere.4
Step 2: Determining the arithmetic means
Based upon the independent judgements, the arithmetic mean of each parameter is calculated. (Some researchers, e.g. Norcini2, have suggested that medians may also be used).
Step 3: Plot on a graph
After the test has been administered to the students, a graph (Figure 1) is then drawn, plotting the cumulative percentage of students against the scores obtained, and the means of the four parameters.
Figure 1. Hofstee chart showing cumulative scores, where cmin (minimum cut-off score) = 35, cmax (maximum cut-off score) = 45, fmin (the lowest percentage of students that the judges feel should fail) = 6, and fmax (the highest percentage of students that the judges feel should fail) = 18.
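The cumulative line in Step 3 can be built from raw scores in a few lines of Python. This is a minimal sketch under a stated assumption: one point is placed at each distinct score, with the percentage of students scoring at or below it. The article does not spell out the exact convention the app uses, and the function name and sample scores are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def cumulative_curve(scores: List[float]) -> List[Tuple[float, float]]:
    """Return (score, cumulative % of students at or below that score) points.

    Assumption: one point per distinct score; the app may place its points
    differently.
    """
    total = len(scores)
    counts = Counter(scores)
    running = 0
    curve = []
    for score in sorted(counts):  # sorting happens here, so input may be unsorted
        running += counts[score]
        curve.append((score, 100.0 * running / total))
    return curve


# Hypothetical, unsorted class scores (not taken from the article's data sets).
sample_scores = [40, 55, 40, 70, 55, 55, 90, 70]
sample_curve = cumulative_curve(sample_scores)
print(sample_curve)
```

Joining consecutive points with straight lines gives the cumulative line against which line AB is compared.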
Step 4: Determining cut-off
The pass/fail cut-off point is then determined by drawing line AB, which runs from the point (cmin, fmax) to the point (cmax, fmin), and finding its intersection with the cumulative line. In the Figure 1 example, the cut-off is determined to be slightly less than 38%. A further 10 hand-drawn attempts by the lead author (KM) consistently placed the results between 37 and 38, with an overall estimation of 37.5.
Apart from the fact that any cut-off method can be debated, there are practical problems associated with this method, and these include:
1. The time taken to accurately draw the chart, and all the associated lines.
2. Reading the cut-off point from an imperfect drawing, rather than determining it mathematically.
3. One might wish to allow for some flexibility, and test other values for the parameters. On a hand-drawn chart, this is time-consuming and untidy, to the point of being impossible.
Hofstee produced a mathematical solution,7 but it requires sorting, frequency pre-calculation and data inspection, and the mathematics involved is not rudimentary (requiring several steps). Van Der Vleuten developed a useful solution for SPSS,8 but it relies on expensive licensed software.
An Excel template designed by one of the authors (KM) already exists, and plotting the chart on Excel is certainly an improvement over the hand-drawn chart. However, it still requires the data to be pre-sorted and also requires the generation of the cumulative data. In addition, although the chart is drawn more accurately than by hand, it still requires a manual reading of the intersection point.
A search in both the Apple and Android app stores (conducted in January 2020 and again in March 2020) confirmed that there was no such app in either of the stores. To meet this need for a simple and accurate method of determining the Hofstee cut-off, we designed and developed a simple Android app. The app automatically sorts the data, draws the chart, and calculates the cut-off point algebraically. The result is a process that is faster and more accurate than the other methods that require manual drawing and/or reading of the graph.
For usability and evaluation, the app was designed according to the relevant principles laid out in the Mobile App Rating Scale (MARS).9 The overall MARS scale is broad, and so does have a few weaknesses when applied to this type of app (e.g. it rates the entertainment value of the app), but it is still a useful guide. In addition, the app is available free of charge, and with no advertisements.
The app, HofsteeCalc, was developed using MIT App Inventor Version 2 (builds nb182 through to nb186a). MIT App Inventor uses its own visual, block-based programming interface to develop Android and iOS apps. In addition to the internal code, the app uses three external sets of libraries and routines for browsing to and selecting the data file,10 sorting the data,11 and charting the data.12 No user or device information is collected. The app is optimised for Android 2.1 and higher, API level 28, and requires permission to read from and store data to the device.
See Figure 2 for workflow chart.
The app automatically creates a data folder and has a test file that the user can use for testing before they insert their data.
The app allows each judge’s individual parameters to be entered (up to a maximum of 10 judges), and then calculates the means, standard deviations, and medians (Figure 3a). The parameters are automatically stored if required and are available the next time the app runs. When the user returns to the main screen (Figure 3b), the means or medians of all the judges’ parameters are automatically inserted into the text boxes. Alternatively, if the final means or medians of the judges’ parameters have been calculated elsewhere, these means or medians can be entered directly into the main screen text boxes (Figure 3b).
Figure 3. HofsteeCalc app: multiple judges’ parameters and output from Figure 1 data, where cmin is minimum cut-off score, cmax is maximum cut-off score, fmin is the lowest percentage of students that the judge feels should fail, and fmax is the highest percentage of students that the judge feels should fail.
For data input, the student data need to be in a single-column standard .csv file. If the .csv file contains more than one column of data, only the first column will be read. The app automatically sorts the data, so these do not have to be pre-sorted by the user.
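The input rule described above (first column only, no pre-sorting needed) can be illustrated with a short Python sketch. It is not the app's code: the function name is hypothetical, and skipping blank lines and non-numeric cells such as a header is an assumption, since the article does not say how the app treats them.

```python
import csv
import io
from typing import Iterable, List


def read_scores(lines: Iterable[str]) -> List[float]:
    """Read the first column of CSV text and return the scores sorted ascending."""
    scores = []
    for row in csv.reader(lines):
        if not row:
            continue  # blank line (assumed to be ignorable)
        try:
            scores.append(float(row[0]))  # any further columns are ignored
        except ValueError:
            continue  # assumed: a header or other non-numeric cell is skipped
    return sorted(scores)


# Hypothetical file contents: unsorted, with an extra column and a blank line.
sample_csv = io.StringIO("72,ignored\n45\n\n60,x\n")
print(read_scores(sample_csv))
```

With a real file, the same function can be given an open file handle, for example `open(path, newline="")`.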
When the charts are to be drawn, the user can view either the chart of the whole data set (see Figure 3b: Draw Full Chart) or a detailed section covering the data within and close to the range of the parameters (see Figure 3b: Draw Detailed Chart). Users can pinch to zoom in and out of the charts.
As the focus of the app is a functional tool, it has a simple user interface, and includes a ‘Help’ screen that explains in detail how it is to be used. Although the app assumes a knowledge of the Hofstee method, it supplies additional references for the user. Allowing for personal preferences, it permits the user to change some user-interface colours to suit individual needs.
In the Hofstee chart, we know the (x1, y1) and (x2, y2) coordinates of line AB (Figure 1). However, because the cumulative score line does not have a single algebraic formula, the intersection between line AB and the cumulative line cannot be calculated directly (using ‘best fit’ or ‘nearest neighbour’ might be possible but will not give 100% accuracy). It is for this reason that current users of the Hofstee method read the point manually from hand-drawn charts.
The cumulative line, however, is made up of straight line segments, each defined by its own (x1, y1) and (x2, y2) coordinates, and these coordinates are stored in an array (or list). So, the algebraic algorithm for determining the cut-off can be expressed in the following pseudo-code:
For each straight line in the array of lines forming the cumulative line
    Read the (x1, y1) and (x2, y2) coordinates of that line
    Algebraically determine the intersection point (xi) of this straight line and line AB
    IF x1 ≤ xi ≤ x2 [there is no need to test the y coordinate]
    THEN xi is the cut-off point
(If the cut-off (xi) is a data point, then two lines would meet this condition, but that is no matter, as the point is identical.)
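The pseudo-code above can be sketched in Python as follows. This is an illustration only, not the app's App Inventor blocks: the function names and the sample curve are hypothetical, and the two-point determinant formula for the intersection of two lines is a standard choice that the article does not name. As in the pseudo-code, only the x-range of each segment is tested.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def intersection_x(p1: Point, p2: Point, a: Point, b: Point) -> Optional[float]:
    """x-coordinate where the infinite lines p1-p2 and a-b cross (None if parallel)."""
    (x1, y1), (x2, y2) = p1, p2
    (x3, y3), (x4, y4) = a, b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None
    det_seg = x1 * y2 - y1 * x2
    det_ab = x3 * y4 - y3 * x4
    return (det_seg * (x3 - x4) - (x1 - x2) * det_ab) / denom


def hofstee_cutoff(curve: List[Point], a: Point, b: Point) -> Optional[float]:
    """Scan consecutive segments of the cumulative line for the crossing with AB."""
    for p1, p2 in zip(curve, curve[1:]):
        xi = intersection_x(p1, p2, a, b)
        # Accept xi only if it lies within this segment's x-range.
        if xi is not None and p1[0] <= xi <= p2[0]:
            return xi
    return None  # AB does not cross the cumulative line


# Hypothetical cumulative line (score, cumulative %) and line AB.
curve = [(30, 2), (40, 10), (50, 30), (60, 60)]
line_a, line_b = (40, 20), (50, 5)  # (cmin, fmax) and (cmax, fmin)
cutoff = hofstee_cutoff(curve, line_a, line_b)
print(f"{cutoff:.2f}")  # prints 42.86
```

Returning the first matching segment is enough because, as noted above, two segments that share the cut-off as an end point give the identical value.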
Readers may recognise that, because the cut-off point is determined algebraically, there is no need to draw the chart for the calculation. The chart, however, has been included in the app because most users are used to it, and also because they may wish to make manual adjustments to the parameters based on the visual reading of the data.
After various early test versions, Version 1.0 of the app was completed in February 2021, and uploaded into the Google Play Store at: https://play.google.com/store/apps/details?id=appinventor.ai_itmeded.HofsteeCalc. Since then, small updates have been performed, and the app is currently on Ver. 1.1.
Conforming to the requirements laid out in the Introduction above, the app is available free of charge, with no advertisements. It does not require access to the internet, and it does not collect, store, or transmit any personal information about the user or the device.
The app was alpha tested on various real and hypothetical, sorted and unsorted datasets (see Underlying data13), with up to 1,000 items, and consistently returned accurate results. For example, for the dataset used in Figure 1, the app calculated the cut-off at 37.62%, rather than “slightly less than 38%” (see Figure 3b). In addition, the coordinates of the two lines were manually determined from the raw data as (35,18; 45,6) and (40,17; 30,8), and the intersection of these two lines was calculated arithmetically through the AmBrSoft site. The result was 37.62, identical to the result from the app. This was confirmed with an enlarged manual graphing, which also placed the result at slightly more than 37.6 (in real life, although this method would achieve similar accuracy to the app, it would extend the time by a further 10 minutes or so).
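The arithmetic behind this check can be written out as a short worked example. It assumes that line AB runs from (cmin, fmax) = (35, 18) to (cmax, fmin) = (45, 6), following the Figure 1 parameters, and that the relevant segment of the cumulative line runs from (30, 8) to (40, 17).

```latex
\begin{align*}
\text{Line AB through } (35, 18) \text{ and } (45, 6):\quad
  & y = 18 - 1.2\,(x - 35) = 60 - 1.2x \\
\text{Segment through } (30, 8) \text{ and } (40, 17):\quad
  & y = 8 + 0.9\,(x - 30) = 0.9x - 19 \\
\text{Setting the two equal}:\quad
  & 60 - 1.2x = 0.9x - 19 \\
  & 2.1x = 79 \\
  & x = \tfrac{79}{2.1} \approx 37.62
\end{align*}
```

The value lies between 30 and 40, so it falls on the segment itself and agrees with the 37.62 reported by the app.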
The time to draw the chart and determine the cut-off from a dataset of 1,000 unsorted, randomly generated numbers (MS-Excel 2019 RANDBETWEEN(1,100)) was approximately 2 seconds (Samsung S8, Model SM-G955FD, Android Ver. 9, Build PPR1.180610.011.G955FXXS6DTA1).
Both authors independently measured the app against the Mobile App Rating Scale (MARS),9 and arrived at scores of 4.07 and 3.88, respectively. As detailed above, these less-than-ideal scores were expected, as the MARS includes items not entirely appropriate to such an app.
For use cases, anonymised data sets are available in Underlying data.13
An example of a use case utilised the data in the sheet HofsteeCalcRealDataClass01.csv.
The data set has 181 items, and the item values range from 43 to 97. The data set is unsorted.
The input parameters were determined as shown in Table 1.
Table 1. Use case input parameters for HofsteeCalcRealDataClass01.csv, where cmin is minimum cut-off score, cmax is maximum cut-off score, fmin is the lowest percentage of students that the judge feels should fail, and fmax is the highest percentage of students that the judge feels should fail.
Rater | cmin | cmax | fmin | fmax |
---|---|---|---|---|
1 | 50 | 60 | 1 | 4 |
2 | 44 | 56 | 3 | 8 |
3 | 41 | 52 | 4 | 8 |
4 | 45 | 54 | 5 | 8 |
5 | 45 | 53 | 2 | 7 |
Based on this use case, Figure 4a shows the input parameters. Figure 4b shows the resultant ‘Detailed chart’, the Hofstee cut-off percentage (53.80), and the largest data gaps in the vicinity of the Hofstee cut-off percentage. In addition, the coordinates of the two lines were manually determined from the raw data as (45,7; 55,3) and (53.5,3.31; 54.5,3.87), and the intersection of these two lines was calculated arithmetically through the AmBrSoft site. The result was 53.80, identical to the result from the app.
Figure 4. HofsteeCalc app: multiple judges’ parameters and output from HofsteeCalcRealDataClass01.csv, where cmin is minimum cut-off score, cmax is maximum cut-off score, fmin is the lowest percentage of students that the judge feels should fail, and fmax is the highest percentage of students that the judge feels should fail.
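The use case can be reproduced from the Table 1 values with a few lines of Python. This is a sketch for checking the arithmetic, not the app's code. The segment coordinates (53.5, 3.31) and (54.5, 3.87) are the rounded values quoted above, so the result is exact only to about two decimal places, and all variable names are hypothetical.

```python
from statistics import mean, median

# Judges' parameters from Table 1: (cmin, cmax, fmin, fmax) per rater.
raters = [
    (50, 60, 1, 4),
    (44, 56, 3, 8),
    (41, 52, 4, 8),
    (45, 54, 5, 8),
    (45, 53, 2, 7),
]

# Step 2: arithmetic mean of each parameter (medians shown for comparison).
cmin, cmax, fmin, fmax = (mean(column) for column in zip(*raters))
medians = tuple(median(column) for column in zip(*raters))

# Line AB runs from (cmin, fmax) to (cmax, fmin).
slope_ab = (fmin - fmax) / (cmax - cmin)
intercept_ab = fmax - slope_ab * cmin

# Segment of the cumulative line that crosses AB (rounded values from the text).
(sx1, sy1), (sx2, sy2) = (53.5, 3.31), (54.5, 3.87)
slope_seg = (sy2 - sy1) / (sx2 - sx1)
intercept_seg = sy1 - slope_seg * sx1

# Solve slope_ab * x + intercept_ab = slope_seg * x + intercept_seg for x.
cutoff = (intercept_seg - intercept_ab) / (slope_ab - slope_seg)

print(cmin, cmax, fmin, fmax)  # prints 45 55 3 7
print(medians)                 # prints (45, 54, 3, 8)
print(f"{cutoff:.2f}")         # prints 53.80
```

The means give line AB as (45,7; 55,3), matching the coordinates quoted in the text. The medians differ for cmax and fmax, which shows why the choice between means and medians in Step 2 can move the cut-off.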
This paper has described the successful design and development of a free, advertisement-free Android app to calculate the Hofstee cut-off. The app meets basic design principles as established in the MARS, and alpha and beta testing have shown the app to be accurate and fast. The app is available in the Google Play app store (see Software availability14).
Full usability and ease of use will be tested in the future through more rigorous, wide-spread testing among medical educators.
When educating future health professionals, determining fair pass/fail cut-off points is crucial. The time taken to perform such procedures, however, adds to medical educators’ already over-burdened schedules, and competes with a range of other demands in this schedule, so it is inevitable that short-cuts and errors will occur. This research has traced the design and development of a tool that can both save time and improve accuracy when determining the Hofstee cut-off.
Zenodo: HofsteeCalcDataSets. https://doi.org/10.5281/zenodo.4699233.13
This project contains the following underlying data:
• RawDataForTesting.csv (data set that is built into the app’s assets).
• TestingForAppData.csv (data set used to generate Figure 1 and Figure 3b).
• HofsteeCalcRealDataClass01.csv (data set available for testing).
• HofsteeCalcRealDataClass02.csv (data set available for testing).
• HofsteeCalcRealDataClass03.csv (data set available for testing).
• HofsteeCalcRealDataClass04.csv (data set available for testing).
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Software available from Google Play app store: https://play.google.com/store/apps/details?id=appinventor.ai_itmeded.HofsteeCalc
Archived source code at time of publication: https://doi.org/10.5281/zenodo.4633140.14
Licence: Creative Commons Attribution 4.0 International license (CC-BY 4.0).
The authors would like to acknowledge Prof. Cees van der Vleuten, Maastricht University, for sending us a copy of Hofstee 1997.7