Research Note
Revised

Multiple statistical tests: Lessons from a d20

[version 2; peer review: 3 approved]
PUBLISHED 07 Sep 2016

Abstract

Statistical analyses are often conducted with α = .05. When multiple statistical tests are conducted, this procedure needs to be adjusted to compensate for the otherwise inflated Type-I error. In tabletop gaming, it is sometimes desirable to roll a 20-sided die (or 'd20') twice and take the greater outcome. Here I draw from probability theory and the case of a d20, where the probability of obtaining any specific outcome is 1/20, to determine the probability of obtaining a specific outcome (a Type-I error) at least once across repeated, independent statistical tests.

Keywords

statistical analysis, error, probability, statistical test

Revised Amendments from Version 1

Based on reviewer comments, a few minor edits have been made: (1) fixed minor phrasing issues (e.g., 'regular' vs. 'conventional'); (2) added the reference discussing methods for correcting for multiple comparisons; and (3) removed the 'across n die' fragment.

To read any peer review reports and author responses for this article, follow the "read" links in the Open Peer Review table.

Introduction

In scientific research, it is important to consider the issue of conducting multiple statistical tests and the likelihood of spuriously obtaining a ‘significant’ effect. Within a null-hypothesis significance testing (NHST) framework, statistical tests are usually conducted with α = .05, i.e., the likelihood of falsely rejecting the null hypothesis is set at .05. Interestingly, this value coincides with the probability of obtaining a specific outcome on a 20-sided die (or ‘d20’), as 1/20 = .05. In the current (fifth) edition of Dungeons & Dragons, a tabletop game, many in-game events are determined based on the outcome of a d20. However, to make some events more likely, there are times when players roll a d20 ‘with advantage’, meaning that they roll the d20 twice and take the greater value1. (There are also instances where a d20 is rolled ‘with disadvantage’, where the lesser value is taken, but here I will only focus on the former case.) This parallels the use of NHST without any correction for multiple comparisons, as a significant effect is more likely to arise due to chance (i.e., a Type-I error) if many tests are conducted without such a correction.

Here I wondered how much the probability of obtaining a 20, on a d20, would increase due to multiple tests, i.e., obtaining at least one 20 across n dice. This approach assumes that each statistical test is wholly independent of the others, and thus is likely to over-estimate the effect of conducting multiple statistical tests using variations in how the measures are calculated or the use of different, but correlated, measures. Nonetheless, this exploration is based in probability theory and mathematical derivations, rather than computational simulations, and can serve as a comprehensible primer in understanding the relationship between repeated statistical tests and probability distributions.

Developing an intuition of statistics and probability distributions is of particular importance, as most people, both laymen2 and scientists3,4, have misconceptions about NHST. This is further compounded by critics of NHST, who often over-emphasize the limitations of the approach, e.g., see 4–6. By providing a comprehensible example of how repeated statistical tests can inflate chance likelihoods, I hope that these demonstrations can improve researchers’ intuitions regarding NHST. This approach is not contrary to the use of confidence intervals and Bayesian statistics, which have become increasingly adopted across the life sciences, from medicine to psychology7,8, but is rather intended to improve comprehension of the characteristics of NHST.

Mathematical derivations

The probability of a specific outcome occurring at least once across n dice, each with d sides, is:

$$P(d,n) = 1 - \left(\frac{d-1}{d}\right)^{n}$$

The probabilities of obtaining a specific outcome at least once across n rolls of a d-sided die are listed in Table 1.

Table 1. Probability of obtaining a specific outcome at least once, using a d-sided die rolled n times.

n        d = 2     d = 6     d = 20    d = 100   d = 1000
1        .5000     .1667     .0500     .0100     .0010
2        .7500     .3056     .0975     .0199     .0020
3        .8750     .4213     .1426     .0297     .0030
4        .9375     .5177     .1855     .0394     .0040
5        .9688     .5981     .2262     .0490     .0050
6        .9844     .6651     .2649     .0585     .0060
7        .9922     .7209     .3017     .0679     .0070
8        .9961     .7674     .3366     .0773     .0080
9        .9980     .8062     .3698     .0865     .0090
10       .9990     .8385     .4013     .0956     .0100
20      1.0000     .9739     .6415     .1821     .0198
50      1.0000     .9999     .9231     .3950     .0488
100     1.0000    1.0000     .9941     .6340     .0952
500     1.0000    1.0000    1.0000     .9934     .3936
1,000   1.0000    1.0000    1.0000    1.0000     .6323
10,000  1.0000    1.0000    1.0000    1.0000    1.0000
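These values are straightforward to verify computationally. The following Python sketch (the function name prob_at_least_once is mine, not from the article) implements the formula above and reproduces a few entries of Table 1:

```python
def prob_at_least_once(d, n):
    """Probability of obtaining a specific face at least once across n rolls of a d-sided die."""
    return 1 - ((d - 1) / d) ** n

# Check a few entries of Table 1
for d, n in [(2, 2), (6, 2), (20, 10), (100, 20), (1000, 50)]:
    print(f"d = {d:4d}, n = {n:3d}: {prob_at_least_once(d, n):.4f}")
# Expected from Table 1: .7500, .3056, .4013, .1821, .0488
```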

To develop some intuition of the effect of multiple die rolls, several simple cases can be considered.

For d = 2, i.e., a coin, the probability of obtaining a heads when flipping one coin (n = 1) is 1/2. The probability of obtaining a heads twice (with two coins, n = 2) is (1/2)^2 or 1/4. In contrast, the probability of obtaining at least one heads when flipping two coins is 3/4, as there are four possible outcomes ({HH, HT, TH, TT}) and three of them satisfy the criterion of ‘at least one heads’ ({HH, HT, TH}) and only one outcome does not ({TT}). This can be considered more clearly via the complementary event, where the probability is 1 − 1/4, which resolves to 3/4.
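The same result can be confirmed by brute-force enumeration of the four equally likely outcomes; this short Python check is illustrative only and not part of the original derivation:

```python
from itertools import product

# All equally likely outcomes of flipping two coins
outcomes = list(product("HT", repeat=2))        # [('H','H'), ('H','T'), ('T','H'), ('T','T')]
with_heads = [o for o in outcomes if "H" in o]  # three of the four outcomes contain a heads
print(len(with_heads) / len(outcomes))          # 0.75, i.e., 3/4
```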

For d = 6, i.e., a ‘conventional’ six-sided die, the probability of obtaining any specific outcome is 1/6. When considering multiple dice, it is again important to differentiate the probability of ‘obtaining the same specific outcome multiple times’, e.g., the probability of obtaining two sixes with two dice is (1/6)^2 = 1/36, from the case of ‘obtaining at least one specific outcome across multiple dice’. To determine the probability of obtaining a specific outcome on any of multiple dice, the complementary event should again be considered, i.e., the probability of not obtaining that outcome on any of the dice. For n = 1, the probability of not obtaining a specific outcome is 5/6. Following from this, the probability of obtaining that specific outcome is 1 − 5/6 or 1/6. When n = 2, the probability of not obtaining a six on either of the dice is (5/6)^2, which resolves to 25/36. The complementary event of obtaining ‘at least one six’ is 1 − 25/36 or 11/36. Here we can see that with two dice, the probability of obtaining at least one six (or any other specific outcome) is nearly doubled, from 6/36 (i.e., 1/6 with a single die) to 11/36.

For d = 20, i.e., a 20-sided die, the probability of obtaining any specific outcome is 1/20 or .05. If n = 2 dice are rolled, the probability of obtaining at least one 20 is 39/400 or .0975. If n = 10 dice are rolled, the probability of obtaining at least one 20 is ≈ .4013. With n = 20 dice, this increases further to ≈ .6415.
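Translated back to NHST, the identical arithmetic gives the probability of at least one spurious ‘significant’ result across n independent tests conducted at α = .05. The sketch below is a simple illustration under that independence assumption:

```python
alpha = 0.05

# Probability of at least one Type-I error across n independent tests at alpha = .05,
# identical to rolling at least one 20 across n d20s
for n in [1, 2, 5, 10, 20]:
    fwer = 1 - (1 - alpha) ** n
    print(f"n = {n:2d} tests: P(at least one false positive) = {fwer:.4f}")
# n = 2 -> .0975, n = 10 -> .4013, n = 20 -> .6415 (cf. the d = 20 column of Table 1)
```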

We can also consider a more general problem: the probability of obtaining an outcome of o or greater on at least one of n d-sided dice:

$$P(d,n,o) = 1 - \left(\frac{d-(d-o+1)}{d}\right)^{n}$$

For instance, when rolling a six-sided die, the probability of obtaining a five or higher is 2/6 (equivalent to 12/36). Following the same approach of calculating the complementary event, the probability of obtaining at least one of the two specific outcomes (i.e., a five or a six) across two dice is 1 − (4/6)^2, which resolves to 20/36. Figure 1 and Table 2 show the probability of obtaining an outcome of at least o, at least once, across n d20 dice.


Figure 1. Probability (P) of obtaining an outcome of at least o, at least once, across n d20 dice.

Table 2. Probability of obtaining an outcome of at least o, at least once, across n d20 dice.

o     n = 1     n = 2     n = 3     n = 4     n = 5
1    1.0000    1.0000    1.0000    1.0000    1.0000
2     .9500     .9975     .9999    1.0000    1.0000
3     .9000     .9900     .9990     .9999    1.0000
4     .8500     .9775     .9966     .9995     .9999
5     .8000     .9600     .9920     .9984     .9997
6     .7500     .9375     .9844     .9961     .9990
7     .7000     .9100     .9730     .9919     .9976
8     .6500     .8775     .9571     .9850     .9947
9     .6000     .8400     .9360     .9744     .9898
10    .5500     .7975     .9089     .9590     .9815
11    .5000     .7500     .8750     .9375     .9688
12    .4500     .6975     .8336     .9085     .9497
13    .4000     .6400     .7840     .8704     .9222
14    .3500     .5775     .7254     .8215     .8840
15    .3000     .5100     .6570     .7599     .8319
16    .2500     .4375     .5781     .6836     .7627
17    .2000     .3600     .4880     .5904     .6723
18    .1500     .2775     .3859     .4780     .5563
19    .1000     .1900     .2710     .3439     .4095
20    .0500     .0975     .1426     .1855     .2262
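As with Table 1, these values can be checked directly from the general formula; the function name in the following Python sketch is mine and chosen for illustration:

```python
def prob_at_least_o(d, n, o):
    """Probability of rolling an outcome of o or greater on at least one of n d-sided dice."""
    return 1 - ((d - (d - o + 1)) / d) ** n   # equivalently, 1 - ((o - 1) / d) ** n

# Check a few entries of Table 2 (d = 20)
for o, n in [(20, 1), (20, 2), (11, 1), (15, 5), (19, 4)]:
    print(f"o = {o:2d}, n = {n}: {prob_at_least_o(20, n, o):.4f}")
# Expected from Table 2: .0500, .0975, .5000, .8319, .3439
```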

Discussion

While it is widely understood that multiple comparisons need to be corrected for, many researchers may underestimate the degree of inflation in Type-I error associated with additional, uncorrected statistical tests. Critically, statistical procedures have been developed to correct for multiple comparisons (e.g., Bonferroni, Tukey’s HSD); see 9 for a detailed review. Nonetheless, the mathematical derivations presented here clearly illustrate the influence of multiple statistical tests on the likelihood of obtaining a specific outcome due to chance alone. These derivations and examples provide a concrete illustration of the problem associated with uncorrected multiple comparisons and may prove useful as a pedagogical tool.
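As a simple illustration of why such corrections help (again assuming fully independent tests; this example is mine and not from the article), the Bonferroni procedure evaluates each of n comparisons at α/n, which keeps the familywise Type-I error near the nominal .05:

```python
alpha, n_tests = 0.05, 20

uncorrected = 1 - (1 - alpha) ** n_tests              # ~ .6415, as in Table 1 (d = 20, n = 20)
bonferroni = 1 - (1 - alpha / n_tests) ** n_tests     # ~ .0488, close to the nominal .05

print(f"Uncorrected familywise Type-I error: {uncorrected:.4f}")
print(f"With Bonferroni correction:          {bonferroni:.4f}")
```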

