Keywords
decimal places, meta-research, readability, statistics
“Everything should be made as simple as possible, but not simpler.” Albert Einstein (paraphrased).
Scientists read papers to keep up with the latest developments in their field and to improve their research. However, the ever-increasing number of papers is placing greater demands on scientists’ time. In 2010 there were an estimated 75 trials and 11 systematic reviews published per day in the field of health and medicine [1], and by 2012 the number of systematic reviews had more than doubled to 26 per day [2]. Papers have also become less readable over time, with an increase in the use of scientific jargon [3].
Poorly presented numbers can decrease readability and can distort or even hide important information. Statistical software packages show results to many decimal places, but this level of accuracy may be spurious, and authors may overcrowd a paper with numbers if they copy results from the software without considering what level of accuracy is appropriate. Papers have been criticised for using too many decimal places; for example, a recent study of just 27 patients displayed odds ratios to two decimal places [4]. Journal impact factors have also been frequently criticised for spurious accuracy, as they are quoted to three decimal places [5].
Authors may also over-simplify numbers by rounding away important information. For example, a review of gender bias in funding peer review reported in a results table that 20% of applicants were female in a study of 41,727 applications [6]. Because a rounded 20% covers any true percentage from 19.5% to 20.5%, from these results we only know that the number of female applicants was somewhere between 8,137 and 8,554, a range of 417. To use these results in a meta-analysis it would be better to know the actual number of applicants. The large sample size in this example means that potentially useful information is lost by rounding the percent to an integer.
Authors must strike a balance between presenting numbers with too little or too much detail. The abstract and discussion are a summary of the findings, and here numbers can be rounded to make sentences easier to read. Numbers in the results section and tables can be presented with more detail, because they can be an accurate record of the data (e.g., for meta-analysis) and the reader is usually not expected to read every number in a table, especially a large table. Of course, tables can be made clearer by reducing unnecessary numbers, and so allowing the reader to easily comprehend the key information. There is a similar balance to consider when using acronyms in papers, as an overuse of acronyms can make a paper hard to understand because readers need to retrieve additional information, whereas using established acronyms can speed up reading.
There are guidelines by Cole [7] for presenting numerical data, including means, standard deviations, percentages and p-values. These guidelines are part of the wider EQUATOR guidelines for “Enhancing the QUAlity and Transparency Of health Research” (http://www.equator-network.org/) [8]. Cole’s guidelines for percentages are as follows (a code sketch follows the list):
Integers or one decimal place for values under 10%, e.g., 1.1%
Integers for values above 10%, e.g., 22% not 22.2%
One decimal place may be needed for values between 90% to 100% when 100% is a natural upper bound, for example the sensitivity of a test, e.g., 99.9% not 100%
Use two or more decimal places only if the range of percents being compared is less than 0.1%, e.g., 50.50% versus 50.55%
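The first three rules are simple enough to express in code. The following is a minimal R sketch (my illustration, not code from Cole or from this paper); the fourth rule requires knowing the other percents being compared, so it is omitted, and the function name is hypothetical.

```r
# A minimal sketch of Cole's first three rules (illustrative only).
format_percent <- function(p) {
  if (p < 10 || (p > 90 && p < 100)) {
    sprintf("%.1f%%", p)  # one decimal place under 10% or near a 100% bound
  } else {
    sprintf("%.0f%%", p)  # integers from 10% to 90% and at or above 100%
  }
}
format_percent(1.14)   # "1.1%"
format_percent(22.23)  # "22%"
format_percent(99.94)  # "99.9%"
```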
There are also guidelines from journals and style guides. For example, the instructions to authors for the journal Australian Zoologist state that “Numbers should have a reasonable and consistent number of decimal places.” The Australian style guide also recommends a consistent number of decimal places when comparing numbers, so “1.23 vs 4.56” not “1.23 vs 4.5” [9]. The Economist style guide recommends “resisting the precision of more than one decimal place, and generally favouring rounding off. Beware of phoney over-precision.” [10]
It is not clear whether Cole’s guidelines on presenting numerical data are being adhered to, or whether there is generally too little or too much rounding in published papers. An audit of 1,250 risk ratios and associated confidence intervals from the abstracts of BMJ papers between 2011 and 2013 found that one quarter of confidence intervals and an eighth of estimates could have been presented better [11].
This paper examines a large sample of percents from recent abstracts across multiple journals to assess how they are being presented.
I extracted percentages from abstracts available in PubMed using the “rentrez” R package (version 1.1.0) [12]. Example abstracts are referred to using their PubMed ID number rather than by citing the paper; readers can find a paper’s details by putting the number into a PubMed search with the search term “[PMID]”.
I searched for papers in the following journals: The BMJ, BMJ Open, Environmental Health Perspectives, F1000Research, JAMA, The Lancet, The Medical Journal of Australia, Nature, NEJM, PLOS ONE and PLOS Medicine. These journals were selected to give a range of journals that publish articles in health and medicine, including some high profile journals and some large open access journals. To look at recent papers, I restricted the search to 2017. To focus on research papers, I restricted the search to article types of: Journal Article, Clinical Trial, Meta-Analysis, Review, Randomized Controlled Trial and Multicenter Study. The search returned 33,147 papers across 23 journals (searching for “The Lancet” included all journals in the Lancet stable).
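As an illustration, a search of this kind can be run with rentrez as sketched below. The query term is my reconstruction from the description above, not the author’s exact code, which is available in the repository linked later.

```r
# A sketch of the PubMed search using rentrez (the query term is a
# reconstruction; the author's actual code is in the GitHub repository).
library(rentrez)
term <- paste('("BMJ Open"[Journal] OR "PLoS One"[Journal]) AND',
              '2017[PDAT] AND Journal Article[PT]')
found <- entrez_search(db = "pubmed", term = term, use_history = TRUE)
found$count  # number of matching papers
# fetch the first few abstracts as plain text
abstracts <- entrez_fetch(db = "pubmed", id = found$ids[1:5],
                          rettype = "abstract")
```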
Despite the initial restriction on article type, the search results included non-research papers that had multiple types, e.g., a retraction of a clinical trial. Hence I excluded any papers that included an article type of: Biography, Conference, Comment, Corrected, Editorial, Erratum, Guideline, Historical, News, Lectures, Letter or Retraction.
A flow diagram outlining the selection of papers is shown in Figure 1.
I examined only percents because the percent is a widely used and important statistic that is relatively easy to extract using text mining compared with other important statistics, such as the mean or rate ratio. I extracted all percentages from the text of the abstract by searching for all numbers suffixed with a “%”. The key steps for extracting the percents from the abstract were as follows (a sketch of the extraction appears after the list):
1. Simplify the text by removing the “±” symbol and other symbols such as non-separating spaces
2. Find all the percents
3. Exclude percents that refer to statistical intervals or statistical significance, e.g., “95% confidence interval”
4. Record the remaining percents as well as the number of decimal places and significant figures
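A minimal sketch of steps 2 and 4 in R (the regular expression is my reconstruction; the complete code is in the repository linked below):

```r
# Sketch of steps 2 and 4: find percents and count their decimal places.
text <- "The response was 62.5% versus 7% (95% CI 5.1% to 9.2%)."
hits <- regmatches(text, gregexpr("[0-9]+(\\.[0-9]+)?%", text))[[1]]
hits  # "62.5%" "7%" "95%" "5.1%" "9.2%" -- step 3 removes the "95%" label
value <- as.numeric(sub("%", "", hits))
decimals <- ifelse(grepl("\\.", hits),
                   nchar(sub("^[0-9]+\\.([0-9]+)%$", "\\1", hits)), 0)
decimals  # 1 0 0 1 1
```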
The complete steps are detailed in the R code available here: https://github.com/agbarnett/decimal.places. Based on Cole’s guidelines [7], I defined the ideal number of decimal places as follows (implemented in the sketch after the list):
0 for percents between 10 and 90, and percents over 100
1 for percents between 0.1 and 10, percents between 90 and 100, and percents of exactly 0
2 for percents between 0.01 and 0.1
3 for percents between 0.001 and 0.01
4 for percents under 0.001 but greater than 0
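Read as a cascading rule from the smallest percents upwards, the definition can be written as a short R function. This sketch is my reading of the list above; the treatment of exactly 100% is an assumption.

```r
# The ideal number of decimal places as defined above (a sketch;
# exactly 100% is assumed to need no decimal places).
ideal_dp <- function(p) {
  if (p == 0)            return(1)
  if (p < 0.001)         return(4)
  if (p < 0.01)          return(3)
  if (p < 0.1)           return(2)
  if (p < 10)            return(1)
  if (p > 90 && p < 100) return(1)
  return(0)  # 10 to 90, and 100 or more
}
sapply(c(0.0005, 0.05, 5.5, 50, 95, 120), ideal_dp)  # 4 2 1 0 1 0
```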
Preferably, I would also have considered a greater number of ideal decimal places when the aim was to compare a small difference between two percents. For example, 10.5% compared with 10.6% in the same sentence (PubMed ID 28949973) would both be considered as having one decimal place too many under the above guidelines, but the additional decimal place may be warranted if the small difference of 0.1% is clinically meaningful. However, accurately calculating a small difference of less than 0.1% requires all the percents to be displayed using two or more decimal places. Ultimately I ignored this issue because it applied to so few abstracts.
I removed percents that referred to statistical intervals (e.g., “95% CI”) as these were labels not results. I searched for common interval percents of 80%, 90%, 95% and 99%. I combined these four percents with the words: “confidence interval”, “credible interval”, “Bayesian credible interval”, “uncertainty interval”, “prediction interval”, “posterior interval” and “range”. I included versions using capital and non-capital letters, and the standard acronyms including “CI” and “PI”. I also removed references to statistical significance percents using the common percents of 1%, 5% and 10% combined with the words: “significance”, “statistical significance” and “alpha level”.
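As an illustration, this exclusion can be implemented by combining the interval percents and words into a single regular expression. The sketch below abbreviates the word list; the full set of patterns is in the linked repository.

```r
# Sketch of the interval-label exclusion (word list abbreviated).
words <- c("confidence interval", "credible interval", "CI", "PI", "range")
pattern <- paste0("(80|90|95|99)%\\s*(", paste(words, collapse = "|"), ")")
grepl(pattern, "the 95% confidence interval was 1.2% to 3.4%",
      ignore.case = TRUE)  # TRUE: this 95% is a label, not a result
```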
I verified that the percents were correctly recorded for 50 randomly selected articles, which contained 198 percents. There were no errors in the recorded percents, but there were 5 percents that were labels rather than results (e.g., “the prevalence of pretreatment NNRTI resistance was near WHO’s 10% threshold”, PubMed ID 29198909), and a percent of exactly 0% was wrongly assigned an ideal of 4 decimal places, which led me to amend the definition of the ideal number of decimal places. There was also a “95% fixed kernel density estimator”, which is a statistical interval and illustrates the difficulty of removing every type of statistical interval. I also checked the percentages for the abstract with the largest number of percents and the abstracts with the largest numbers of decimal places and significant figures, as well as some abstracts that included percents of exactly 95%, to check for any missed interval definitions. These checks led to additional interval definitions, including the non-standard arrangements “95% IC” and “CI 95%” (PubMed ID 28228447) and the typo “uncertainly interval” (PubMed ID 29171811).
I only extracted percents that were suffixed with the percent symbol. For example, the only extracted percent for the text “5–10%” or “5 to 10%” would be 10%. Any percents written in words were also not extracted. I also did not extract numbers immediately before the word “percent” or “per cent” as I assumed that these would be rare. I ignored the sign of the percent as I was primarily interested in presentation, so for example “–10%” was extracted as 10%. Similarly “<10%” was extracted as 10%. I only used the abstracts, rather than the main text, because: 1) abstracts are freely available on PubMed for a wide range of journals, whereas the full text can only be easily mined for open access journals such as PLOS ONE, 2) the abstract is a summary of the results and so percentages should be presented according to Cole’s guidelines, whereas percents may be presented with more decimal places in the results in order to give an accurate and reusable record of the data.
I calculated the difference between the observed number of decimal places and the ideal number as defined above. Because most differences were within ±1, I categorised the data into: too few, just right, and too many. I plotted the three categories by journal. I estimated confidence intervals for the percents in these three categories using a Bayesian Multinomial Dirichlet model [13]. The large sample size meant that all confidence intervals had a width of 2% or less when using the complete sample, hence I did not present these intervals as they added little information. The intervals are used to summarise the uncertainty in the journal-level results. I did not adjust for the clustering of multiple percents within the same abstract.
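For readers unfamiliar with the model, intervals from a Multinomial Dirichlet model can be computed by conjugate updating. The sketch below is my illustration using a flat prior and the category counts reported later in Table 1; it is not necessarily the exact model or prior used in the paper.

```r
# Sketch of 95% intervals from a Multinomial-Dirichlet model with a
# flat prior (an assumption), using the counts from Table 1.
counts <- c(too_few = 4981, just_right = 23872, too_many = 14266)
set.seed(1)
draws <- matrix(rgamma(3 * 10000, shape = counts + 1),
                ncol = 3, byrow = TRUE)  # Dirichlet draws via gammas
draws <- draws / rowSums(draws)          # normalise each draw to sum to 1
apply(100 * draws, 2, quantile, probs = c(0.025, 0.975))
```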
In a sensitivity analysis I excluded percents that could be due to digit preference, which were those with no decimal places that were a multiple of 10, as well as 75%. I also excluded percents between 90% and 100% because these may or may not have had a natural upper bound at 100%, and so it is difficult to automatically judge whether they should be presented with one or no decimal places. The data extraction and analyses were performed using R (version 3.4.3) [14]. All the data and code are available here: https://github.com/agbarnett/decimal.places.
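A sketch of these exclusions, using illustrative vectors of extracted percents and their displayed decimal places (my reconstruction of the rules described above):

```r
# Sketch of the sensitivity-analysis exclusions (a reconstruction).
value    <- c(50, 75, 33, 95.0, 7.14)  # extracted percents
decimals <- c(0,  0,  0,  1,    2)     # their displayed decimal places
digit_pref <- decimals == 0 & (value %% 10 == 0 | value == 75)
near_bound <- value > 90 & value < 100
value[!(digit_pref | near_bound)]  # 33 and 7.14 survive the exclusions
```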
There were 43,119 percents from 9,482 abstracts. Over half the percents were from PLOS ONE (Supplementary Table 1). The median number of percents per abstract was 3, with an inter-quartile range from 2 to 6. A histogram of all percents between 0 and 100 is shown in Figure 2; this excludes the 195 percents (0.5%) that were greater than 100%. There are spikes in the histogram at multiples of 10% and at 1%, 5%, 75% and 95%; these are likely due to digit preference, where percents are rounded to commonly used values.
The percent and number of percents meeting the guidelines are shown in Table 1. The recommended number of decimal places was used just over half the time. When the number of decimal places differed from the guidelines, it was more likely to be too many (33%) than too few (12%). Only 21% of abstracts (1,947 out of 9,482) used the ideal number of decimal places for every percent. After excluding the digit-preference percents, as many of these were not results, the recommended number of decimal places was used just 50% of the time and the percent of time that too many decimal places were used increased to 40%.
Table 1. The percent and number of percents meeting the guidelines.

| Decimal places | All percents, % (number) | Excluding digit preferences, % (number) |
| --- | --- | --- |
| Too few | 12 (4,981) | 10 (3,574) |
| Just right | 55 (23,872) | 50 (17,408) |
| Too many | 33 (14,266) | 40 (14,047) |
| Total | 100 (43,119) | 100 (35,029) |
An example where too many decimal places were used is, “True retentions of α-tocopherol in cooked foods were as follows: boiling (77.74-242.73%), baking (85.99-212.39%), stir-frying (83.12-957.08%), deep-frying (162.48-4214.53%)” (PubMed ID 28459863).
An example where too few decimal places were used is, “263 [3%] of 8313 vs 158 [2%] of 8261” (PubMed ID 29132879). As the numerators and denominators are given, we can recalculate the two percents using the recommended one decimal place, giving 3.2% and 1.9%, respectively, a difference of 1.3%. Unless the reader recalculates these percents, the implied difference could be smaller than 0.1%, because 3% could be as little as 2.5% (rounded up to 3%) and 2% could be as large as 2.4% (rounded down to 2%).
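The recalculation is simple arithmetic, shown here in R for concreteness:

```r
# Recalculating the two percents to one decimal place.
round(100 * 263 / 8313, 1)  # 3.2
round(100 * 158 / 8261, 1)  # 1.9
# displayed as "3%" and "2%", the true values could have been anywhere
# in [2.5, 3.5) and [1.5, 2.5), so the true difference could be far
# smaller than the implied 1%
```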
There were abstracts where the number of decimal places varied within the same sentence, for example, “pre-2010 vs post-2010 31.69% vs 64%” (PubMed ID 29138196).
Some percents that I judged as having too few decimal places were potentially judged harshly, because the sentence aimed to give only general percents; for example, the following sentence probably did not need percents to one decimal place: “it is a common, chronic condition, affecting 2–3% of the population in Europe and the USA and requiring 1–3% of health-care expenditure” (PubMed ID 28460828). Some percents with too few decimal places according to Cole’s guidelines were presented using a consistent number of decimal places, for example, “we noted reductions in genotypes 6 and 11 (from 12% [95% CI 6-21%], to 3% [1-7%]” (PubMed ID 28460828); under the guidelines all the percents below 10% should have had one decimal place. Some percents with too few decimal places were correctly presented with no decimal places because the sample size was under 100, for example, “with a specimen obtained at 13 to 15 months, in 1 of 25 (4%)” (PubMed ID 26465681). I could not adjust for this correct presentation because I did not extract sample sizes.
There were large differences between some journals in the number of decimal places used (Figure 3). There is some grouping of the Lancet journals, which collectively leaned towards using too few decimal places. The two journals with means closest to the ideal were Lancet Planetary Health and Nature, although the sample size for Lancet Planetary Health was just 30 (Supplementary Table 1). There was a negative correlation between using too few and too many decimal places, so journals that used too many decimal places were less likely to use too few, and vice versa.
Only two journals had specific guidelines about decimal places in their online instructions to authors (Supplementary Table 2), and both concerned not using decimal places where the sample size is under 100 (sensible advice that I did not consider here). Some instructions to authors did encourage the use of the EQUATOR guidelines, from which Cole’s guidelines for decimal places are available.
Numerical results are vitally important for quantitative research papers. Presenting numbers with too many decimal places can unnecessarily tax the reader and obscure important differences, whereas too much rounding makes it hard to compare results and can make differences appear too small. Overall, I found that only around half of all percents were presented according to the guidelines for decimal places, and the most common problem was using too many decimals. The overuse of decimals may stem from a belief that more digits reflect greater accuracy. It is also likely that most researchers are not aware of the guidelines for presenting percents and other statistics.
The guidelines are not written in stone and good arguments can be made for not using them in some circumstances, for example, using no decimal places where all the percents are just above and below 10%, or where the differences are large enough to clearly show importance (e.g., a 1% versus 9% difference in mortality instead of 1.0% versus 9.0%). Hence the “around half” estimate for imperfect presentations found here likely overstates the problem. Additionally, there are far more serious mistakes that can be made with numbers, such as using the wrong data [15] or mislabelling statistics.
I found large differences between journals in the number of decimal places used. These differences could be due to editorial policy and also to differences in the training and experience of the journals’ author cohorts. Nature had one of the best results in terms of ideal presentation, and it publishes relatively few papers, which may mean there is more time to edit papers for clarity and presentation. PLOS ONE had the largest number of papers in the sample and performed relatively badly against the guidelines, perhaps because editors have no time to fix issues with the presentation of numbers given the large volume of papers and other important tasks, for example, checking for plagiarism and undeclared competing interests.
The difference in standards between journals likely adds to the confusion for authors about how to present numbers. Greater consistency and better presentation might be achieved through an automated checking procedure similar to the statcheck program, which checks for errors in statistical reporting [16]. This could be used to flag numbers that may need changing, and could be part of an automated submission process for journals through online writing tools such as Overleaf [17]. Automating the process would reduce the burden on journal staff.
I only examined percents, but it is likely that other statistics, such as means and risk ratios, are also being imperfectly presented. In fact, using percents may underestimate the problem of spurious accuracy because percents are almost always between –100 and 100, whereas means can take on a far wider range depending on the unit of measurement and a wider range of numbers creates more opportunity for poor display. I only examined percents because these are the easiest numbers to automatically extract from text, thanks to the “%” suffix.
Many percents in abstracts did not adhere to the guidelines on using decimal places. A more considered use of decimal places would increase readability and potentially improve comprehension.
All the data and code are available here: https://github.com/agbarnett/decimal.places.
Archived data and code as at time of publication: http://doi.org/10.5281/zenodo.1213574 [18].
AB receives funding from the Australian National Health and Medical Research Council (APP1117784).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Supplementary Table 1. Table of the percent of times journals used too few and too many decimal places according to the guidelines.
Supplementary Table 2. Instructions to authors about decimal places for percents from the selected journals.