‘The great publication race’ vs ‘abandon paper counting’: Benchmarking ECR publication and co-authorship rates over past 50 years to inform research evaluation [version 1; peer review: awaiting peer review]

Background: Publication and co-authorship rates have been increasing over decades. In response, calls are being made to restrict the number of publications included in research evaluations. Yet there is little evidence to guide publication expectations and inform research evaluation for early career researchers (ECRs). Methods: Here we examine the early career publication and co-authorship records between 1970 and 2019 of >140,000 authors of 2.8 million publications, to identify how publication and co-authorship rates have changed over the last 50 years. We use these records to develop benchmarks of median publication rates for sensibly evaluating ECR research productivity, and use regression models to explore how different co-authorship strategies affect success in meeting these benchmarks. Results: Publication rates of multidisciplinary ECRs publishing in Nature, Science and PNAS have increased by 46% over the last 50 years, and publication rates in a set of disciplinary journals have increased by 105%. Co-authorship rates have increased even more, particularly for the multidisciplinary sample, which now has 572% more co-authors per publication. Benchmarks based on median publication rates for all authors increased from one publication per year at the start of a career to four publications per year after 10 years of publishing; the first-author benchmark was one first-author publication per year across all years. The probability of meeting these benchmarks increases when authors publish with different co-authors, and first authorship rates decrease for ECRs with many co-authors per publication.
F1000Research 2022, 11:95 Last updated: 16 FEB 2022


Introduction
Publication metrics are commonly used for managing academic expectations and evaluating research performance, both by researchers and the institutions they work for. 1 Despite widespread concerns about their use and abuse, 1-3 publication counts, citations and impact factors are still commonly used to guide academic hiring and promotion decisions, and to allocate funding resources. [4][5][6] Yet there is increasing concern that a focus on such metrics has negative consequences, both for science and the researchers themselves. An unbalanced focus on volume may come at the expense of other important aspects of scientific endeavour, such as quality, engagement, and impact. A reward system that incentivises publication quantity can reduce research quality 7,8 as the focus shifts to researcher status rather than knowledge gain, 1 leading to "unsustainable science". 9 In response, calls are being made to "abandon paper counting", 3 limit the number of publications used in research performance evaluations 3 and even limit the number of papers published by researchers. 10 Yet this debate about increasing publication rates, and an appropriate response by researchers and those who evaluate them, has largely ignored early career researchers (ECRs).
This unsustainable increase in publication rates could be amplified amongst ECRs, particularly as commonly used publication metrics do not fairly describe their performance, forcing ECRs to focus on publication counting. For example, the widely used h-index is poorly suited to evaluating early-career research [11][12][13] and it can take many years for high quality research to be reflected in an increase in an author's h-index. Predictors of academic 'success' are often difficult for ECRs to control: publishing earlier, [14][15][16] working at a more prestigious institution, 5 or publishing with top scientists 17 are outside of the control of most ECRs. In the context of uncapped publication expectations, the primary goal for many ECRs is to publish as many papers as possible: "Four papers are better than three. And five are better than four". 18 However, favouring quantity of publications is a potentially maladaptive behaviour. It is a poor predictor of later research 'quality' as measured by indicators such as citation rates. 19 Research evaluation that defaults to publication counting will also disadvantage ECRs with families, those who teach, those who engage with industry or the broader public, or those who work in smaller labs, as they balance producing publications with a variety of competing responsibilities. 20,21
There is currently little empirical data to make sense of these debates for ECRs, or for the research institutions and grant assessors evaluating their research performance. We know that ECRs are publishing earlier compared with ECRs from previous decades, often before their PhD is complete. 22 Yet it is not clear that the pressure to publish is actually leading to increased publication rates for current ECRs, compared with previous generations. Publication strategies that could be useful for ECRs and those who evaluate their research performance, such as publishing with a large number of co-authors 14,23 or in a range of different titles, 23,24 have yet to be tested for ECRs. Calls to severely restrict publishing or the number of publications used in research evaluation 3,10 may diminish the pressure to publish, but could be counterproductive if applied to ECRs who are busily developing networks and skills through publications.
A research evaluation framework that is specific to the early years of a career is needed. This should be based on an understanding of how ECR publication and co-authorship rates have changed through time, how they progress during a career, and what strategies best support research 'success'. It should remove the pressures generated by uncapped publication expectations, and at the same time it should not unnecessarily restrict publication activity, which could inhibit collaboration and skill development. It should recognise the changing patterns of co-authorship, from large team-based publications to sole authorship. This study aims to generate empirical data to support the development of a research evaluation framework suitable for ECRs by:
1) Characterising how publication rates for ECRs have changed over the last 50 years.
2) Developing evidence-based benchmarks for ECR publication rates as careers develop.
3) Exploring the effect of co-authorship strategies on the probability of researchers meeting these benchmarks.

Author samples
Two author samples were selected. The first (multidisciplinary) sample was selected to gain insights into publishing and co-authorship trends between 1970 and 2019 in the broader field of science, particularly in the high-ranking international journals that are so often used as an indicator of career success. This sample included all authors who published in the leading multidisciplinary scientific journals Nature, Science and PNAS in 2019. A second (disciplinary) sample was selected to explore publication and co-authorship rates of ECRs who routinely publish in discipline-specific journals. This sample was also used to test whether trends observed in the publication rates of leading scientists are also seen in a cohort of researchers exclusively publishing in more specialised journals, perhaps a more familiar experience for most ECRs. The disciplinary sample included all authors who in 2019 published in any journal that the authors of this article had published in at least twice, including a wide range of ecology, interdisciplinary landscape, and environmental psychology journals (n=21, Table 1).

Data collection
The ECR publication histories of these two author samples were collected using the Scopus API, accessed using the rscopus package in R v3.6 in January 2020. A list of all 2019 publications with a Scopus type of 'Article' or 'Review' (and 'Letter' in Nature) was retrieved for each journal of interest using the scopus_search function based on the ISSN of each publication. 45 All authors for each article were retrieved.
There is no consensus on the definition of 'early career'. Previous studies have defined ECRs as researchers within five to twelve years of their first year of publishing. 16,25 In a study of research productivity by career stage, ECRs had an average of 4.5 years' experience post-PhD, compared with 14.1 years for mid-career faculty members. 26 In this study we defined 'early career' as the first ten years of a career to encompass a broad range of definitions; subsets of the data presented here can be applied to shorter definitions of ECR career length.
'Career age' or 'academic age' are often defined as the number of years since first publication, 11,13 or years since PhD. 27 However, consistent with other studies, 28 we observed that many researchers in our author cohorts had gaps in their early years of publishing, possibly due to career interruptions (e.g. parental leave, illness or caring responsibilities) or breaks between completing honours or master's degrees and beginning PhD studies, along with the increasing prevalence of publishing before completing a PhD. 22 As early career publishing rates are sensitive to these gaps, we controlled for them by calculating the career age as the number of 'active' years of publishing, i.e., excluding years with no publications. While recognising that continuous publishing is an important predictor of academic excellence, 28 excluding these career gaps better reflects the publishing experience of most ECRs.
The early career publication history (i.e. the first ten active years) of n=55,332 authors for the multidisciplinary sample and n=85,793 authors for the disciplinary sample was retrieved using the author_retrieval function of the rscopus package. Details of articles including publication date, publication title, and co-authors, and author details including institution, city, and country were extracted. Errata and corrigenda were removed from the analyses. For each published article (n=1,195,246 articles for the multidisciplinary sample and n=1,597,069 articles for the disciplinary sample), the author position and number of co-authors were determined.
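The 'active year' measure defined above can be illustrated with a minimal Python sketch. This is not the authors' R code: the function name `active_year_counts` and the toy publication record are hypothetical, and the sketch assumes that an author's record is available as a simple list of publication years.

```python
from collections import Counter


def active_year_counts(publication_years, max_active_years=10):
    """Map each active year (1, 2, ...) to the number of publications in it.

    Calendar years with no publications are skipped, so a gap in
    publishing does not advance the active-year counter.
    """
    per_calendar_year = Counter(publication_years)
    counts = {}
    for active_year, calendar_year in enumerate(sorted(per_calendar_year), start=1):
        if active_year > max_active_years:
            break  # keep only the early-career window
        counts[active_year] = per_calendar_year[calendar_year]
    return counts


# Hypothetical author: first paper in 2008, no output in 2009-2010, then regular output.
years = [2008, 2011, 2011, 2012, 2012, 2012]
print(active_year_counts(years))  # {1: 1, 2: 2, 3: 3}
```

In this toy record, 2012 is the fifth calendar year since first publication but only the third active year, which is the distinction the measure is designed to capture.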

Analyses
For each active year of publication for each author, the number of publications, number of first author publications, total number of unique co-authors, and the average number of co-authors per publication were calculated. Adopting the approach of Calver et al., we drew on ecological statistics, using the Jaccard index to calculate co-author dissimilarity across publications with the vegdist function in the vegan package in R. 29 This measure of dissimilarity will approach 1 when co-authors vary across publications, and approach 0 when co-authors are similar across all publications, in any given year.
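To make the behaviour of the dissimilarity measure concrete, the following Python sketch computes a Jaccard dissimilarity between the co-author sets of publications. It is an illustration only: the study used vegdist in R, and averaging over all pairs of publications in a year is an assumption of this sketch, as are the function names and the toy author labels.

```python
from itertools import combinations


def jaccard_dissimilarity(a, b):
    """Jaccard dissimilarity between two sets: 1 - |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty co-author lists are treated as identical
    return 1.0 - len(a & b) / len(union)


def mean_coauthor_dissimilarity(publications):
    """Mean pairwise Jaccard dissimilarity across one year's publications.

    Averaging over all pairs is an assumption of this sketch, not a
    detail stated in the source.
    """
    pairs = list(combinations(publications, 2))
    if not pairs:
        return None  # undefined with fewer than two publications
    return sum(jaccard_dissimilarity(a, b) for a, b in pairs) / len(pairs)


same_team = [{"A", "B"}, {"A", "B"}, {"A", "B"}]
varied_teams = [{"A", "B"}, {"C", "D"}, {"E"}]
print(mean_coauthor_dissimilarity(same_team))     # 0.0
print(mean_coauthor_dissimilarity(varied_teams))  # 1.0
```

An author who always publishes with the same team scores 0, while an author whose co-authors never repeat scores 1, matching the interpretation given in the text.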
A generalised linear model using a Poisson distribution was used to predict publication rates (total number of publications, and number of first author publications), a log model was used to predict co-authorship rates (average co-authors per publication), and a Gaussian distribution was used to predict co-author dissimilarity across publications. In each model, the predictors were the first year of publication and career age.
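As a hedged illustration, the Poisson model for publication counts could be written as follows, where $Y_{i,t}$ is the number of publications of author $i$ in active year $t$. The additive form, the symbol names, and the absence of interaction terms are assumptions of this sketch; the text does not state the exact model specification.

```latex
Y_{i,t} \sim \mathrm{Poisson}(\mu_{i,t}), \qquad
\log \mu_{i,t} = \beta_0 + \beta_1\,\mathrm{FirstYear}_i + \beta_2\,\mathrm{CareerAge}_{i,t}
```

Under a log link of this kind, $\exp(\beta_1)$ is the multiplicative change in the expected publication rate for each one-year-later first year of publication, holding career age fixed.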
In the context of increasing concern over the focus on quantity over quality in academic publishing and research evaluation, 3 we explored alternative metrics that could provide benchmarks for ECR publication rates using the multidisciplinary cohort. We calculated the median and mean number of publications for all multidisciplinary authors for each active career year, and the median number of first author publications of all multidisciplinary authors across all active years. We also provide comparison publication rates for authors from different co-authorship environments: authors in the top quartile of average number of co-authors over the last decade and for authors from the bottom quartile of average number of co-authors over the last decade.
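The benchmark calculation described above amounts to grouping yearly publication counts by active career year and taking the median and mean of each group. A minimal Python sketch follows; the record layout `(author_id, active_year, n_publications)`, the function name, and the toy data are hypothetical and not taken from the study's data files.

```python
from collections import defaultdict
from statistics import mean, median


def benchmarks_by_active_year(records):
    """Median and mean publications per active year.

    records: iterable of (author_id, active_year, n_publications) tuples.
    """
    by_year = defaultdict(list)
    for _author, active_year, n_pubs in records:
        by_year[active_year].append(n_pubs)
    return {
        year: {"median": median(counts), "mean": mean(counts)}
        for year, counts in sorted(by_year.items())
    }


# Toy data: three hypothetical authors, two active years each.
toy = [
    ("a", 1, 1), ("b", 1, 1), ("c", 1, 4),
    ("a", 2, 2), ("b", 2, 3), ("c", 2, 10),
]
for year, stats in benchmarks_by_active_year(toy).items():
    print(year, stats)
```

In the toy data one prolific author pulls the mean above the median in both years (median 1 versus mean 2 in year one), which is the same pattern that separates the two benchmarks in the study.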
Based on these data, we determined measures of publishing success that limit publication expectations using three metrics, each assessed in any given year: 1) did the author publish at least the median number of publications for their career age, 2) did the author publish at least the mean number of publications for their career age, and 3) did the author publish at least the median number of first-author publications? A set of logistic models was developed to predict the probability of meeting these metrics from the average number of co-authors per publication (log transformed), and the dissimilarity of authors across publications (as measured by the Jaccard index).
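The three yes/no metrics can be sketched as a simple comparison against the benchmark values for an author's career age. The function name and argument names below are hypothetical. The example values are the fifth-active-year figures reported in the Results (median of two publications, mean of 3.3, and a median of one first-author publication per year).

```python
def meets_benchmarks(n_pubs, n_first_author, median_pubs, mean_pubs, median_first_author):
    """Return the three binary outcomes for one author in one active year."""
    return {
        "median": n_pubs >= median_pubs,
        "mean": n_pubs >= mean_pubs,
        "first_author": n_first_author >= median_first_author,
    }


# Hypothetical author in their fifth active year: 3 publications, 1 as first author.
flags = meets_benchmarks(
    n_pubs=3,
    n_first_author=1,
    median_pubs=2,          # reported median, fifth active year
    mean_pubs=3.3,          # reported mean, fifth active year
    median_first_author=1,  # reported median first-author rate
)
print(flags)  # {'median': True, 'mean': False, 'first_author': True}
```

Each flag is the kind of binary outcome that a logistic model can then relate to the log-transformed number of co-authors per publication and to co-author dissimilarity.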

Results
Are ECR publication rates changing over time?
ECR publication rates have increased over the last 50 years, particularly in our disciplinary sample, which is now publishing at higher rates than authors in Nature, Science and PNAS. Multidisciplinary ECR publication rates have increased 46% between 1970 and 2019 (Figure 1A). Current ECRs are publishing an average of 3.5 publications/year after five active years, compared with 2.4 publications/year after five years for ECRs in 1970 (Figure 1A). This increase is dwarfed by the increases in publication rates that occur over the first ten years of a career. Current ECRs' publication rates increase threefold over the first ten years of publishing, from 2.1 publications/year in the first year of publishing to 6.6 publications/year in the tenth year of publishing. There were contrasting findings for first author publication rates, which have declined 60% over the last 50 years. After five active years of publishing, authors in 1970 were publishing 1.5 first author publications per year, while in 2019 this figure was 0.6 first author publications per year (Figure 1B). There was a smaller increase in first author publication rates over the first ten years of a career, which increase 31% between the first and tenth active year of publishing for current ECRs (Figure 1D).
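The percentage changes reported above for the multidisciplinary sample follow directly from the fifth-active-year rates quoted in the text. A short worked check in Python (the helper name `percent_change` is ours):

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


# Total publications/year after five active years: 2.4 (1970) vs 3.5 (2019).
print(round(percent_change(2.4, 3.5)))  # 46

# First-author publications/year after five active years: 1.5 (1970) vs 0.6 (2019).
print(round(percent_change(1.5, 0.6)))  # -60
```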
The increase in publication rates over time was much more dramatic in our disciplinary sample, increasing 105% between 1970 and 2019, from rates that were substantially lower than the multidisciplinary sample to rates that are now higher. After five active years of publishing, rates had increased from 1.7 publications/year in 1970 to 3.6 publications/year in 2019 and from 3.4 to 7.1 publications/year after ten years ( Figure 1D). There was a smaller decline of 28% in first author publications per year, from 1.3 in 1970 to 0.9 in 2019 after ten years of publishing.
There were substantial increases in average co-authorship rates over the last 50 years, particularly amongst multidisciplinary ECRs (Figure 2A), who are now publishing with 572% more co-authors per publication compared with 50 years ago. The average number of co-authors per publication in the tenth career year increased from an average of 2.9 authors per publication in 1970 to 16.6 in 2019. In our disciplinary sample, this increased 230% from 2.9 co-authors per publication in 1970 to 9.5 co-authors per publication in 2019 in the tenth career year. There were also increases in co-authorship rates over the first ten years of publishing of 61% in the multidisciplinary sample and 49% in the disciplinary sample. Dissimilarity of authors across publications has increased slightly from 1970 to 2019 in the multidisciplinary and disciplinary samples. However, dissimilarity increased substantially over the first ten years of a career in both samples ( Figure 2B and Figure 2D).
Our new measure of active year (years since first publication, excluding years with no publications) was similar to career age (years since first publication; Figure 3A). One quarter of authors had no publication gaps in their first ten years of publishing, while 58% of authors had three or fewer years with no publications ( Figure 3B). While the differences between active year and career age may be relatively small, active years should more accurately describe the opportunity of ECRs who have had career interruptions or a break between publishing from honours or master's studies, than career age or number of years since PhD completion.

Evidence-based benchmarks for ECR publication rates
Our selected benchmarks for evaluation were the career-age controlled median and mean publication rates of 55,332 authors publishing in Nature, Science and PNAS in 2019 (Table 2, Figure 4), and the median number of first-author publications per year relative to their cohort. Median publication rates increased from one publication/year in the first active year of publishing, to two in the fifth year of publishing, to four in the tenth year of publishing (Table 2). Mean publication rates were higher than median rates because a number of authors had substantially higher publication rates (as seen by the outliers in Figure 4); they increased from 1.6 publications/year in the first active year of publishing, to 3.3 in the fifth year of publishing, to 5.2 in the tenth year of publishing (Table 2). Median and mean publication rates were somewhat higher (34%) for authors with many co-authors, and slightly lower (13%) for authors with few co-authors. The median first author publication rate was one first author publication/year in the first ten years of publishing. In any given year 71% of multidisciplinary authors met the median publication rate benchmark, 33% met the mean publication rate benchmark, and 53% met the first author publication benchmark.

The effect of co-authorship strategies on the probability of researchers meeting these benchmarks
The likelihood of meeting these benchmarks was higher for ECRs whose co-authors vary across publications each year (Figure 5, Table 3), as measured by co-author dissimilarity (Jaccard index). This is not surprising given that publishing with different authors necessarily means publishing more. Increasing the number of co-authors per publication also had a positive effect (although smaller than author dissimilarity) on the likelihood of achieving median and mean publication rates (Figure 5A, C, Table 3). However, publishing with a greater number of co-authors per publication greatly decreased the probability of having a first-author publication (Figure 5F, Table 3).

Discussion
Current ECRs have higher rates of publication than ECRs from earlier decades. This is true for scientists publishing in leading multidisciplinary journals, and much more so for a cohort of ECRs publishing in disciplinary journals spanning ecology, interdisciplinary landscape, and environmental psychology. In the context of these findings, and heeding the call to "abandon paper counting", 3 we present three evidence-based benchmarks of research performance for ECRs: the median and mean yearly publication rates, and median yearly first-author publication rates, relative to their disciplinary cohort. The best predictor of meeting these publication benchmarks was publishing with co-authors that vary from publication to publication. Perhaps surprisingly, having more co-authors per publication was negatively related to meeting the first-authorship benchmark.

Changes in ECR publication rates over time
The acceleration in publication rates observed in our study is unlikely to be sustainable. While there are undoubtedly ECRs who publish at the higher rates observed over the last decade while maintaining research quality and impact (and quality of life), this is unlikely to be true for many researchers. In order to develop a competitive track record, ECRs are often advised to work long hours, avoid activities that aren't directly linked to career advancement, 30 and write without the distractions of work colleagues, friends and families. 31,32 These effects may be compounded by short-term contracts which demand continual output at the expense of "human and social capital accumulation", and can lead to career termination due to unavoidable fluctuations in output. 33 An unbalanced focus on publication rates risks compromising other aspects of scientific endeavour and academic development, and it may lead to declines in research quality and impact, compromise work-life balance and mental health, or reduce diversity as disadvantaged researchers slip from the system. 18,30,34
Interestingly, rates of first authorship are decreasing, particularly in the multidisciplinary author group. This seems a natural corollary to the even greater increases observed in co-authorship rates over the last 50 years, as has previously been well demonstrated for established researchers, 35 and may also be partly offset by increases in joint first-authorship publications. 36 While the average number of co-authors increases slightly over the first ten publishing years of an academic career, there has been a substantial increase over the last 50 years for all ECRs. This demonstrates a substantial change in publishing patterns, away from individual and small group collaborations to larger team collaborations. Relatively stable patterns in the dissimilarity of co-authors across publications suggest that this shift is towards larger but relatively stable teams of authors, rather than larger individual networks of co-authors.
Concerningly, the real size of the observed acceleration in publication rates is likely to be greater than described here. Our data are likely to oversample the most successful established scientists, as ECRs from earlier decades who ceased publishing before 2019 are not included. This is consistent with previous research which has identified the attrition of less-productive researchers as a factor in assessing publication rates over time. 27

Evidence-based benchmarks for ECR publication rates
There is a clear need for evaluation strategies that are tailored to variations in publication rates over the early years of a career, and that do not further encourage unsustainable rates of publication; that is, strategies that avoid assuming ever-increasing quantities of publications are better. 18 We have proposed a suite of evidence-based benchmarks with which to set ECR publishing expectations, and to evaluate ECR publication performance. These benchmarks incorporate career age, ensuring that they more appropriately and fairly assess performance of those early in their career. A suite of metrics allows for a more comprehensive view of research productivity (e.g., volume, collaboration, and research leadership), while still permitting 'outstanding' performance to be identified. For example, there is a substantial difference between median and mean publication rates as a result of enormous variation in the output of individual researchers, with some ECRs in this sample publishing more than 100 papers in some years. This difference is potentially useful for evaluating ECR productivity, as the proportion of researchers meeting mean publication rate benchmarks was about half that of those meeting median publication rate benchmarks. This allows some discrimination between ECRs based on publication productivity, while still limiting the expectations on the numbers of publications used in evaluations.
ECRs and institutions could use these to set expectations for sensible rates of publishing or to choose the number of publications to use in research evaluation. 3
Care must be taken in the use of this evidence base. It does not control for part-time work and the fraction of employment devoted to research. While it does control for career gaps spanning calendar years, it does not control for shorter career gaps or periods of reduced activity due to parenting, caring responsibilities, health and other pressures that can substantially affect research productivity. It is likely that there is some variation in publication rates across different research areas, and future research may develop subject-specific benchmarks. Further research could explore the role of different kinds of publications, such as books, that are important in some disciplines. 26

The effect of co-authorship strategies on the probability of researchers meeting these benchmarks
The finding that publishing with dissimilar authors across publications is a better predictor of meeting these benchmarks than large numbers of co-authors per paper is consistent with studies of academic success in established researchers that highlight the benefits of collaboration, and the pitfalls of over-reliance on publications with many authors. 24,33,37,38 Previous studies have shown that publishing with a moderate number of co-authors is associated with higher publication and citation rates. 39 Further afield, international collaboration is associated with increased publication productivity. 40,41 The strategy of developing a broad and varied network of co-authors may also lead to a range of other benefits such as grant writing opportunities, exposure to diverse views and development of professional networks, and consequently job opportunities. Research suggests that opportunities for networking and collaboration can lead to an increased perception of quality of a research environment and improved publication rates for ECRs. 42 More broadly, increased networking can lead to perceived and actual benefits in the workplace, 43 through mechanisms such as increased access to information, resources and mentoring. 44

Study limitations
The authors in the multidisciplinary sample have all published in top journals that most researchers do not publish in. Thus, basing benchmarks on the ECR track record of these authors may not be fair to all ECRs. Nonetheless, the comparison with a disciplinary author sample that has largely converged with the publishing rates of these authors provides some confidence that the benchmarks will be valid, at least in some science disciplines. Further research could help to better understand which disciplines these benchmarks are indeed valid for, and which disciplines require alternative benchmarks (or for which publishing benchmarks may not be appropriate).
While Scopus historical data is likely to be limited and may not include all publications of ECRs from the 1970s and 1980s, this limitation is balanced to an extent by our sample only including authors who continued to publish in top journals in 2019; this group is likely to have higher publication rates than the average ECR from the 1970s and 1980s.

Conclusion
There is growing concern about the effects of the 'publish or perish' mentality on the career prospects and wellbeing of young researchers. 18,20 In response, some authors are arguing that the number of publications included in research evaluations should be limited. 3 Our findings show that current early career researchers are indeed publishing at an accelerated rate compared with their peers from previous decades. This is more pronounced for our disciplinary peers compared to authors in leading multidisciplinary journals. This work is a first step in introducing an evidence base to set publishing expectations and support the current debate around limiting publication rates and the number of publications used in research evaluations for ECRs. Reducing the pressure to always be publishing more may allow ECRs to focus on the broader societal and institutional responsibilities that come with a career as a researcher, while still demonstrating research excellence. This may encourage ECRs to invest time in other endeavours leading to personal, academic, and societal benefits, such as grant applications, student supervision, collaboration, teaching, communication, outreach, industry engagement and academic service. These are critical pillars of an academic career. Framed in this way, developing collaborative networks with other researchers and research partners becomes a useful strategy for success rather than a distraction from writing more publications. These findings do not diminish the need for qualitative, place- and mission-specific assessments of research performance, 2 but instead provide a suite of metrics that can more reasonably guide and evaluate the publication performance of early career researchers.

Data availability
This project contains the following underlying data:
• pubs_year-metadata-Jan2021.tab
• pubs_year-multi.csv
• pubs_year-us.csv
Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).