Factors associated with producing a scientific publication during medical training: evidence from a cross-sectional study of 40 medical schools in Latin America

Background: Scientific publication during medical training is key to promoting enduring cutting-edge knowledge. The promotion of science among medical students in Latin America is a multisectoral issue that is hampered by governments' lack of knowledge about investing in national research, as well as by the lack of support from local universities. This study aims to determine the factors associated with producing a scientific publication during medical training among Latin American medical students from local scientific societies.
Methods: This is a secondary data analysis of a cross-sectional study conducted in 2016 that assessed the use of information and communication technologies (ICTs) among medical students from 40 local scientific societies of medical students affiliated with FELSOCEM. Teams from each local scientific society surveyed students on self-reported scientific publication, and its association with socioeconomic, academic, and research training conditions was explored. We applied nested models to identify the covariates associated with self-reported scientific publication, obtaining a parsimonious mixed-effects multilevel model grouped by medical scientific society.
Results: Among 11,587 participants, the prevalence of scientific publication was 36% higher among medical students affiliated with a Scientific Society of Medical Students [parsimonious prevalence ratio (pPR)=1.36, 95%CI=1.16–1.59], 51% higher among medical students with advanced English proficiency [pPR=1.51, 95%CI=1.21–1.87], 85% higher among medical students who attended a scientific writing skills course [pPR=1.85, 95%CI=1.59–2.15], 81% higher among medical students who used Sci-Hub [pPR=1.81, 95%CI=1.50–2.20], and 108% higher among medical students who had access to a pirated academic account [pPR=2.08, 95%CI=1.83–2.36].
Conclusions: Producing a scientific publication among medical students is associated with being affiliated with a scientific society of medical students, English proficiency, training in scientific writing, use of Sci-Hub, and pirated academic accounts. The results will help clinical educators and medical programs improve resources for training students in high-quality research.


Introduction
Producing a scientific publication during medical training is key to promoting continuing medical education and encouraging trainees to create cutting-edge knowledge. In doing so, students will develop research and critical thinking skills and will carry out evidence-based practice and patient-centered care with an enduring vision for pursuing a scientific career [1][2][3]. Latin American universities are progressively recognizing the critical importance of fostering science at the beginning of the bachelor's degree, and are implementing research-oriented courses such as research design methods, biostatistics, epidemiology, and a research-focused thesis 4. However, there are still gaps in Latin America compared to university research systems in developed countries in terms of number of publications, quality of published articles, dissemination of studies, and funding opportunities 5. Studies in Colombia and Brazil show that medical students consider scientific research an important aspect of their training and that low scientific output is influenced by the lack of inspiring and committed mentors who can serve as role models at the start of a scientific career 6,7. Between 1997 and 2010, there was an 8.4% increase in student participation in manuscripts published in journals indexed in Scielo-Peru, of which 42% reported being affiliated with a medical student scientific society 4,8.
In Peru, the progress of undergraduate medical research has been strongly promoted by the Peruvian Scientific Society of Medical Students (SOCIMEP, by its acronym in Spanish), an organization that has been improving the research training of medical students for 27 years 9. SOCIMEP is organized in scientific and academic committees and is made up of 38 local scientific societies in all Peruvian medical schools. This society is recognized for the organization of international, national, and local scientific conferences 9. SOCIMEP also encourages the active participation of societies, integrates them into a nationwide research network, and provides connections to experienced research mentors. Membership in a local scientific society affiliated with SOCIMEP is associated with higher scientific production (PR: 2.41; 95% CI: 1.55-3.74) 10. However, only 10% of the projects carried out in local scientific societies are published in indexed journals due to poor methods applied in the studies, lack of knowledge of the editorial process, few local mentors, and lack of financial support from public agencies and institutions 11. Funding opportunities for medical students are scarce in local medical schools in Peru and in much of Latin America. Overall government investment is unevenly allocated and often misaligned with local public health needs, detracting from the importance of well-implemented laboratories and full-time research-focused faculties 12. In Peru, less than 30% of universities have funding programs for students to conduct thesis research, or awards for student research programs 13.
The promotion of science among medical students in Latin America is a multi-sectoral issue that is hampered by governments' lack of knowledge about investing in well-structured national research and innovation systems, as well as by the lack of support from local universities, their lack of investment in research facilities, and the lack of mentors with international research experience 3. Improving the scientific system in Latin America might be valuable for other regions of the world: by promoting high-quality research at the undergraduate level, an integrative ecosystem of research and education would be consolidated to improve medical practice and the training of health professionals in similar settings across the globe. In this scenario, our study aims to determine the factors associated with scientific publication during medical training, in order to identify the needs of local Latin American scientific societies for the implementation of continuing education programs in research. Our hypothesis is that there are factors in medical training associated with the production of a scientific publication during undergraduate training.
for each research center was 289 medical students, to which 10% was added to allow for dropouts. Thus, we set out to survey 318 medical students at each university. We considered a sample size calculation with 80% power and 5% significance for an infinite population. These parameters are a convention for determining a conservative sample size that can detect a minimal difference in the outcome. To select participants, the interview team went into the course with the highest number of credits in each academic year and chose the students seated in odd-numbered places in each row. In three universities, the number of students was not large enough to reach the minimum required, so we surveyed all students.
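The dropout allowance and the seat-based selection described above can be illustrated with a minimal Python sketch. The function names `inflate_for_dropout` and `odd_seats` are hypothetical and are not part of the study's code; only the figures 289, 10%, and 318 come from the text.

```python
import math


def inflate_for_dropout(base_n, dropout_rate):
    """Add a dropout allowance to a base sample size, rounding up to whole people.

    The product is rounded to 6 decimals first so that floating-point noise
    (e.g. 110.00000000000001) does not push the ceiling up by one.
    """
    return math.ceil(round(base_n * (1 + dropout_rate), 6))


def odd_seats(row):
    """Return the occupants of odd-numbered seats (1st, 3rd, 5th, ...) in one row."""
    return row[0::2]


# Figures taken from the text: 289 students per center plus 10% for dropouts.
target_per_university = inflate_for_dropout(289, 0.10)
print(target_per_university)

# Hypothetical row of five seated students, used only to show the selection rule.
print(odd_seats(["s1", "s2", "s3", "s4", "s5"]))
```

Rounding up rather than to the nearest integer is a common conservative choice, and here it reproduces the stated target of 318 students per university.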

Operational procedures
In 2015, the ICTs project was awarded an amount of money for publication in the multicenter project competition of the 30th International Congress of the Latin American Federation of Scientific Societies of Medical Students (FELSOCEM). This award allowed the authors to contact the researchers of the FELSOCEM international collaboration network for the development of the study. We were able to register teams from 40 out of 69 Scientific Societies of Medical Students (SOCEM) throughout Latin America. Each scientific society had at least one team of three medical students who received training on scientific integrity 16, standardized methods for surveying participants, data entry procedures, and quality control of the datasets.
In each medical school, a designated team of interviewers administered the survey at the beginning or end of lectures, ensuring that students had enough time to respond comfortably. The questionnaire was given to each selected student after explaining the objective of the study and the duration of the survey (approximately 15 minutes). The survey was self-reported, that is, the participants provided the answers themselves. An English translation of the survey is available as Extended data 17.

Measures
Self-reporting of manuscript publication was analyzed as a binary outcome. Multinomial variables included gender, age, current year of degree, English proficiency, courses in PubMed, courses in Scopus, courses in Scielo, and provision of pirated scholarly accounts. Binary variables included university, affiliation with a medical student scientific society, previously studied career, scientific database courses, scientific writing courses, scientific navigation courses, Zotero courses, Sci-Hub usage, and pirated academic account usage. All these variables were self-reported.
Sci-Hub usage is defined as the use of the web service to read and download restricted scientific articles that are typically paid for or accessed through institutional subscriptions. Use of pirated academic accounts is the use of any account provided by a teacher, student, or other person that helps the student find and download articles through academic institutions that subscribe to scientific journals or databases.

Data analysis
The association between self-reporting of manuscript publication and its covariates was assessed using chi-square tests for categorical variables and the Mann-Whitney U test for numerical variables. Poisson family regressions were performed using a log link function and mixed-effects multilevel models. Nested models were estimated following a manual forward selection method using likelihood ratio tests. Covariates with significant p-values (p < 0.05) were carried into the next nested model until statistical significance was no longer reached. This method was used to obtain a parsimonious multivariable model, which retains the fewest covariates needed to explain the variance of the outcome. Crude and adjusted prevalence ratios (PR) were estimated with 95% confidence intervals (95%CI). All hypotheses were tested at a significance level of 5%. The analysis was performed using Stata 15.1. The code is openly available on GitHub and Zenodo 18.
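As an illustration of the two quantities named above, the following Python sketch shows one forward-selection decision based on a likelihood ratio test and a crude prevalence ratio with a 95% confidence interval. It is not the authors' Stata code and it does not fit the mixed-effects Poisson models. It assumes that the likelihood ratio statistic is referred to a chi-square distribution with one degree of freedom (a covariate adding a single parameter) and that the interval is a Wald interval on the log scale. All function names, log-likelihood values, and counts are hypothetical.

```python
import math


def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for two nested models differing by ONE parameter.

    Assumption: the statistic follows a chi-square distribution with 1 degree
    of freedom, whose upper tail equals erfc(sqrt(x / 2)).
    """
    lr_stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(lr_stat / 2.0))
    return lr_stat, p_value


def keep_covariate(loglik_reduced, loglik_full, alpha=0.05):
    """Forward-selection rule: keep the new covariate only if the LRT p-value < alpha."""
    _, p_value = lrt_pvalue_1df(loglik_reduced, loglik_full)
    return p_value < alpha


def crude_prevalence_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Crude prevalence ratio with a Wald 95% CI computed on the log scale."""
    pr = (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)
    se_log = math.sqrt(
        1 / cases_exposed - 1 / n_exposed + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, lower, upper


# Hypothetical log-likelihoods of a model without and with a candidate covariate.
print(keep_covariate(loglik_reduced=-1000.0, loglik_full=-995.0))

# Hypothetical 2x2 counts: 30 of 100 exposed and 10 of 100 unexposed students published.
print(crude_prevalence_ratio(30, 100, 10, 100))
```

A covariate with several categories adds more than one parameter, so the one-degree-of-freedom shortcut would not apply to it; a general chi-square tail function would be needed in that case.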

Ethical considerations
This study was classified as minimal risk for participants by the Institutional Review Board of San Bartolome's Hospital (CIE15325-15), which issued its approval. Trained interviewers obtained verbal consent from participants and provided them with an anonymous self-administered survey. Each survey was assigned a numerical ID to protect the privacy of the participants.

Results
A total of 11,587 medical students completed the survey. The mean age was 21±2.9 years, 53% were female, 12.5% (n=1,449) were affiliated with a medical student scientific society, and 14.1% (n=1,618) reported advanced English language skills. The individual-level responses are available as Underlying data 19.
Scientific writing courses were attended by 65.1% (n=3,989) of the students, and 7.9% (n=893) had published at least one scientific article during their medical training. Out of 6,632 students, 19.2% (n=1,273) used Sci-Hub at some point in their career (Table 1). The prevalence of scientific publication differed between first- and final-year medical students (4.3% first year vs. 13% final year), by membership in a medical student scientific society (12.43% yes vs. 7.24% no), between advanced and elementary English proficiency (11.2% advanced vs. 6.4% elementary), by completion of a scientific writing course (14.6% yes vs. 4.3% no), by use of Sci-Hub (19.3% yes vs. 4.7% no), and by possession of pirated academic accounts (15.3% yes vs. 5.5% no) (Table 2).
The nested models progressively selected the following covariates: scientific writing courses, pirated academic accounts, universities, Zotero courses, scientific database courses, year of study, previous degree, English proficiency, and medical student scientific society membership. The prevalence of having a scientific publication was 85% (pPR=1.85, 95% CI=1.59-2.15, p<0.001) higher in students who took a scientific writing course (Table 3). Information about medical schools in Latin America is available as Extended data 20.
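The prevalence ratios reported in this study can be read as percentage differences in prevalence, since a ratio of 1.85 corresponds to a prevalence that is 85% higher. The short Python sketch below makes that conversion explicit for the five estimates given in the abstract and checks whether each confidence interval excludes the null value of 1. The function names are hypothetical; the numbers are those reported in the text.

```python
def percent_difference(prevalence_ratio):
    """Convert a prevalence ratio into a percentage difference, e.g. 1.36 -> 36."""
    return round((prevalence_ratio - 1.0) * 100)


def ci_excludes_null(lower, upper):
    """True when the confidence interval does not contain the null value PR = 1."""
    return lower > 1.0 or upper < 1.0


# (pPR, lower 95% CI bound, upper 95% CI bound) as reported in the abstract.
reported = {
    "scientific society membership": (1.36, 1.16, 1.59),
    "advanced English proficiency": (1.51, 1.21, 1.87),
    "scientific writing course": (1.85, 1.59, 2.15),
    "Sci-Hub use": (1.81, 1.50, 2.20),
    "pirated academic account": (2.08, 1.83, 2.36),
}

for factor, (pr, lower, upper) in reported.items():
    print(
        f"{factor}: {percent_difference(pr)}% higher prevalence, "
        f"CI excludes 1: {ci_excludes_null(lower, upper)}"
    )
```

Because the study is cross-sectional, these percentages describe associations rather than the effect of adopting any of the listed practices.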

Discussion
Pirated academic accounts and use of Sci-Hub
Sci-Hub use was reported by 19.2% (n=1273) of the students surveyed, of whom 19.3% (n=243) published a manuscript during their medical training. Awareness and use of Sci-Hub may be due to the strong need for access to high-level scientific evidence behind a paywall. This need is often reinforced because many medical schools do not offer access to high-quality scientific journals or databases. However, medical students have reported difficulties in accessing Sci-Hub because it is considered an illegal service in many regions, meaning that the web domain is often blocked 21-24.
Sci-Hub use was associated with a higher prevalence of scientific publication among medical students (PR: 1.81; 95%CI: 1.50-2.20). Students feel a strong need for access to paid articles, leading them to seek free access on Sci-Hub 23,25. However, even students who do not face a paywall found that using Sci-Hub reduced the time needed and made browsing simpler 26. In addition, many researchers and students identify Sci-Hub as a faster option that is not limited to their institution's catalog 24. This process of rapid acquisition of scientific articles offered by Sci-Hub is probably homogeneous among high- and low-income countries around the world 27. American researchers and foreign-trained postdoctoral researchers face difficulties due to an unfavorable scientific system 29. For example, the Peruvian administration's investment in the advancement of science and research is still insufficient, at only 0.12% of gross domestic product compared to 0.36% in Chile, 1.3% in Brazil, and 2.8% in the United States 44,45. This is a concerning situation that must be addressed at the political level to efficiently solve public health needs.

Medical student scientific society membership
Our results showed that membership in a medical student scientific society increased the prevalence of scientific publication by 36% (pPR=1.36, 95%CI=1.16-1.59, p<0.01). Student scientific societies, such as SOCIMEP, attempt to fill the gaps in research training and provide students with the mentors, courses and scientific opportunities to pursue a research career 9,46. With more than 30 years of operations with local scientific societies throughout Peru, SOCIMEP promotes regional, national and local research events (CUMIS), annual scientific congresses and foundation courses in epidemiology, research design, and biostatistics 47. SOCIMEP's overall reach was reflected in the 242 articles published by scientific societies, of which 11% (n=67) were published in Q1 journals, under the tutelage of highly experienced national researchers 48. SOCIMEP's presence in Peru demonstrates the importance of an integrated institution that could not only equalize opportunities for students, but also improve scientific production in the country. Our results suggest that this student research system could be an effective model for other similar contexts.

Limitations
Our results have several limitations. First, several questionnaire items were self-reported, which may cause outcome misclassification and increase the potential for information bias. Misclassification means that a participant is assigned to the wrong group; for example, a student who is proficient in English but feels unskilled may answer in a way that classifies them as non-proficient. We tried to limit this problem by encouraging the students to answer the questionnaire honestly and by not rushing them; in this sense, we consider our results to be consistent with reality. Second, all 40 medical schools were affiliated with FELSOCEM, which indicates a possible selection bias because this Latin American institution is made up of medical schools that meet standardized parameters of undergraduate scholarship. Therefore, our results are useful for these schools but should be extended to other similar local and regional realities in different countries. Third, some factors that would help explain the medical training characteristics influencing scientific publication may be missing, for example, the type of university (private or public), the gross national income devoted to research in each participating country, and the presence of highly qualified researchers in medical schools. However, this study provides relevant information to design new studies addressing the scientific production of medical students.

Conclusion
Among medical students in Latin America, the factors associated with producing a scientific publication during medical training are affiliation with a scientific society of medical students, an advanced command of English, attendance at a scientific writing course, use of Sci-Hub, and use of pirated academic accounts. The promotion of science among medical students in Latin America is a multisectoral issue. Its development must be addressed as part of multilevel strategies coming from the highest governmental authorities. In this way, universities would be empowered and a committed scientific system would be built in each nation.

Data availability
This project contains the underlying data in DTA and CSV formats.
This project contains an English-language copy of the questionnaire used for data collection.
This project contains a list of the medical schools surveyed for this study.
Analysis code used in this study is available at: https://github.com/culquichicon/Scientific_writing.

Megan Anakin
Otago Medical School, University of Otago, Dunedin, New Zealand
Thank you for this revised version of your article. Overall, the article has been strengthened by addressing the feedback from reviewers. Thank you for addressing all of the comments and suggestions made about the first version of your article. The revisions provide the reader with enhanced descriptions of key terminology and decisions made about the study design and procedures to enable better interpretation of the results. The revisions also provide the reader with a greater understanding of the context of the study and how the problem and findings may be relevant and applicable to their context.

Introduction
Second paragraph, awkward word choice: I'm not certain what 'stands' means in this sentence. Please consider revising this long sentence into two shorter ones. The second sentence could begin: "SOCIMEP holds international,..."
The introduction provides a well-argued warrant for the study by establishing the local need for the study. As a reader from outside Latin America, I am wondering how this situation might be similar and different to other regions. Please consider relating this problem to the locations of readers beyond Latin America.

Methods
The aim is clearly stated at the beginning of this section and the methods presented are appropriate to address it.
Population and sampling section. Please explain the educational outcome-related evidence that was used to determine the power calculation, or if not, state the reason for why you considered a sample size calculation with an 80% of power and 5% of significance. Please describe the census-type sampling procedure or support the terminology with a reference that describes it to the reader.
Operational procedures section. First paragraph, first sentence: Please consider replacing XXX with number or word because it looks like information has been omitted to the reader, or please further explain this reference. Please specify what the ICT project was awarded or to whom the project was awarded. Please consider revising the second to last sentence of the second paragraph to explain how participants completed the surveys to provide responses selected or written by themselves and participants took about 15 minutes to complete it.
Measures section. Please clarify the term, 'self-administered survey' by providing more information or revising the sentence so it better matches the description of the survey in the operational procedures section.
Data analysis section. The analyses performed by the authors looks appropriate, however, the explanations are dense and may not be easily understood by readers of this journal. Please revise this section so that it is suitable for clinical educators who are not necessarily biomedical scientists or statisticians. Please clarify in the text below on what statistical basis the covariates were selected so the reader does not need to make inferences about the procedures. Please explain if the forward addition assessed using a LRT and if so, explain if a chi-squared distribution or something else was used. Please also describe the criterion used to include or exclude the covariate. Otherwise, the reader has to make the assumption that this information is encompassed in the 'all hypotheses…' statement. Please consider stating the following information more thoroughly and with descriptions meaningful to a clinical educator reader: "We estimated nested models following a manual forward selection method to identify covariates associated with self-reported manuscript publication until reaching a parsimonious multivariable model. These covariates were selected using likelihood ratio tests. Crude and adjusted prevalence ratios (PR) were estimated with 95% confidence intervals (CI 95%). All hypotheses were contrasted using 5% significance"

Results
First paragraph, first sentence: Please resolve the following contradiction: The methods state that participants completed surveys, however, the first sentence of the results states that interviews were conducted. Both cannot be true.
Please consider revising the first, second, third, and fourth paragraphs of the results section so the information presented in them is not repeated in the tables of results as well. Instead, please draw the reader's attention to important or relevant proportions, relationships, differences, and similarities among the results. In the third paragraph, the percentage stated for the prevalence of scientific publications among first-year and last-year medical students (13% vs 4.3%, respectively) is backwards from the

Pirated academic accounts and use of Sci-Hub section.
To appreciate the similarities and differences between Sci-hub and pirated accounts, please define and explain these two factors in the methods section for the reader. To help the reader understand the significance of the results presented in the first and second paragraphs, please explain how the use Sci-Hub might contrast with access to relevant literature provided by the participating medical schools. In the second paragraph, the fifth sentence begins with 'This'. Please specify the subject at the beginning of this sentence so the reader can appreciate what might be homogeneous and better understand the point made in the final sentence in the paragraph. In the last paragraph of this section, please make links between the finding about pirated academic accounts in the first sentence and the statements that follow it. Please help the reader to understand how the statements help the reader to understand the significance and implications of this finding.
The discussion in the sections about courses in writing, English proficiency, and affiliated to a scientific medical student society do a good job of relating the findings to the local context and the literature. Please consider how the discussion can be broadened to address how your findings might be used by others to generate insights into their own local context or make suggestions about how the findings might offer insights to educators and other educational researchers about the factors from medical training associated with producing a scientific publication during undergraduate study. Since this is a medical and health professions education journal, our readers are interested in the implications for students, teachers, and medical programme curriculum and resourcing.
Limitations section. Please give an example to explain to the reader what an undifferentiated classification of the outcome and an increase in residual confusion. Please remember the readers are clinical educators who are not necessarily biomedical scientists or statisticians. Please explain the possible influence of FELSOCEM on the results and how it may bias the results. What factors might be missing in these results that might give readers further insights into the problem of producing scientific publications during medical training. Please outline a few future directions that you or other researchers might take with the findings or the study design to extend our understanding of this problem and topic

Conclusion
Please revise the very long final sentence into at least three shorter ones to help the reader appreciate the important concluding points.

Is the work clearly and accurately presented and does it cite the current literature? Partly
Is the study design appropriate and is the work technically sound? Yes

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
"Second paragraph, awkward word choice: I'm not certain what 'stands' means in this sentence. Please consider revising this long sentence into two shorter ones. The second sentence could begin: "SOCIMEP holds international,..."" Response: Thank you. The overall sentence was revised.
"The introduction provides a well-argued warrant for the study by establishing the local need for the study. As a reader from outside Latin America, I am wondering how this situation might be similar and different to other regions. Please consider relating this problem to the locations of readers beyond Latin America." Response: Thank you. A sentence about the problem related to other contexts was added in the last paragraph (second sentence).

"Methods
The aim is clearly stated at the beginning of this section and the methods presented are appropriate to address it."

Response: Thank you
"Population and sampling section. Please explain the educational outcome-related evidence that was used to determine the power calculation, or if not, state the reason for why you considered a sample size calculation with an 80% of power and 5% of significance. Please describe the census-type sampling procedure or support the terminology with a reference that describes it to the reader." Response: Thank you. The reason for the sample size calculation with the referred parameters was explained. Census-type sampling was revised to a simpler phrase.
"Operational procedures section. First paragraph, first sentence: Please consider replacing XXX with number or word because it looks like information has been omitted to the reader, or please further explain this reference. Please specify what the ICT project was awarded or to whom the project was awarded. Please consider revising the second to last sentence of the second paragraph to explain how participants completed the surveys to provide responses selected or written by themselves and participants took about 15 minutes to complete it." Response: Thank you. "XXX" was revised to 30th. The ICTs project was awarded an amount of money for publication (details in first & second sentence). The second to last sentence of the second paragraph was revised (please refer to this part).
"Measures section. Please clarify the term, 'self-administered survey' by providing more information or revising the sentence so it better matches the description of the survey in the operational procedures section." Response: Thank you. The term "self-administered survey" was revised.
"Data analysis section. The analyses performed by the authors looks appropriate, however, the explanations are dense and may not be easily understood by readers of this journal. Please revise this section so that it is suitable for clinical educators who are not necessarily biomedical scientists or statisticians. Please clarify in the text below on what statistical basis the covariates were selected so the reader does not need to make inferences about the procedures. Please explain if the forward addition assessed using a LRT and if so, explain if a chi-squared distribution or something else was used. Please also describe the criterion used to include or exclude the covariate. Otherwise, the reader has to make the assumption that this information is encompassed in the 'all hypotheses…' statement. Please consider stating the following information more thoroughly and with descriptions meaningful to a clinical educator reader: "We estimated nested models following a manual forward selection method to identify covariates associated with self-reported manuscript publication until reaching a parsimonious multivariable model. These covariates were selected using likelihood ratio tests. Crude and adjusted prevalence ratios (PR) were estimated with 95% confidence intervals (CI 95%). All hypotheses were contrasted using 5% significance"" Response: Thank you. We clarified the statistical basis for the forward selection method and other procedure details so that the information is meaningful to clinical educator readers.

"Results
First paragraph, first sentence: Please resolve the following contradiction: The methods state that participants completed surveys, however, the first sentence of the results states that interviews were conducted. Both cannot be true." Response: Thank you. The term "were interviewed" was clarified to "completed the survey" to avoid the contradiction.
"Please consider revising the first, second, third, and fourth paragraphs of the results section so the information presented in them is not repeated in the tables of results as well. Instead, please draw the reader's attention to important or relevant proportions, relationships, differences, and similarities among the results. In the third paragraph, the percentage stated for the prevalence of scientific publications among first-year and last-year medical students (13% vs 4.3%, respectively) is backwards from the Response: Thank you. Relevant results were highlighted in the text. In addition, the results in Table 2 were revised and categories were included with their corresponding percentages.

Discussion
"Pirated academic accounts and use of Sci-Hub section. To appreciate the similarities and differences between Sci-hub and pirated accounts, please define and explain these two factors in the methods section for the reader. To help the reader understand the significance of the results presented in the first and second paragraphs, please explain how the use Sci-Hub might contrast with access to relevant literature provided by the participating medical schools. In the second paragraph, the fifth sentence begins with 'This'. Please specify the subject at the beginning of this sentence so the reader can appreciate what might be homogeneous and better understand the point made in the final sentence in the paragraph. In the last paragraph of this section, please make links between the finding about pirated academic accounts in the first sentence and the statements that follow it. Please help the reader to understand how the statements help the reader to understand the significance and implications of this finding." Response: Thank you. Use of Sci-Hub and use of pirated accounts were defined and explained in the methods section. The contrast between Sci-Hub usage and access provided by medical schools is detailed in paragraph 1 of the referred section. The subject was specified in the fifth sentence of the second paragraph. Sentences in the last paragraph have been revised to better link them to the first sentence of this part.
"The discussion in the sections about courses in writing, English proficiency, and affiliated to a scientific medical student society do a good job of relating the findings to the local context and the literature. Please consider how the discussion can be broadened to address how your findings might be used by others to generate insights into their own local context or make suggestions about how the findings might offer insights to educators and other educational researchers about the factors from medical training associated with producing a scientific publication during undergraduate study. Since this is a medical and health professions education journal, our readers are interested in the implications for students, teachers, and medical programme curriculum and resourcing." Response: Thank you. Suggestions were added at the end of each paragraph in the referred sections.
"Limitations section. Please give an example to explain to the reader what an undifferentiated classification of the outcome and an increase in residual confusion. Please remember the readers are clinical educators who are not necessarily biomedical scientists or statisticians. Please explain the possible influence of FELSOCEM on the results and how it may bias the results. What factors might be missing in these results that might give readers further insights into the problem of producing scientific publications during medical training. Please outline a few future directions that you or other researchers might take with the findings or the study design to extend our understanding of this problem and topic" Response: Thank you. Undifferentiated classification of the outcome and residual confounding were detailed. The influence of FELSOCEM was also explained. A brief list of relevant factors were added. Future directions were outlined.

"Conclusion
Please revise the very long final sentence into at least three shorter ones to help the reader appreciate the important concluding points." Response: Thank you. The final sentence was revised to three shorter ones.