’Spanscriptomics’: towards bioinformatics training materials in Spanish.

Jurgi Giraud; Stefana Dreptate; Wendi Bacon; The Galaxy Community

doi:10.12688/f1000research.171282.1



Research Article


[version 1; peer review: 2 approved with reservations]

Jurgi Giraud¹,²*, Stefana Dreptate¹*, Wendi Bacon¹,³, The Galaxy Community

* Equal contributors

PUBLISHED 20 Oct 2025


¹ The Open University, School of Life, Health and Chemical Sciences, Milton Keynes, England, UK
² The Open University, School of Languages and Applied Linguistics, Milton Keynes, England, UK
³ Health Data Research UK, London, England, UK

Jurgi Giraud
Roles: Data Curation, Formal Analysis, Investigation, Methodology, Project Administration, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Stefana Dreptate
Roles: Conceptualization, Data Curation, Methodology, Project Administration, Writing – Review & Editing

Wendi Bacon
Roles: Conceptualization, Funding Acquisition, Methodology, Project Administration, Writing – Review & Editing


This article is included in the Bioinformatics gateway.

This article is included in the Galaxy gateway.

This article is included in the CABANA: Computational Biology Resources for and from Latin America collection.

Abstract

Background

As the field of bioinformatics continues to expand, the need for effective education and training becomes increasingly pressing. Despite global efforts to make bioinformatics education more inclusive, challenges remain in ensuring accessibility and linguistic diversity. A major barrier is the predominance of English-only resources, which can limit participation and learning outcomes among non-English-speaking communities.

Methods

This study investigated how linguistic accessibility affects learning outcomes in bioinformatics training. A two-day virtual and asynchronous workshop was hosted on the Gallantries platform under the Galaxy Training Network. Participants, primarily native Spanish speakers, were randomly assigned to one of three language conditions: human-translated Spanish materials, English materials, or machine-translated Spanish materials. Data were collected through pre- and post-workshop surveys focusing on demographics, English proficiency, language preference, and learning outcomes. Quantitative data were analyzed descriptively, and associations between English proficiency and language preference were tested using Pearson’s Chi-Square. Qualitative feedback from participants was examined through thematic analysis to identify key learning and engagement patterns.

Results

Findings reveal that linguistic familiarity plays a critical role in learner confidence and comprehension. Participants expressed a clear preference for materials available in their native language, underscoring the need for culturally and linguistically relevant educational resources. The analysis also identified key limitations in current translation practices used in scientific education, particularly in conveying specialized terminology accurately and naturally across languages.

Conclusions

The study highlights the importance of developing multilingual and culturally adapted bioinformatics resources to foster equitable access to training. Incorporating linguistic diversity into bioinformatics education has the potential to improve learner engagement and support the growth of a more globally inclusive scientific community.

Keywords

bioinformatics; bioinformatics education; Latin America; multilingualism; scientific translation

Corresponding author: Jurgi Giraud

Competing interests: No competing interests were disclosed.

Grant information: Company of Biologists grant EA455.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2025 Giraud J et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Giraud J, Dreptate S, Bacon W and The Galaxy Community. ’Spanscriptomics’: towards bioinformatics training materials in Spanish. [version 1; peer review: 2 approved with reservations]. F1000Research 2025, 14:1143 (https://doi.org/10.12688/f1000research.171282.1) First published: 20 Oct 2025, 14:1143 (https://doi.org/10.12688/f1000research.171282.1) Latest published: 20 Oct 2025, 14:1143 (https://doi.org/10.12688/f1000research.171282.1)

Introduction

Bioinformatics, the intersection of biology and computational science, has revolutionized biological research since its emergence in the late 20th century. The field has evolved drastically, from simple sequence analysis tools to sophisticated algorithms capable of handling the vast datasets generated by high-throughput sequencing technologies (Baxevanis et al., 2020). As the field continues to expand, there is an increasing demand for the education and training of the next generation of bioinformaticians. This need has prompted scrutiny of the development and accessibility of training programs and materials (Mulder et al., 2018), particularly at a global scale (Attwood et al., 2019; Aron et al., 2021; Marangoni et al., 2023). Such work reflects a growing recognition that bioinformatics training must be comprehensive, accessible, and tailored to meet the diverse needs of learners worldwide.

However, there are several challenges inherent in delivering bioinformatics training and creating accessible materials. One significant challenge is the steep learning curve associated with bioinformatics, which is exacerbated by the lack of linguistic diversity in available resources (Işık et al., 2023; Ras et al., 2021). Language barriers in science, coupled with the dominance of English, pose obstacles to achieving open and inclusive science, further limiting accessibility for scientists globally (Amano et al., 2016, 2021, 2023; Woolston and Osório, 2019). This particularly affects low- and middle-income countries (LMICs). In regions like Latin America, scientists encounter numerous hurdles in accessing scientific knowledge, with the lack of linguistic representation and multilingual resources being prominent barriers (Kalergis et al., 2016; Ramírez-Castañeda, 2020; Valenzuela-Toro and Viglino, 2021; Massarani and de Oliveira, 2022; Yáñez-Serrano et al., 2022; Basilio, 2023). Despite increasing interest in developing bioinformatics initiatives in Latin America (De Las Rivas et al., 2019; Hernández-Rosales, 2021), bioinformatics in this region faces similar challenges. One such initiative is the CABANA project (https://www.cabana.online/), a collaborative effort aimed at enhancing bioinformatics capacity in Latin America. This project, run by an international consortium of organizations including partners from Latin America and the UK, has also identified the critical need to make multilingual bioinformatics resources accessible to scientists (Stroe, 2022). This pilot study aims to address these challenges by advancing the creation of multilingual bioinformatics resources by and for Spanish-speaking bioinformaticians.
The Open University, in collaboration with the Galaxy Training Network (GTN, https://training.galaxyproject.org/) (Hiltemann et al., 2023), designed a virtual and asynchronous workshop titled “Spanscriptomics: Análisis de células únicas usando Galaxy|Single cell analysis using Galaxy” and collected pre- and post-workshop data from the participants.

The main objective of the pilot study was to identify the need for, reception of, and impact of translated bioinformatics training materials among Spanish-speaking bioinformatics trainees.

Materials and methods

Ethics and consent

Ethical approval was obtained from the Open University Human Research Ethics Committee (HREC, reference number 4135). A full Data Protection Impact Assessment (DPIA) was carried out as part of the ethics application. Written informed consent was obtained from participants for the use of their anonymized data in publication.

Workshop

The study involved the creation of a virtual and asynchronous two-day workshop (2021-11-29 to 2021-11-30), during which all data were collected. The workshop was hosted on the Gallantries platform (https://gallantries.github.io/about) as part of the Galaxy Training Network (GTN). It comprised two slide decks with videos, three tutorials, and three walkthrough videos. Participants were randomized into three groups:

• HES (Human-translated Spanish content and video captions, and Spanish dubbing for audio),
• ENG (English content, and English audio and captions),
• CAT (Machine-translated Spanish content and video captions, audio in English).

Participants were allowed to change groups at any stage of the workshop. Throughout the workshop, participants had access to trainers across multiple time zones via an online chat workspace. The human translations of the workshop materials were produced by bilingual workshop presenters, instructors, and organizers with bioinformatics expertise. The machine translations were produced by Google Translate (https://translate.google.com/), a free online Machine Translation (MT) tool.

Recruitment and participants

An interest registration link was available on the main workshop web page. A leaflet invitation containing the dates and details of the workshop, along with the interest registration link, was also sent to the study authors to share across their networks and institutions. The workshop was also publicized on existing Galaxy platforms and, through institutional accounts, on social media. Consent to participate in the study was obtained during registration, when participants were also given a project information sheet and the pre-workshop survey. The workshop’s target audience was explicitly identified as native speakers of Spanish.

Surveys

Two sets of survey questions were created using Jisc Online Surveys (https://www.onlinesurveys.ac.uk/) to collect information and feedback from participants. The first survey was shared with participants during registration (pre-workshop survey), and the second after the workshop (post-workshop survey). The pre-workshop survey contained 15 questions on demographics, education, English proficiency, and preferred language for workshop materials. The post-workshop survey contained 17 questions on participants’ learning outcomes and their feedback on the learning experience.

Data processing

Of the 155 participants who filled in the pre-workshop survey, four opted out, leaving data from 151 participants for analysis. Twenty-five participants filled in the post-workshop survey, of whom five submitted incomplete responses that were removed, leaving data from 20 participants for post-workshop analysis. For participants’ self-assessment of learning, a rating system based on a revised Bloom’s taxonomy was developed; revised versions of this taxonomy are widely used in life sciences education (Larsen et al., 2022). Learning was also assessed using three traditional, exam-style questions developed by the workshop trainers.

Data analysis

Most of the data reported in this article are descriptive. The association between participants’ English proficiency and their preferred language for training materials was evaluated using Pearson’s chi-square test of independence (McHugh, 2013; Sharpe, 2015). A thematic analysis was conducted to identify key themes in participants’ feedback (Fugard, 2020).
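The chi-square test of independence named above can be sketched in a few lines of pure Python. This is an illustrative sketch, not the authors’ analysis script (the software they used is not stated): it derives expected counts from the row and column totals, sums (O - E)²/E over all cells, and obtains the upper-tail p-value from the closed-form chi-square survival function, which holds only for an even number of degrees of freedom (here (4 - 1) × (3 - 1) = 6). As input it uses the observed counts reported in Table 1.

```python
import math

# Observed counts from Table 1: rows = self-assessed English proficiency,
# columns = preferred language group (HES, ENG, CAT).
OBSERVED = [
    [21, 1, 0],   # Beginner
    [60, 9, 0],   # Intermediate
    [28, 19, 0],  # Advanced
    [3, 9, 1],    # Fluent
]


def expected_counts(table):
    """Expected counts under independence: row_total * col_total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson's chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_sf_even_dof(x, dof):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{k < dof/2} (x/2)^k / k!,
    which is only valid when the degrees of freedom are even.
    """
    if dof % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))


stat, dof = chi_square_statistic(OBSERVED)
p_value = chi_square_sf_even_dof(stat, dof)
print(f"chi2 = {stat:.2f}, df = {dof}, p = {p_value:.1e}")
```

With these counts the sketch gives χ² ≈ 41.53 with 6 degrees of freedom, matching the statistic reported in the Results, and p ≈ 2.3 × 10⁻⁷ (reported as p < .001). For odd degrees of freedom a general routine, such as SciPy’s `scipy.stats.chi2_contingency`, would be needed instead of the closed form.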

Results

Pre-workshop survey: Participants’ demographics

Figure 1 illustrates the distribution of participants by country of origin (Figure 1A) and country of work (Figure 1B). Peru was the most common country of work, accounting for nearly 25% of participants, and was also the country of origin of 26% of participants. Approximately 75% of participants worked in South, Central, or North America, and approximately 80% originated from these same regions. Almost 20% of participants reported both originating from and working in Spain. Overall, 77% (n = 116) of participants had never worked, studied, or lived in an English-speaking country (Figure 2A). Sex representation was balanced, with 73 participants self-identifying as male and 72 as female (Figure 2A); six participants preferred not to self-identify as either male or female.

Figure 1. Map of participant distribution across countries.

(A) Distribution of participants across their country of origin, and (B) country of work.

Figure 2. Participant demographics and English proficiency.

(A) Participants with experience living or working in an English-speaking country and sex representation across participants (self-identified), (B) Participants’ highest level of education and English proficiency, (C) English proficiency across education groups.

Pre-workshop survey: Education & English proficiency

Figure 2B presents the distribution of participants’ highest level of education. Approximately 76% of participants reported holding a university qualification, while around 20% held a baccalaureate or A level qualification only. Five participants held neither a university qualification nor a baccalaureate/A level qualification. Among university qualifications, undergraduate (28%), Master’s (23%) and PhD (25%) degree holders were similarly represented.

Participants were asked to self-assess their English proficiency. Approximately 45% of participants reported an intermediate level (Figure 2B), around 30% described their proficiency as advanced, and thirteen participants (about 9%) regarded themselves as fluent. Beginners, at roughly 15% of participants, were the second smallest group. When education level is taken into account, the percentage of participants who were beginners in English decreases with educational attainment, from 35% among participants with a Baccalaureate/A level qualification to 0% among Master’s holders and 3% among PhD holders. English proficiency thus tends to increase with educational attainment, as illustrated in Figure 2C. Conversely, as the beginner category shrinks, the intermediate category grows: 29% of Baccalaureate/A level holders fall within the intermediate range, rising to 43% of Master’s holders and 61% of PhD holders. Master’s degree holders reported notably more advanced English proficiency than participants at other education levels: approximately 46% of this group reported an advanced level, compared with 26% of those with Baccalaureate/A level and PhD qualifications and 28% of those with Bachelor’s/undergraduate degrees. The fluent category (13 participants in total) remains consistent across all education levels, averaging 9.5%.

Pre-workshop survey: Preferred language

When asked their preferred language for workshop materials (HES, ENG, or CAT), just over 74% of participants (n = 112) expressed a preference for HES, 25% (n = 38) preferred ENG, and only one participant selected CAT (Figure 3A).

Figure 3. Language preferences for workshop materials.

(A) Preferred language for workshop materials, (B) Preferred language for workshop materials per English proficiency.

We further examined participants’ language preferences in relation to their English proficiency. The higher the English proficiency, the less inclined participants were to choose HES as their preferred language (Figure 3B). This association between English proficiency and preferred language group was statistically significant (Table 1; χ² = 41.53, df = 6, p < .001). The CAT preference was under-represented at the beginner, intermediate, and advanced proficiency levels.
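The trend described above can be checked directly against the observed counts in Table 1 by computing, for each proficiency level, the share of participants preferring each language group (row percentages), together with the overall shares. This is a minimal sketch; the nested-dictionary layout is an illustrative choice, and Figure 3B may have been produced differently.

```python
# Observed counts from Table 1: preferred language group per English proficiency level.
COUNTS = {
    "Beginner": {"HES": 21, "ENG": 1, "CAT": 0},
    "Intermediate": {"HES": 60, "ENG": 9, "CAT": 0},
    "Advanced": {"HES": 28, "ENG": 19, "CAT": 0},
    "Fluent": {"HES": 3, "ENG": 9, "CAT": 1},
}


def row_percentages(counts):
    """Share (%) of each language group within each proficiency level."""
    return {
        level: {group: 100 * n / sum(groups.values()) for group, n in groups.items()}
        for level, groups in counts.items()
    }


def overall_percentages(counts):
    """Share (%) of each language group across all participants."""
    totals = {}
    for groups in counts.values():
        for group, n in groups.items():
            totals[group] = totals.get(group, 0) + n
    grand_total = sum(totals.values())
    return {group: 100 * n / grand_total for group, n in totals.items()}


for level, shares in row_percentages(COUNTS).items():
    print(level, {g: round(s, 1) for g, s in shares.items()})
print("Overall", {g: round(s, 1) for g, s in overall_percentages(COUNTS).items()})
```

On these counts the HES share falls from 95.5% among beginners through 87.0% (intermediate) and 59.6% (advanced) to 23.1% among fluent speakers, with 74.2% preferring HES overall.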

Table 1. Residuals from Chi-Square test.

Adjusted residuals outside the range ±2 indicate cells that deviate from what would be expected if English proficiency and language preference were independent.

		HES	ENG	CAT	Total
Beginner	Obs	21	1	0	22
	Std. Res	2.5	-2.4	-0.41
	Adj. Res	1.2	-1.9	-0.38
Intermediate	Obs	60	9	0	69
	Std. Res	3.3	-3.1	-0.92
	Adj. Res	1.2	-2.0	-0.68
Advanced	Obs	28	19	0	47
	Std. Res	-2.8	2.9	-0.67
	Adj. Res	-1.2	2.1	-0.56
Fluent	Obs	3	9	1	13
	Std. Res	-4.4	3.8	3.27
	Adj. Res	-2.1	3.2	3.11
Total		112	38	1	151
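For readers who want to reproduce the residuals, the sketch below computes Pearson residuals, (O - E)/√E, and adjusted residuals, (O - E)/√(E(1 - r/N)(1 - c/N)), where r and c are the cell’s row and column totals. These are the common textbook definitions; the software used for Table 1 is not stated. Under these definitions, the values in the “Std. Res” rows match the adjusted residuals and those in the “Adj. Res” rows match the Pearson residuals (for beginners preferring HES, for example, 2.47 and 1.16). Labels differ between packages: R’s `chisq.test`, for instance, returns the adjusted residuals under the name `stdres` (“standardized residuals”).

```python
import math

# Observed counts from Table 1 (rows: Beginner, Intermediate, Advanced, Fluent;
# columns: HES, ENG, CAT).
OBSERVED = [
    [21, 1, 0],
    [60, 9, 0],
    [28, 19, 0],
    [3, 9, 1],
]


def residuals(table):
    """Return (pearson, adjusted) residual matrices for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson, adjusted = [], []
    for obs_row, r in zip(table, row_totals):
        p_row, a_row = [], []
        for o, c in zip(obs_row, col_totals):
            e = r * c / n  # expected count under independence
            p_row.append((o - e) / math.sqrt(e))
            # Adjusted residual: divide by the estimated standard error of (O - E).
            a_row.append((o - e) / math.sqrt(e * (1 - r / n) * (1 - c / n)))
        pearson.append(p_row)
        adjusted.append(a_row)
    return pearson, adjusted


pearson, adjusted = residuals(OBSERVED)
labels = ["Beginner", "Intermediate", "Advanced", "Fluent"]
for label, p_row, a_row in zip(labels, pearson, adjusted):
    print(label, "Pearson:", [round(x, 2) for x in p_row], "Adjusted:", [round(x, 2) for x in a_row])
```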

Post-workshop survey: learning outcomes and language groups

Of the 151 participants who completed the pre-workshop survey, only 20 (13%) completed the post-workshop survey; the data presented here are therefore descriptive only. The post-workshop survey was designed to collect data on participants’ learning outcomes as well as their feedback on the language groups.

Of these 20 participants, six were randomized to group ENG, seven to group HES, and seven to group CAT (Figure 4A). Participants could migrate from their original group to a different group at any point (Figure 4A). Seven participants did so: one moved from ENG to HES, two from HES to ENG, and four from CAT, two each to ENG and HES. The final number of participants in each group was nine for ENG, eight for HES, and three for CAT, as seen in Figure 4A.
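The group bookkeeping behind Figure 4A can be reproduced by applying the reported migrations to the randomized group sizes. This is a minimal sketch: the `(from_group, to_group, count)` tuples are an illustrative representation, and only the net moves reported above are modeled, not when each move happened.

```python
from collections import Counter

# Group sizes at randomization among the 20 post-workshop respondents.
randomized = Counter({"ENG": 6, "HES": 7, "CAT": 7})

# (from_group, to_group, number_of_participants), as reported in the text.
migrations = [
    ("ENG", "HES", 1),
    ("HES", "ENG", 2),
    ("CAT", "ENG", 2),
    ("CAT", "HES", 2),
]


def apply_migrations(start, moves):
    """Return final group sizes after moving participants between groups."""
    final = Counter(start)  # copy so the randomized sizes are left untouched
    for source, target, count in moves:
        if final[source] < count:
            raise ValueError(f"cannot move {count} participants out of {source}")
        final[source] -= count
        final[target] += count
    return final


final = apply_migrations(randomized, migrations)
print(dict(final))  # prints {'ENG': 9, 'HES': 8, 'CAT': 3}
```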

Figure 4. Group allocation, performance, and learning outcomes.

(A) Number of participants randomized in each group and participants’ migration towards a different group with final number of participants per group after migration, (B) Correct answers per question, (C) Depth of learning scores across language groups.

Qualitative feedback was collected from participants to understand why they moved to a different group. The main reason why more than half of the participants originally randomized to CAT (four of seven) left this group was the poor quality of the machine-generated translations. One participant mentioned that:

La traducción automática genera muchas incoherencies en palabras y por tal razón su interpretación. Los video tambien tenian sus problemas de traducción y de imagenes.

The automated translation generates many inconsistencies in words, and for this reason in their interpretation. The videos also had translation and image problems.

Another participant noted:

Había muchos errores de traducción en la traducción automática que dificultaban el seguimiento del curso.

There were many translation errors in the automated translation that made it difficult to follow the course.

In terms of other language group transfers, one participant transferred from HES to ENG with the following rationale:

Prefiero utilizar inglés cuando trabajo con este tipo de análisis porque estoy más acostumbrada a estos términos.

I prefer to use English when working with this type of analysis because I am more accustomed to these terms.

Finally, one participant transferred from ENG to HES, and explained:

Me parece más cómodo aprender usando mi idioma nativo.

I find it more comfortable to learn using my native language.

Learning outcomes

The participants were then tasked with answering three questions in order to test their comprehension of the material.

Figure 4B gives an overview of the success rate across the three questions, which averaged around 33.3%. Question 1 had the lowest success rate, with only 15% correct answers, whereas Question 3 had the highest, with 45% correct answers. By language group, participants in the CAT group performed worst: 67% answered no question correctly and 33% answered only one question correctly. The ENG and HES groups performed similarly, with HES the only group in which a participant answered all three questions correctly.

In addition to scores based on knowledge questions, participants were asked to self-assess their depth of learning after completing the workshop. This was done using an 8-point scoring system based on the learning levels of Bloom’s taxonomy, as follows:

• 1: Repeat with help
• 2: Repeat without help
• 3: Describe what you are doing in the workflow
• 4: Describe why most tasks need to be done and their outcomes
• 5: Implement the analysis on public (or your own) data (not counting data already in the tutorials)
• 6: Compare and contrast analytical choices (e.g., for granularity of cluster calling)
• 7: Defend or critique interpretation of data based on its analysis
• 8: Design novel workflows for single cell analysis
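Because the depth-of-learning rubric above is an ordered scale, it can be encoded as an integer enumeration and summarized per final language group (mean and range), as in Figure 4C. This is a minimal sketch: the member names are shorthand chosen here for the eight levels, not labels used by the authors, and the per-participant scores that `summarize` expects are not reproduced in this article.

```python
from enum import IntEnum
from statistics import mean


class DepthOfLearning(IntEnum):
    """Eight-point depth-of-learning scale, levels as listed in the rubric.

    Member names are illustrative shorthand, not labels used in the study.
    """
    REPEAT_WITH_HELP = 1
    REPEAT_WITHOUT_HELP = 2
    DESCRIBE_WORKFLOW = 3
    EXPLAIN_TASKS_AND_OUTCOMES = 4
    IMPLEMENT_ON_NEW_DATA = 5
    COMPARE_ANALYTICAL_CHOICES = 6
    DEFEND_OR_CRITIQUE_INTERPRETATION = 7
    DESIGN_NOVEL_WORKFLOWS = 8


def summarize(scores_by_group):
    """Return {group: (mean, min, max)} for self-assessed depth-of-learning scores.

    `scores_by_group` maps a language-group label (e.g. "HES") to a list of
    integer scores, one per participant.
    """
    summary = {}
    for group, scores in scores_by_group.items():
        if not scores:
            raise ValueError(f"no scores for group {group}")
        for score in scores:
            DepthOfLearning(score)  # raises ValueError for scores outside 1-8
        summary[group] = (mean(scores), min(scores), max(scores))
    return summary
```

Since the scale is ordinal, a median could also be reported; the sketch uses the mean to match the group averages given in the text.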

Figure 4C provides an overview of the self-assessed scores for each participant based on their final language group. The average self-assessed score for the HES group is 4.25, with scores ranging from 1 to 8. For the ENG group, the average score is 3.8, with scores ranging from 2 to 5. Finally, the average score for the CAT group stands at 5, with scores ranging from 3 to 8.

Participants’ experience

Participants were given the opportunity to provide free-form feedback in both surveys: the pre-workshop survey contained two such questions and the post-workshop survey six. From this feedback, two main themes emerged, along with seven sub-themes (Table 2). Besides the poor quality of machine translations, participants expressed high general appreciation for the translated materials, while some preferred English or bilingual materials. Workshop design also featured frequently, with requests for more time and more detail, and appreciation for the live trainer support.

Table 2. Thematic analysis.

Main themes and sub-themes identified from participants’ feedback. Description provided for each sub-theme. Participants’ quotations are available in the underlying data.

Main themes	Sub-themes	Description
1. Language for Training Materials	a. Quality of machine translations.	Quotations from participants discussing the poor performance of machine translation for workshop materials. Participants agree that machine-translated materials contain many translation errors, which make the materials difficult to understand.
	b. Appreciation for materials translated into Spanish.	Quotations from participants sharing appreciation for materials translated into Spanish, highlighting the importance of access to such resources, which are rarely available. Participants also express their preference for materials in Spanish when it comes to learning.
	c. Preference for materials in English.	Quotations from participants expressing a preference for materials in English.
	d. Preference for bilingual materials.	Quotations from participants expressing a preference for materials available in both English and Spanish.
2. Study Design	a. Not enough time.	Quotations from participants agreeing that the workshop should have lasted longer, as many did not have time to complete the workshop and the tasks.
	b. Not enough attention being paid to tools and versions.	Quotations from participants mentioning the lack of clarity when it comes to the tools being used in the materials, and their versions.
	c. Access to live trainers for guidance as a positive aspect of the study.	Quotations from participants showing appreciation for the live trainers and their availability throughout the workshop.

Feedback from translators

Along with the feedback collected from the participants, feedback was also collected from three of the translators involved in translating the materials from English to Spanish. One translator mentioned that the main challenge they faced during the translation process was:

Disciplinary vocabulary mainly. Also, the interface is in English and even in the term dictionary, there were certain things we agreed to keep untranslated.

It was also added that:

There was definitely a lack of resources directed to other language speakers.

Another translator mentioned that:

The main challenge was to translate the colloquial phrases we had in the video tutorials. In some cases, there was no real translation and we had to use completely different phrases with similar meanings. Also, some specialised terms used in the field were difficult to translate and in some cases, it turned out that no translation was required at all.

Lastly, one translator provided some insight into their translational approach and the resources they used:

I use The Carpentries’ Glossary (Multilingual and open-source glossary of terms used in computer science. Available at: https://github.com/carpentries/glosario) and I kept notes of how people were translating terms to favour the use of preferred terminologies.

Discussion

The findings of this study provide insights into the demographics, language preferences, learning outcomes and experiences of Spanish-speaking bioinformatics trainees and trainers. To our knowledge, this study is novel in its approach, simultaneously providing a resource for bioinformatics learning across languages and a platform for voicing the experiences and preferences of native Spanish-speaking scientists. Additionally, the study enabled educators and developers of bioinformatics training resources to contribute insights from their own experience.

While a few participants reported working outside the hispanosphere, the large majority reported originating from and working in a Spanish-speaking country, predominantly in Latin America. Participants displayed a broad range of English proficiency and education levels, and most (60%) self-assessed their English proficiency as beginner or intermediate. The study revealed a clear preference among participants for bioinformatics resources in Spanish, evident from both their choice of language group and the feedback they provided. Also apparent was a general aversion to machine-translated materials, as indicated by participants’ lack of interest in this language group and by their feedback. With regard to learning outcomes, most answers to the knowledge questions were incorrect; this contrasts with participants’ self-assessed depth of learning but is consistent with general workshop feedback requesting more time and more explanation.

Language can create a barrier in science, and the use of English as the lingua franca of science has both advantages and disadvantages (Woolston and Osório, 2019). While English facilitates global collaboration in science, its dominance also raises concerns regarding linguistic diversity and inclusivity. The advantages of a common language for communication must be weighed against the disadvantages of potentially excluding non-English-speaking scientists and overlooking the rich scientific contributions emerging from diverse linguistic and cultural contexts. This barrier is exemplified by the native Spanish speakers in our study, the majority of whom reported a beginner or intermediate level of English proficiency. This may limit their ability to access and engage with English-language bioinformatics materials, which could hinder their capacity to understand training materials or replicate research findings in this field (Işık et al., 2023; Ras et al., 2021).

Additionally, the clear preference among participants for resources in Spanish underscores the importance of linguistic familiarity and cultural resonance in scientific communication. This preference reflects a desire to engage with materials in their native language, where nuances and context are more readily understood, potentially enhancing comprehension and knowledge retention. Participants’ choice of language for training materials was closely associated with their level of proficiency in English. The greater their proficiency in English, the more inclined participants were to opt for English as their preferred language. Nevertheless, Spanish was the favored language for training materials, except for those fluent in English. Fluent speakers constituted less than nine percent of all participants.

Machine-generated translation was the least preferred option among our participants. Participants’ feedback showed that machine-generated translations lacked precision and intelligibility for bioinformatics materials. Machine Translation (MT) has the potential to accelerate and facilitate knowledge exchange in science (Tinsley, 2019; Steigerwald et al., 2022); however, MT systems require training on domain-specific data in order to capture the complexity of a domain, such as its specialised terminology. The limitations of current MT systems underscore the need for domain-specific adaptation to improve their efficacy in scientific contexts. This process of domain adaptation (i.e., tailoring an MT system to a specific discipline) can be laborious and expensive, especially for low-resource domains, but has been shown to improve the quality of MT output for the domain to which the system is adapted (Chu and Wang, 2018; Saunders, 2022). The system used to generate automated translations for this study was not specifically trained on bioinformatics data, so it likely failed to translate domain-specific elements, such as terminology, correctly. Furthermore, as emphasized in the feedback from translators, there is a scarcity of resources available to support translators of bioinformatics materials, which poses yet another challenge to developing multilingual resources in this field.

Limitations and recommendations for future studies

We acknowledge several limitations in our current study.

Firstly, there was a disparity between the number of participants who completed the pre-workshop survey and those who completed the post-workshop survey, preventing us from performing statistical analyses on the post-workshop data. In this study, participants were not incentivized to complete the post-workshop survey, which may have affected participation rates. We propose that offering incentives could enhance participation and motivate respondents to complete a post-workshop survey.

Secondly, based on participant feedback, there is room for improvement in the workshop design for future investigations, particularly by allowing participants more time to complete the tutorials. Moreover, the current study did not gather data on participants’ socioeconomic backgrounds or access to language education. These factors could provide additional insights into the current landscape of bioinformatics education and training, and its accessibility, in Latin America.

Thirdly, as previously discussed, the machine-generated translations were produced using an MT system not specifically tailored to the bioinformatics domain. A system customized for this purpose would likely yield more accurate translations. A future study could explore a domain-adapted MT system for bioinformatics and evaluate its performance, as we could not identify such a system in the existing literature.

Conclusion

Our study provides valuable insights into the demographics, language preferences, and experiences of Spanish-speaking bioinformatics trainees. We observed a clear preference among participants for Spanish content, which emphasizes the importance of scientific translation and of the availability of multilingual resources. However, machine-generated translations, produced with a system not adapted to bioinformatics, were the least preferred option owing to their lack of precision and intelligibility for bioinformatics materials. Addressing the language barrier in bioinformatics will require innovative solutions, such as domain-specific machine translation systems tailored to the unique linguistic and terminological characteristics of the field. Overall, our study highlights the importance of addressing language barriers in bioinformatics to foster inclusivity and equity in scientific training, communication and research.

Data availability

All anonymized and processed data pertaining to pre- and post-workshop surveys, as well as thematic analysis, are available at: https://doi.org/10.6084/m9.figshare.25488865 (Giraud, 2024).

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Acknowledgements

We would like to thank all participants for taking part in the study. We are thankful for the donation made by The Company of Biologists. We would also like to thank The Galaxy Community for their involvement in the project, namely María Bernardi, Melissa Black, Patricia Carvajal-López, Irelka Colina-Moreno, Grisel Alejandra Escobar-Zepeda, Lorena Gallego-Villar, Saskia Hiltemann, Pablo Moreno, Nicolás Palopoli, Jolene Ramsey, Helena Rasche, Beatriz Serrano-Solano, and Montserrat Ve Go.

References

Amano T, Berdejo-Espinola V, Christie AP, et al.: Tapping into non-English-language science for the conservation of global biodiversity. PLoS Biol. 2021; 19(10): e3001296.
Amano T, González-Varo JP, Sutherland WJ: Languages Are Still a Major Barrier to Global Science. PLoS Biol. 2016; 14(12): e2000933.
Amano T, Ramírez-Castañeda V, Berdejo-Espinola V, et al.: The manifold costs of being a non-native English speaker in science. PLoS Biol. 2023; 21(7): e3002184.
Aron S, Chauke PA, Ras V, et al.: The Development of a Sustainable Bioinformatics Training Environment Within the H3Africa Bioinformatics Network (H3ABioNet). Frontiers in Education. 2021; 6.
Attwood TK, Blackford S, Brazas MD, et al.: A global perspective on evolving bioinformatics and data science training needs. Brief. Bioinform. 2019; 20(2): 398–404.
Basilio H: Raising the Visibility of Latin American Science. Eos; 2023, March 20.
Baxevanis AD, Bader GD, Wishart DS: Bioinformatics. John Wiley & Sons; 2020.
Chu C, Wang R: A Survey of Domain Adaptation for Neural Machine Translation. Bender EM, Derczynski L, Isabelle P, editors. Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics; 2018; pp. 1304–1319.
De Las Rivas J, Bonavides-Martínez C, Campos-Laborie FJ: Bioinformatics in Latin America and SoIBio impact, a tale of spin-off and expansion around genomes and protein structures. Brief. Bioinform. 2019; 20(2): 390–397.
Fugard A: Thematic analysis. London: SAGE Publications Ltd; 2020.
Giraud J: ‘Spanscriptomics’: Towards Bioinformatics in Spanish. 2024.
Hernández-Rosales M: Bioinformatics in Latin America: ISCB-LA SOIBIO RMB Symposium 2020. Interface Focus. 2021; 11(4): 20210038.
Hiltemann S, Rasche H, Gladman S, et al.: Galaxy Training: A powerful framework for teaching! PLoS Comput. Biol. 2023; 19(1): e1010752.
Işık EB, Brazas MD, Schwartz R, et al.: Grand challenges in bioinformatics education and training. Nat. Biotechnol. 2023; 41(8): 1171–1174.
Kalergis AM, Lacerda M, Rabinovich GA, et al.: Challenges for Scientists in Latin America. Trends Mol. Med. 2016; 22(9): 743–745.
Larsen TM, Endo BH, Yee AT, et al.: Probing Internal Assumptions of the Revised Bloom’s Taxonomy. CBE Life Sci. Educ. 2022; 21(4): ar66.
Marangoni R, Bevilacqua V, Cannataro M, et al.: An overview of bioinformatics courses delivered at the academic level in Italy: Reflections and recommendations from BITS. PLoS Comput. Biol. 2023; 19(2): e1010846.
Massarani L, de Oliveira T: Research in science communication in Latin America: Mind the gap. J. Sci. Commun. 2022; 21(7): C08.
McHugh ML: The Chi-square test of independence. Biochem. Med. 2013; 23(2): 143–149.
Mulder N, Schwartz R, Brazas MD, et al.: The development and application of bioinformatics core competencies to improve bioinformatics training and education. PLoS Comput. Biol. 2018; 14(2): e1005772.
Ramírez-Castañeda V: Disadvantages in preparing and publishing scientific papers caused by the dominance of the English language in science: The case of Colombian researchers in biological sciences. PLOS ONE. 2020; 15(9): e0238372.
Ras V, Carvajal-López P, Gopalasingam P, et al.: Challenges and Considerations for Delivering Bioinformatics Training in LMICs: Perspectives From Pan-African and Latin American Bioinformatics Networks. Frontiers in Education. 2021; 6.
Saunders D: Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey. J. Artif. Intell. Res. 2022; 75: 351–424.
Sharpe D: Chi-Square Test is Statistically Significant: Now What? Pract. Assess. Res. Eval. 2015; 20(1): Article 1.
Steigerwald E, Ramírez-Castañeda V, Brandt DYC, et al.: Overcoming Language Barriers in Academia: Machine Translation Tools and a Vision for a Multilingual Future. Bioscience. 2022; 72(10): 988–998.
Stroe O: Building bioinformatics capacity in Latin America. EMBL; 2022, November 16.
Tinsley J: The growing role of neural MT in the life sciences. MultiLingual. 2019.
Valenzuela-Toro AM, Viglino M: How Latin American researchers suffer in science. Nature. 2021; 598(7880): 374–375.
Woolston C, Osório J: When English is not your mother tongue. Nature. 2019; 570(7760): 265–267.
Yáñez-Serrano AM, Aguilos M, Barbosa C, et al.: The Latin America Early Career Earth System Scientist Network (LAECESS): Addressing present and future challenges of the upcoming generations of scientists in the region. Npj Climate and Atmospheric Science. 2022; 5(1): Article 1.

Competing interests

No competing interests were disclosed.

Grant information

Company of Biologists grant EA455.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Article Versions (1)

version 1

Published: 20 Oct 2025, 14:1143

https://doi.org/10.12688/f1000research.171282.1

Copyright

© 2025 Giraud J et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite this article

Giraud J, Dreptate S, Bacon W and The Galaxy Community. ’Spanscriptomics’: towards bioinformatics training materials in Spanish. [version 1; peer review: 2 approved with reservations]. F1000Research 2025, 14:1143 (https://doi.org/10.12688/f1000research.171282.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 25 Nov 2025

Wei Ruan, University of Georgia, Athens, Georgia, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.188877.r428968

Summary
This article addresses the critical barrier of language in bioinformatics training and provides a valuable descriptive overview of the demographics and preferences of Spanish-speaking trainees. While the motivation is commendable, I have outlined a few reservations regarding the statistical validity and experimental design that I believe should be addressed to strengthen the manuscript.

Major
1. Statistical Validity of Learning Outcomes: I am concerned about the validity of the quantitative learning outcomes due to the high attrition rate, particularly in the Machine Translation (CAT) group, which retained only 3 participants. Presenting mean scores and error bars for such a small sample size (as seen in Figure 4C) may be statistically misleading. I recommend reframing these specific results as descriptive or anecdotal evidence rather than quantitative comparisons to better reflect the data limitations.

2. Experimental Design and Randomization: The study protocol allowed participants to switch groups during the workshop, which resulted in a significant migration away from the CAT group. This flexibility, while understandable for an educational workshop, compromises the initial randomization and introduces selection bias. It would be beneficial to explicitly acknowledge in the methodology that this factor effectively shifts the study from a controlled experiment to an observational one, and to interpret the results of the remaining participants with this bias in mind.

3. Technological Context of Machine Translation: The study concludes that machine translation lacks precision based on the use of a generic online tool (Google Translate) as it existed in 2021. From a computational perspective, it is expected that non-domain-adapted models would struggle with specialized bioinformatics terminology. I suggest qualifying the conclusion to specify that generic, unadapted tools were found ineffective, rather than dismissing the potential of automated translation entirely, as modern domain-adapted systems or Large Language Models may offer different results.

4. Internal Consistency: There appears to be a discrepancy: the Discussion section candidly admits that the disparity in response rates prevented the authors from "performing statistical analyses on the post-workshop data," yet the Results section still presents comparative bar charts with error bars. I suggest aligning the Results section with the Limitations by removing or clearly labeling these quantitative figures to ensure they do not imply a statistical power that the authors have acknowledged is absent.

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

Yes
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

Partly
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Large Language Models (LLMs), Natural Language Processing (NLP), Machine Learning, Bioinformatics, Computational Biology, Data Science

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report 11 Nov 2025

Carlos C. Goller, North Carolina State University, Raleigh, North Carolina, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.188877.r426098

Summary
This is an exciting and much-needed area of research. The study is based on a survey that collects quantitative data on the language in which participants prefer to learn from bioinformatics materials. As stated in the conclusion, the study provides insights into the demographics, language preferences, and experiences of Spanish-speaking bioinformatics trainees. There are limitations in the sample size and in the conclusions that can be drawn, given response rates and workshop participant composition. These limitations are described in the "Limitations and recommendations for future studies" section. These results provide an intriguing initial view of the barriers and preferences relevant to making bioinformatics education accessible to broader audiences, particularly Spanish-speaking participants.

Major
Figure 3 shows a strong preference for HES. Can the authors comment on whether this trend has been identified in similar (or related) studies, if any?

The thematic analysis is intriguing, yet it is based on limited information, and the methods are not fully described. The table could be improved by including an indication of the frequency of the themes.

Figure 4 is rich and intriguing. The title mentions learning outcomes, but they are not clearly defined in the text. What is meant by "Depth of learning scores across language groups?" It may be challenging to infer the depth of learning based on self-assessment and the limited number of respondents.

How was learning through "exam-style questions" assessed while considering language proficiency? For example, how did the authors develop questions that aligned with their workshop goals and were accessible to the broad range of participants?

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

Yes
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Molecular biology and microbiology education; bioinformatics education

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
