LitSieve: An integrated literature search and triage tool for biocuration

Matt Jeffryes; Melissa Harrison; Henning Hermjakob; Johanna McEntyre

doi:10.12688/f1000research.163833.1



Software Tool Article


[version 1; peer review: 1 approved with reservations]


PUBLISHED 11 Jul 2025


¹ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK

Matt Jeffryes
Roles: Conceptualization, Investigation, Methodology, Software, Visualization, Writing – Original Draft Preparation

Melissa Harrison
Roles: Conceptualization, Supervision, Writing – Review & Editing

Henning Hermjakob
Roles: Conceptualization, Funding Acquisition, Supervision, Writing – Review & Editing

Johanna McEntyre
Roles: Conceptualization, Funding Acquisition, Supervision, Writing – Review & Editing


This article is included in the EMBL-EBI collection.

This article is included in the AIDR: Artificial Intelligence for Data Discovery and Reuse collection.

Abstract

Biomedical databases are an important part of the scientific infrastructure for organising and synergising research outputs. Many of these databases abstract content from the rapidly expanding scientific literature. Therefore, database curators require effective literature search methods in order to capture research relevant to their domain.

This article describes LitSieve, a literature search tool with filtering based on text mined annotations, and flexible article organisation features. It allows users to define filters based on biomedical entities like genes, diseases and species to include or exclude particular articles within their results. By combining a search query with a filter, curators are able to identify articles relevant to the database which they are curating. LitSieve uses APIs provided by Europe PMC, from which abstracts, article full text and text mined annotations are drawn.

LitSieve is available at https://www.ebi.ac.uk/europepmc/litsieve/

Keywords

biocuration, information retrieval, text mining

Corresponding author: Melissa Harrison

Competing interests: No competing interests were disclosed.

Grant information: MJ has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 945405.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2025 Jeffryes M et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Jeffryes M, Harrison M, Hermjakob H and McEntyre J. LitSieve: An integrated literature search and triage tool for biocuration [version 1; peer review: 1 approved with reservations]. F1000Research 2025, 14:685 (https://doi.org/10.12688/f1000research.163833.1) First published: 11 Jul 2025, 14:685 (https://doi.org/10.12688/f1000research.163833.1) Latest published: 11 Jul 2025, 14:685 (https://doi.org/10.12688/f1000research.163833.1)

Introduction

Biomedical databases have become a critical infrastructure supporting life science research. Biologists and bioinformaticians depend on databases to interpret their results.¹,² Many important databases depend upon curation of the scientific literature in order to identify and extract relevant information into a structured format. When curating the literature, domain expert biocurators search and sort through scientific articles, and read those that appear relevant to their databases, focussing on the specific facts that they wish to capture,¹ for example, a reference to a particular pair of proteins interacting, or an association between a gene and a disease.

Biocurators may use biomedical literature databases to identify ‘curatable’ literature. In this work we describe LitSieve, a system building on the Europe PMC database to provide literature filtering and organisational functions designed to assist with biocuration workflows.³ While there is a variety of literature search and organisation software already available, LitSieve provides a unique ability to filter based on a wide range of text-mined annotations.

Europe PMC

Europe PMC is a comprehensive database of life science literature. It contains abstracts from PubMed and Agricola, full-text articles from PubMed Central and content from 35 preprint servers relevant to the life sciences, including bioRxiv and medRxiv. The database contains over 45 million articles in total, and the full text is searchable for 10 million of them. Literature in Europe PMC is enriched by over 2 billion text-mined annotations. Annotations are references to biomedical entities or concepts, such as gene/protein names and diseases, extracted from the literature using a variety of methods. In total there are 43 different categories of annotation. These entities are normalised to an entry in a database; for example, species names are normalised to the NCBI Taxonomy. These annotations are made available via a public REST API.³
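As an illustration of how a client might request annotations from this API, the sketch below builds a request URL for a batch of articles. The endpoint path `annotationsByArticleIds` and the `articleIds`, `type`, `section` and `format` parameters reflect our reading of the public Europe PMC Annotations API documentation and should be checked against it; the helper name `annotations_url` and the PubMed ID are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the Europe PMC Annotations API; check the current docs.
ANNOTATIONS_API = (
    "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"
)


def annotations_url(article_ids, annotation_type=None, section=None):
    """Build a request URL for the text-mined annotations of several articles.

    article_ids: iterable of (source, id) pairs such as ("MED", "12345678"),
    sent as comma-separated SOURCE:ID values (an assumption about the API).
    """
    params = {
        "articleIds": ",".join(f"{src}:{ext_id}" for src, ext_id in article_ids),
        "format": "JSON",
    }
    if annotation_type is not None:
        params["type"] = annotation_type  # e.g. "Organisms" (assumed type name)
    if section is not None:
        params["section"] = section  # restrict to one article section
    return f"{ANNOTATIONS_API}?{urlencode(params)}"


# Placeholder PubMed ID, for illustration only.
url = annotations_url([("MED", "12345678")], annotation_type="Organisms")
```

`urlencode` percent-encodes the `:` and `,` separators, which HTTP servers decode transparently.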

Biocuration tools

A number of tools to assist biocurators have been developed. PubTator permits users to search the PubMed and PubMed Central databases based on six types of text-mined ‘bioentities’: genes, diseases, chemicals, single nucleotide polymorphisms (SNPs), species and cell lines.⁴ Users can further search based on 12 types of text-mined interactions between bioentities, for example drug interactions between two chemicals or causation between a SNP and a disease. It also allows users to gather articles into user-defined ‘collections’.

LitSuggest uses machine learning (ML) to suggest similar articles to those selected by the user.⁵ Articles identified by the trained model can then be marked as relevant or irrelevant to further refine the model. In the context of curation, this permits biocurators to submit a list of articles they have already curated and ideally find further ‘curatable’ articles.

Tools using large language models (LLMs) to search and summarise literature have also emerged;⁶ however, these tools have yet to be comprehensively assessed in the context of biocuration. Given the impact of LLMs such as ChatGPT on the wider technology landscape, it seems inevitable that biocurators will use LLM-based tools. However, their propensity for factual errors remains an open problem and presents a challenge to deploying them on biocuration tasks, where statements must be reliably attributed.⁷

LitSieve development process

Development of LitSieve began with the goal of providing an interface to Europe PMC with improved utility for biocurators. The initial premise was that curators may prefer not to use some ML-based suggestion or recommendation systems because of their ‘black box’ nature.⁸ An internal survey of biocurators was conducted to understand their usage of literature search tools and the types of literature they were interested in. Possible features were discussed, and curators were observed while completing tasks. As development progressed, feedback from biocurators was incorporated into the prerelease versions at each stage.

The LitSieve system is based on retrieval using a user-specified search query, the results of which are then filtered as chosen by the user. This design prioritises the explainability of the results, since it is clear to users exactly why a particular article has been included in or excluded from their search results: only articles retrieved by the Boolean search query are considered and, of those, only articles that match all of the filters are displayed. Therefore, the reason for the inclusion or exclusion of a particular article is always transparent (see Figure 1c for an illustration of a matching search result).
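The inclusion rule can be sketched as follows: every filter is a named predicate, and a result is kept only if all predicates hold, so the labels of the failing filters are exactly the reason for an exclusion. This is a minimal illustration assuming a hypothetical `Article` model and made-up identifier strings; it is not LitSieve's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Article:
    """A search result already retrieved by the Boolean query (hypothetical model)."""
    article_id: str
    identifiers: Set[str] = field(default_factory=set)  # normalised annotation IDs


# A filter is a human-readable label plus a yes/no predicate.
NamedFilter = Tuple[str, Callable[[Article], bool]]


def apply_filters(
    results: List[Article], filters: List[NamedFilter]
) -> Tuple[List[Article], Dict[str, List[str]]]:
    """Keep results that pass every filter; record which filters rejected the rest."""
    kept: List[Article] = []
    rejected: Dict[str, List[str]] = {}
    for article in results:
        failed = [label for label, predicate in filters if not predicate(article)]
        if failed:
            rejected[article.article_id] = failed
        else:
            kept.append(article)
    return kept, rejected


# Hypothetical identifiers: "taxon:" prefix plus an NCBI Taxonomy ID.
results = [Article("A1", {"taxon:10090"}), Article("A2", {"taxon:9606"})]
filters: List[NamedFilter] = [("mentions mouse", lambda a: "taxon:10090" in a.identifiers)]
kept, rejected = apply_filters(results, filters)
```

Returning the failing labels alongside the kept results is what makes each exclusion explainable to the user.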

Figure 1. The LitSieve workflow.

A literature search is performed (a), the results optionally filtered (b), and then the literature retrieved (c) which can be read and annotated (d) according to the requirements of the user.

Although ML is used to identify many of the text-mined annotations used for filtering, this approach reduces the scope of the ‘black box’ part of the retrieval system, making it smaller and more comprehensible. This modular, filter-based design also allows additional filters to be developed and added within the same architecture.
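One way to realise such a modular design is a shared filter interface, so that the filtering pipeline never needs to know which filter types exist. In this sketch the `Filter` base class, `IncludeFilter` and `MinimumCountFilter` (an example of a new filter type, not one LitSieve offers) are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set

# Annotation type -> identifiers found in one article (hypothetical shape).
Annotations = Dict[str, Set[str]]


class Filter(ABC):
    """Hypothetical common interface that every filter type implements."""

    @abstractmethod
    def matches(self, annotations: Annotations) -> bool:
        """Return True if the article should stay in the results."""


class IncludeFilter(Filter):
    def __init__(self, annotation_type: str, identifiers: Set[str]):
        self.annotation_type = annotation_type
        self.identifiers = identifiers

    def matches(self, annotations: Annotations) -> bool:
        return bool(annotations.get(self.annotation_type, set()) & self.identifiers)


class MinimumCountFilter(Filter):
    """Hypothetical new filter: at least n distinct identifiers of one type."""

    def __init__(self, annotation_type: str, n: int):
        self.annotation_type = annotation_type
        self.n = n

    def matches(self, annotations: Annotations) -> bool:
        return len(annotations.get(self.annotation_type, set())) >= self.n


def passes_all(annotations: Annotations, filters: List[Filter]) -> bool:
    # Only the Filter interface is used here, so adding a new filter type
    # requires no change to this function.
    return all(f.matches(annotations) for f in filters)


doc = {"Gene_Proteins": {"P04637", "P38398"}, "Organisms": {"9606"}}
```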

System overview

LitSieve is a literature search and organisation tool designed for biocuration. It permits users to perform a standard literature search and then filter it based upon text-mined annotations. The filtering system is very flexible and accommodates a wide variety of use cases. An overview of the process of using LitSieve is shown in Figure 1.

LitSieve builds upon Europe PMC’s public articles and annotations APIs and is implemented using the Vue JavaScript framework. Searches are configured using a form, and user-selected parameters define which articles are fetched from Europe PMC. These results may then be filtered according to the text-mined annotations found in the articles. Any of the 43 categories of annotation in Europe PMC can be used for filtering. Users can include or exclude articles according to the presence or absence of specific annotations. Three types of filter are available (include, exclude, ignore), listed in Table 1 and illustrated in Figure 2. Annotations are fetched from Europe PMC and then used to filter the articles client-side. The basic search, filtering and reading functions can be used by anonymous users. Saving lists, highlights and notes requires users to register using either an email address or an ORCID login.
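Because filtering happens client-side, the annotations returned by the API must first be indexed per article. The sketch below assumes a simplified response shape (the field names `source`, `extId`, `annotations`, `type`, `section`, `tags` and `uri` are assumptions to be checked against a real response) and reduces it to a set of (type, identifier, section) triples per article. The sample data are invented for illustration.

```python
import json
from collections import defaultdict
from typing import Dict, Set, Tuple

# Invented sample in an assumed, simplified Annotations API response shape.
SAMPLE_RESPONSE = json.dumps([
    {
        "source": "MED",
        "extId": "12345678",
        "annotations": [
            {"exact": "mice", "type": "Organisms", "section": "Methods",
             "tags": [{"name": "Mus musculus",
                       "uri": "http://identifiers.org/taxonomy/10090"}]},
            {"exact": "p53", "type": "Gene_Proteins", "section": "Results",
             "tags": [{"name": "P04637",
                       "uri": "https://www.uniprot.org/uniprot/P04637"}]},
        ],
    }
])


def index_annotations(raw_json: str) -> Dict[str, Set[Tuple[str, str, str]]]:
    """Map 'SOURCE:ID' to a set of (annotation type, identifier URI, section)."""
    index: Dict[str, Set[Tuple[str, str, str]]] = defaultdict(set)
    for article in json.loads(raw_json):
        key = f"{article['source']}:{article['extId']}"
        for ann in article.get("annotations", []):
            # One annotation may carry several normalised identifiers (tags).
            for tag in ann.get("tags", []):
                index[key].add((ann["type"], tag["uri"], ann.get("section", "")))
    return dict(index)


idx = index_annotations(SAMPLE_RESPONSE)
```

Indexing by normalised identifier rather than by the matched text (`exact`) is what lets a filter treat different surface forms of the same entity alike.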

Table 1. The 3 filter types.

Filter type	Action
include	Only show search results that have an annotation mapped to a specified identifier
exclude	Only show search results that do not have an annotation mapped to a specified identifier
ignore	Only show search results that have at least one annotation of the specified type whose identifier is not among the specified identifiers
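Our reading of Table 1 can be expressed as three set predicates, where `found` holds the identifiers of the filter's annotation type found in an article (or in a chosen section) and `specified` holds the identifiers listed in the filter. This is a sketch under that interpretation, applied to hypothetical documents; the function names are illustrative.

```python
from typing import Set


def include(found: Set[str], specified: Set[str]) -> bool:
    """Keep the result if it mentions at least one specified identifier."""
    return bool(found & specified)


def exclude(found: Set[str], specified: Set[str]) -> bool:
    """Keep the result only if it mentions none of the specified identifiers."""
    return not (found & specified)


def ignore(found: Set[str], specified: Set[str]) -> bool:
    """Keep the result if it mentions at least one identifier of this
    annotation type that is not among the specified identifiers."""
    return bool(found - specified)


# Three specified taxa (NCBI Taxonomy IDs for mouse, human and rat) and four
# hypothetical documents, each reduced to the taxa annotated in it.
specified = {"10090", "9606", "10116"}
documents = {
    "only mouse": {"10090"},
    "mouse and fruit fly": {"10090", "7227"},
    "only fruit fly": {"7227"},
    "no species": set(),
}
outcome = {
    name: (include(found, specified), exclude(found, specified), ignore(found, specified))
    for name, found in documents.items()
}
```

Under this interpretation, ‘ignore’ and ‘exclude’ differ in two cases: an article mentioning both a specified and an unspecified taxon is kept by ‘ignore’ but removed by ‘exclude’, and an article with no annotation of that type is kept by ‘exclude’ but removed by ‘ignore’.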

Figure 2. An illustration of the filter types.

Three taxa are specified for filtering at the top. In the left column, 4 documents are shown. In reality, the filter would be applied using the entire document, or a specified section, but in this case a short fragment is used for illustrative purposes. In the fragments, all mentions of a species are underlined, and the specified species are highlighted. In the top row, each filter type is listed. Below the filter types it is indicated whether a filter of the corresponding type, with the three specified taxa, would result in the document being included in the search results or not.

The filter may be restricted to a specific section of the article (for example, finding only articles that have a ‘mouse’ annotation within their Methods section). Lists of identifiers may be saved for convenience, for example, if a curator has a list of diseases of interest that they wish to use as a filter on many searches.
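Section restriction and saved lists can be sketched by carrying the section name with each annotation and treating a saved list as a reusable set of identifiers. The tuple format, the section labels and the `SAVED_LISTS` contents are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

# (annotation type, identifier, section) for one annotation (hypothetical format).
Annotation = Tuple[str, str, str]

# A hypothetical saved identifier list that can be reused on many searches.
SAVED_LISTS = {"model organisms": {"10090", "10116", "7227"}}


def found_identifiers(
    annotations: Iterable[Annotation], annotation_type: str, section: Optional[str] = None
) -> Set[str]:
    """Identifiers of one annotation type, optionally limited to one section."""
    return {
        identifier
        for ann_type, identifier, ann_section in annotations
        if ann_type == annotation_type and (section is None or ann_section == section)
    }


article = [("Organisms", "10090", "Methods"), ("Organisms", "9606", "Introduction")]
mouse_in_methods = bool(found_identifiers(article, "Organisms", "Methods") & {"10090"})
human_in_methods = bool(found_identifiers(article, "Organisms", "Methods") & {"9606"})
uses_model_organism = bool(
    found_identifiers(article, "Organisms") & SAVED_LISTS["model organisms"]
)
```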

For convenience, several types of annotation can be filtered using an integrated auto-complete interface. Species and other taxonomic ranks can be retrieved from the NCBI Taxonomy [1], gene and protein names from UniProt, and terms from the Gene Ontology, Uberon, the Experimental Factor Ontology and ChEBI from the Ontology Lookup Service.⁹,¹⁰
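Each of these auto-complete sources exposes a public search endpoint. The sketch below routes a partial term to a query URL for each source. The article does not say which requests LitSieve actually makes, so the endpoints (OLS4 `/api/search`, UniProt `uniprotkb/search`, NCBI E-utilities `esearch.fcgi`), their parameters and the `autocomplete_url` helper are assumptions.

```python
from urllib.parse import urlencode

# Assumed public search endpoints; not necessarily the ones LitSieve calls.
OLS_SEARCH = "https://www.ebi.ac.uk/ols4/api/search"
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"
NCBI_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Ontology identifiers as used by the Ontology Lookup Service (assumed).
OLS_ONTOLOGIES = {"go", "uberon", "efo", "chebi"}


def autocomplete_url(source: str, text: str, limit: int = 10) -> str:
    """Build a search URL for a partial term typed into the filter form."""
    if source in OLS_ONTOLOGIES:
        params = {"q": text, "ontology": source, "rows": limit}
        return f"{OLS_SEARCH}?{urlencode(params)}"
    if source == "uniprot":
        params = {"query": text, "format": "json", "size": limit}
        return f"{UNIPROT_SEARCH}?{urlencode(params)}"
    if source == "taxonomy":
        params = {"db": "taxonomy", "term": text, "retmax": limit, "retmode": "json"}
        return f"{NCBI_ESEARCH}?{urlencode(params)}"
    raise ValueError(f"unknown autocomplete source: {source}")
```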

Articles found using LitSieve can be saved to lists. This accommodates a triage workflow where users can flag literature as either curatable or non-curatable, but users may organise lists as they wish, and no specific workflow is imposed. Articles may be added to or removed from lists directly from the search results page or from the reading view. This permits, for example, a curator to remove an article from their ‘triage’ list after reading it and finding it to be non-curatable. The ‘quick lists’ feature allows users to assign an icon and colour to a particular list, so that list membership can be identified at a glance on the search results page. This allows a curator to identify, for example, articles they have already triaged.
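A minimal data model for lists and quick lists might look like the following, where a quick list is simply a list with an icon and colour attached. The `ArticleList` class, the badge representation and the icon name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class ArticleList:
    """A user's saved list (hypothetical model); icon/colour make it a quick list."""
    name: str
    article_ids: Set[str] = field(default_factory=set)
    icon: Optional[str] = None
    colour: Optional[str] = None


def quick_list_badges(article_id: str, lists: List[ArticleList]) -> List[Tuple[str, str]]:
    """(icon, colour) badges to draw next to one search result."""
    return [
        (lst.icon, lst.colour)
        for lst in lists
        if lst.icon is not None and lst.colour is not None and article_id in lst.article_ids
    ]


def move(article_id: str, source: ArticleList, target: ArticleList) -> None:
    """Move an article between lists, e.g. from 'triage' to 'non-curatable'."""
    source.article_ids.discard(article_id)
    target.article_ids.add(article_id)


triage = ArticleList("triage", {"A1", "A2"})
non_curatable = ArticleList("non-curatable", icon="cross", colour="red")
move("A2", triage, non_curatable)
```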

In the reader view, users may highlight and add private notes to articles (see Figure 1d). Biocurators may use this to highlight curatable passages from the article or other pertinent details such as cell lines used in experiments.

Users may recall and reorder saved articles from a list management view. A list of all articles to which notes or highlights have been added is also available. Lists may be used to organise or prioritise articles for curation, or to save a group of related articles.
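Highlights, notes and list ordering could be modelled as below. Anchoring highlights by character offsets is an assumption made for illustration, as are the `Highlight` class and the helper names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Highlight:
    """A highlighted passage; character offsets are an assumed anchoring scheme."""
    article_id: str
    start: int
    end: int
    note: Optional[str] = None  # optional private note on the passage


def annotated_articles(highlights: List[Highlight]) -> List[str]:
    """Articles with at least one highlight or note, in first-seen order."""
    return list(dict.fromkeys(h.article_id for h in highlights))


def reorder(saved: List[str], article_id: str, new_index: int) -> List[str]:
    """Return a copy of a saved list with one article moved to new_index."""
    items = [a for a in saved if a != article_id]
    items.insert(new_index, article_id)
    return items


highlights = [
    Highlight("A1", 120, 180),
    Highlight("A2", 40, 95, note="cell line used in experiments"),
    Highlight("A1", 300, 340),
]
```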

Use cases

IntAct

IntAct is a molecular interaction database.¹¹ It is essentially a graph of interacting molecules, with the vertices being biologically active molecules such as proteins, and the edges denoting some kind of interaction between a pair of them. IntAct is manually curated; every interaction has been captured by a biocurator. This is a time-intensive process and, given the available resources, prioritisation is necessary because not every published interaction can be incorporated into the database. As a strategic goal, IntAct has prioritised adding new molecules to the database (increasing the number of vertices) over adding edges between molecules already in the database, favouring coverage over accumulating further evidence for known relationships. Therefore, it is desirable to find literature that discusses protein–protein interactions where at least one of the proteins is not yet listed in IntAct.

LitSieve enables the IntAct biocurators to filter out articles that cannot add new molecules to the database. After performing a literature search, they can apply an ‘ignore’ filter listing the UniProt identifiers of all proteins already present in IntAct. This removes every article whose protein annotations all map to proteins in that list (as well as articles with no protein annotations at all), so only articles mentioning at least one protein new to IntAct are shown in the result list. While this does not guarantee that an article discusses a curatable protein interaction, it removes articles that certainly cannot increase the number of proteins covered by IntAct. In this way, LitSieve enables IntAct curators to construct literature searches using their experience while benefiting from the text-mined annotations in Europe PMC to speed up triage of the results. A step-by-step illustration of this workflow is available at https://doi.org/10.5281/zenodo.15682791.
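The IntAct workflow amounts to a set difference. The sketch below applies the ‘ignore’ condition and also reports which proteins in each kept article are new. The IntAct membership set and the article contents are hypothetical, and the UniProt accessions serve only as example identifiers.

```python
from typing import Dict, Set


def novel_proteins(
    article_proteins: Dict[str, Set[str]], in_intact: Set[str]
) -> Dict[str, Set[str]]:
    """Keep articles passing an 'ignore' filter and list their proteins new to IntAct."""
    return {
        article_id: proteins - in_intact
        for article_id, proteins in article_proteins.items()
        if proteins - in_intact  # the 'ignore' condition: something not in the list
    }


# Hypothetical data: UniProt accessions annotated in three search results.
in_intact = {"P04637", "P38398"}
articles = {"A1": {"P04637"}, "A2": {"P04637", "Q00987"}, "A3": set()}
triage_queue = novel_proteins(articles, in_intact)
```

Reporting the novel accessions, not just a yes/no decision, gives the curator a hint of which molecule makes an article worth reading.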

UniProt

UniProt is a data resource for protein sequence and functional information. One component of UniProt is the Swiss-Prot section of the UniProt Knowledgebase (UniProtKB/Swiss-Prot). This is a curated resource summarising experimental and computationally predicted functional information, selected and reviewed by expert biocurators. To carry out this work, UniProt biocurators search for and read literature related to the proteins whose records they are tasked with creating and updating.

LitSieve has been used to curate proteins related to antimicrobial resistance into UniProt. The ability to filter search results based on species is beneficial during triage to sift out articles not related to the entry being curated. Since a single species may be referred to by multiple different names (for example, mouse, mice, M. musculus, Mus musculus), filtering based on concept rather than exact text matches can save time and effort during the triage process.
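The difference between exact text matching and concept matching can be shown with a small, hypothetical mention-to-identifier table standing in for Europe PMC's normalisation of species names to NCBI Taxonomy identifiers (10090 is the identifier for Mus musculus).

```python
from typing import Iterable

# Hypothetical stand-in for text-mined normalisation of species mentions.
MENTION_TO_TAXON = {
    "mouse": "10090",
    "mice": "10090",
    "m. musculus": "10090",
    "mus musculus": "10090",
    "human": "9606",
    "homo sapiens": "9606",
}


def string_match(mentions: Iterable[str], term: str) -> bool:
    """Exact (case-insensitive) text match against a single search term."""
    return any(m.lower() == term.lower() for m in mentions)


def concept_match(mentions: Iterable[str], taxon_id: str) -> bool:
    """Match if any mention normalises to the requested taxon identifier."""
    return any(MENTION_TO_TAXON.get(m.lower()) == taxon_id for m in mentions)


mentions = ["Mice", "M. musculus"]
```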

Conclusion

LitSieve allows biocurators to combine their literature search expertise with filters based on text-mined annotations. This transparent and reproducible approach to literature discovery allows biocurators and other users to understand why a particular article has or has not been captured by their query. The flexible filter architecture also permits use cases that we have not yet anticipated. Because it is based on Europe PMC, LitSieve benefits from daily literature updates and can search across over 31 million abstracts and over 10 million full-text articles. Filtering can be performed using over 2 billion text-mined annotations in 43 categories. A variety of other tools are available for biocuration literature search; however, to our knowledge, none can search based on this many annotation types.

LitSieve provides an integrated interface for organising and prioritising literature. We anticipate that by integrating biocuration related features into a single application, biocuration workflows can be made more efficient.

Software availability

LitSieve is available publicly at https://www.ebi.ac.uk/europepmc/litsieve/.

Source code is available in two repositories under an MIT licence. The front end is available at https://gitlab.ebi.ac.uk/mjj/biocuration-toolbox and the back end at https://gitlab.ebi.ac.uk/mjj/litsieve-backend. An archived copy of these repositories at the time of submission has been deposited in Zenodo: https://dx.doi.org/10.5281/zenodo.15480211.

Author contributions

All the authors contributed to conceptualisation and determining the methodology. HH, MH and JM provided supervision. MJ was responsible for software development, and for drafting the original manuscript. All authors contributed to review and editing.

Acknowledgements

We thank Islam Hassan, Mohamed Selim and Jagadeeswararao Poluru for software engineering, and Kalpana Panneerselvam, Paul Denny and other users for testing and feedback. This work was supported by the European Molecular Biology Laboratory (EMBL).

References

1. International Society for Biocuration: Biocuration: Distilling data into knowledge. PLoS Biol. 2018 Apr 16; 16(4): e2002846.
2. Hirschman J, Berardini TZ, Drabkin HJ, et al.: A MOD(ern) perspective on literature curation. Mol. Gen. Genomics. 2010 May; 283(5): 415–425.
3. Rosonovski S, Levchenko M, Bhatnagar R, et al.: Europe PMC in 2023. Nucleic Acids Res. 2024 Jan 5; 52(D1): D1668–D1676.
4. Wei C-H, Allot A, Lai P-T, et al.: PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge. Nucleic Acids Res. 2024; 52(W1): W540–W546.
5. Allot A, Lee K, Chen Q, et al.: LitSuggest: a web-based system for literature recommendation and curation using machine learning. Nucleic Acids Res. 2021 Jul 2; 49(W1): W352–W358.
6. Jin Q, Leaman R, Lu Z: PubMed and beyond: biomedical literature search in the age of artificial intelligence. EBioMedicine. 2024 Feb 1; 100: 104988.
7. de Wynter A, Wang X, Sokolov A, et al.: An evaluation on large language model outputs: Discourse and memorization. Nat. Lang. Proc. J. 2023 Sep; 4: 100024.
8. Holzinger A, Langs G, Denk H, et al.: Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019 Apr 2; 9(4): e1312.
9. UniProt Consortium: UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res. 2023 Jan 6; 51(D1): D523–D531.
10. Côté R, Reisinger F, Martens L, et al.: The Ontology Lookup Service: bigger and better. Nucleic Acids Res. 2010 Jul; 38(Web Server issue): W155–W160.
11. Del Toro N, Shrivastava A, Ragueneau E, et al.: The IntAct database: efficient access to fine-grained molecular interaction data. Nucleic Acids Res. 2022 Jan 7; 50(D1): D648–D653.

Footnotes

1 https://www.ncbi.nlm.nih.gov/taxonomy


Open Peer Review

Version 1


Reviewer Report 19 Sep 2025

Kim Rutherford, University of Cambridge, Cambridge, UK

Approved with Reservations

https://doi.org/10.5256/f1000research.180243.r410949

This paper describes "LitSieve", a literature search tool that improves on previous systems by providing integrated access to publication details and text-mined annotations, along with a filtering system that allows users to narrow their search to relevant articles.

--------------------------

I appreciate the summary of the filters in Figure 2.

The "exclude" and "include" filter types seem straightforward but I struggle to understand the "ignore" filter type. An example of "ignore" is given in the "Use cases" section but could the function of "ignore" be more precisely described earlier? Perhaps in the section that introduces the filters?

--------------------------

"Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?"

It's good to see that the source code is available and has been deposited in Zenodo.

The frontend repository README says "It may be run with or without the backend" but doesn't specify how. I can't see documentation for how to configure the frontend, either in the manuscript or in the repository. Please add this documentation to the repository or manuscript.

--------------------------

Please add a statement about future support, maintenance and software availability. I notice that the repositories linked to from the manuscript have had no code changes for 4 months. Has development and bug fixing stopped? Will there be future support?

--------------------------

"We thank Islam Hassan, Mohamed Selim, and Jagadeeswararao Poluru for software engineering"

If these software engineers made substantial contributions to the software, they should be co-authors. If not, consider a separate explanation for the contribution of each engineer, if there are differences. Thanking someone for "software engineering" in a software publication would be like thanking someone for "lab work" in an experimental publication.

--------------------------

The user-driven approach described here is encouraging:

"An internal survey of biocurators was conducted to understand their usage of literature search tools and the types of literature they were interested in. Possible features were discussed, and curators were observed while completing tasks."

"We thank ... Kalpana Panneerselvam, Paul Denny and other users for testing, and feedback"

"As development progressed, feedback from biocurators was incorporated into the prerelease versions at each stage."

Any users who have made substantial contributions in the form of feedback or ideas should be considered for co-authorship. Especially consider any biocurators who contributed multiple major suggestions that have been incorporated into the system. Are there users who have contributed more ideas or feedback than any of the current co-authors? If there are, they should be on the author list.

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Software engineering. Bioinformatics.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
