Software Tool Article

DarkQ: continuous genomic monitoring using message queues

[version 1; peer review: 2 approved with reservations]
PUBLISHED 01 Oct 2021

Abstract

Newly sequenced genomes are often not noticed by potential stakeholders because submission to public databases is delayed, and search options are limited. However, the discovery of genomes can be vital: in pathogen outbreaks, fast updates are essential to coordinate containment efforts and prevent further spread. Here we introduce DarkQ, a message queue that allows for instant sharing and discovery of genomes.
DarkQ is released under the BSD-2 license at github.com/phiweger/darkq.

Keywords

outbreak, molecular surveillance, peer-to-peer, pathogen

Introduction

Many bioinformatic tasks complement newly sequenced genomes with existing, publicly available ones. For example, when reconstructing a local pathogen outbreak, screening for similar genomes can uncover related isolates from other sampling sites, such as nearby hospitals, and can significantly affect public health responses.1,2 However, no tool exists to monitor newly sequenced genomes and automatically identify those of interest to the user. The long delay until genomes become publicly available explains why many outbreak studies are retrospective and offer limited practical value to the associated outbreak response. A monitoring tool needs several components. First, a “publisher” needs to be able to send genomic “messages” to a “consumer” using a simple and secure interface. Second, the genomic “message” should require little space, to avoid upload problems and the need for extensive storage infrastructure. Third, a mechanism is needed to route messages only to interested parties, e.g., consumers that search for genomes of a particular species in a specific geography. Lastly, on receiving a relevant message, the consumer should be able to download the associated genome. Several projects are currently developing ways to share genomes effectively (wort, stark). However, to our knowledge, ours is the first end-to-end solution available to users.

Methods

Implementation

DarkQ is implemented using the Nextflow workflow manager to ensure a robust, reproducible, and portable application.3 The user interface of DarkQ is similar to the popular file system service “Dropbox”; the content of a “send” directory is tracked. When a genome is added to it, it is first compressed (“sketched”) using the MinHash algorithm4,5 (sourmash, v3.5). The reduction in file size by orders of magnitude allows for efficient transmission. Together with metadata and inferred taxonomy (using sourmash), the genome sketch constitutes a “message” (Figure 1A). The receiving message queue then uses the Advanced Message Queuing Protocol (AMQP)6 to route messages (implementation: RabbitMQ, v3.8.9) onto queues, i.e. sequential groups of messages. The original genome is uploaded (“pinned”) to a decentralized, peer-to-peer network (IPFS, v0.7).7 Its content-based address is part of the genome message.
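The sketching and message-building steps can be illustrated with a minimal Python sketch. This is not sourmash's implementation. It assumes a plain bottom-k MinHash over k-mers hashed with SHA-256, ignores reverse complements, and uses a SHA-256 digest as a stand-in for the IPFS content address. The function names, the message fields, and the sketch size of 1,000 hashes are all hypothetical.

```python
import hashlib
import json


def kmers(sequence, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (stand-in for a real sketcher's hash)."""
    digest = hashlib.sha256(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_sketch(sequence, k=51, num_hashes=1000):
    """Bottom-k MinHash: keep the num_hashes smallest distinct k-mer hashes."""
    hashes = {hash_kmer(kmer) for kmer in kmers(sequence.upper(), k)}
    return sorted(hashes)[:num_hashes]


def build_message(sequence, metadata, taxonomy):
    """Bundle sketch, metadata, taxonomy and a content address (assumed layout)."""
    return {
        "sketch": minhash_sketch(sequence),
        "metadata": metadata,
        "taxonomy": taxonomy,
        # Hypothetical stand-in for the IPFS content identifier of the genome.
        "address": hashlib.sha256(sequence.encode("ascii")).hexdigest(),
    }


def encode_message(message):
    """Serialize the message to bytes, e.g. for use as a queue message body."""
    return json.dumps(message, sort_keys=True).encode("utf-8")
```

Because only the smallest hashes are kept, the message size is bounded by the sketch size rather than by the genome length, which is the property that makes transmission cheap.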


Figure 1. (A) Architecture of DarkQ: compressed genome “messages” (colored arrows) are sent by a publisher (P) onto the message queue.

A router (circle) distributes the messages to queues via routing keys (annotated arrows). Consumers (C) can use these keys to receive only a subset of messages and then further filter them with target genomes using MinHash sketches. In parallel, the genomes from the publisher are uploaded to a decentralized peer-to-peer (P2P) network. Once a message passes through the consumer’s filters, the corresponding genome is automatically downloaded from the P2P network. This architecture allows the effective distribution of newly sequenced genomes and enables continuous monitoring, e.g., in outbreak scenarios. (B) Use case simulation: a hospital becomes aware of a local outbreak of an XDR Klebsiella pneumoniae (Kp) isolate of subtype (ST, right metadata column) 258 carrying a plasmid-encoded KPC-2 carbapenemase. Using DarkQ, we identified 431 genomes from several countries (leaf colors) from 26 studies (left metadata column) with an average nucleotide identity (ANI) > 99.98% and identical resistance and capsule patterns (not shown). A time-dated phylogeny revealed several non-local isolates, suggesting that the outbreak reached further than previously assumed. An interactive version of the data can be found at microreact.org/project/facEFbDrgwgp9aX97nvpHq. Scale in number of SNVs.

The consumer can subscribe to messages using an arbitrary number of filters, so-called “routing keys”. Each routing key is unique and has five properties: name of sender (e.g. “phiweger”), country code (e.g. “DE”), taxon status (“found” or “mystery”), taxon level (one of superkingdom, phylum, class, order, family, genus, species, or strain) and taxon name at that level (e.g. “Klebsiella” for genus). Taxon names are adapted from and must conform to the Genome Taxonomy Database (GTDB, release 89).8 For example, “phiweger.DE.*.genus.klebsiella” would select all isolate genomes of the stated genus from Germany sent by the author, regardless of taxon status.
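The wildcard in the example key suggests topic-style matching as provided by AMQP topic exchanges, where “*” stands for exactly one dot-separated word and “#” for zero or more words. In AMQP terminology, the pattern a consumer subscribes with is usually called a binding key. The following Python function is a minimal re-implementation of that matching rule for illustration only. It assumes that DarkQ relies on these semantics, and in practice the broker performs the matching. The function name `topic_matches` is hypothetical.

```python
def topic_matches(pattern, routing_key):
    """Return True if a dot-separated topic pattern matches a routing key.

    "*" stands for exactly one word, "#" for zero or more words.
    """
    def match(p, k):
        if not p:
            # Pattern exhausted: match only if the key is exhausted too.
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # "#" may absorb zero words (drop it) or one more word (keep it).
            return match(rest, k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


# The example key from the text selects both taxon statuses.
pattern = "phiweger.DE.*.genus.klebsiella"
found = topic_matches(pattern, "phiweger.DE.found.genus.klebsiella")
mystery = topic_matches(pattern, "phiweger.DE.mystery.genus.klebsiella")
other_country = topic_matches(pattern, "phiweger.US.found.genus.klebsiella")
```

Here `found` and `mystery` are true, while `other_country` is false because the country code differs.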

Because we can estimate genome similarity using MinHash sketches,4 the consumer can quickly filter the received genome messages using target genomes, for example those belonging to a local pathogen outbreak or current research project. If this filter is passed, then the genome is automatically downloaded from the peer-to-peer network using its content hash address, which at the same time locates and validates the downloaded file. If multiple users pin the genome, download speed can increase substantially. A downstream workflow can then be connected to refine these genomes’ analysis further, enabling a complete monitoring system.
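The consumer-side filter and the validation of a downloaded file can be sketched as follows. This assumes that similarity means the Jaccard index estimated from two bottom-k MinHash sketches, and it again uses a SHA-256 digest as a stand-in for the IPFS content address. The default threshold of 0.97 is the value reported in the use case. All function names are hypothetical and none of this is DarkQ's or sourmash's actual code.

```python
import hashlib


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate the Jaccard index from two bottom-k sketches of hash values.

    Takes the smallest hashes of the merged sketches and counts how many
    of them occur in both inputs.
    """
    a, b = set(sketch_a), set(sketch_b)
    size = min(len(a), len(b))
    if size == 0:
        return 0.0
    merged = sorted(a | b)[:size]
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / size


def passes_filter(message_sketch, target_sketches, threshold=0.97):
    """Accept a message if it is similar enough to at least one target genome."""
    return any(
        estimate_jaccard(message_sketch, target) >= threshold
        for target in target_sketches
    )


def verify_download(data, expected_address):
    """Check that downloaded bytes hash to the address named in the message."""
    return hashlib.sha256(data).hexdigest() == expected_address
```

A consumer would call `passes_filter` on each received message and, only if it returns true, fetch the genome and confirm it with `verify_download`. With content addressing, the same hash serves both to locate the file and to detect corruption.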

Operation

The software can be run on any UNIX-based operating system. Operation of DarkQ requires less than 1 GB of RAM and a single core. Details of the workflow can be found in the README file.12

Use cases

To test DarkQ in a monitoring system, we collected 9,415 genomes of Klebsiella pneumoniae and sent them onto DarkQ. K. pneumoniae is a pathogen considered an urgent global threat due to extensive antimicrobial drug resistance (CDC, AR threats report, 2019).9 We simulated a consumer subscribing to all messages from the Klebsiella genus and filtering the received messages using an isolate from a local outbreak at a large tertiary hospital in 2010.10 In total, 1,461 messages met both the routing key and the minimum genome similarity criterion of 0.97 at a k-mer size of 51, a size typically used to estimate genomic distance at the strain level.11 After the original genomes were downloaded from the peer-to-peer network, they were further filtered and refined, resulting in a time-dated phylogeny (Figure 1B). The consumer thus received genomes from a total of 26 studies. Two of these studies contained genomes that belonged to the same outbreak clone the consumer used to filter the genomes. Further work is needed to investigate this relationship more thoroughly; however, an initial assessment was already possible using the mechanics implemented in DarkQ. All methods used in this use case are available elsewhere.12

Conclusion

DarkQ allows a user to monitor genomic data with a simple user interface, efficient genome compression, filter-based message routing, and fast download of corresponding genomes using a decentralized peer-to-peer network. The proof-of-concept outlined here scales to thousands of genomes and could be particularly valuable in the context of pathogen outbreaks. However, our approach can also be used to disseminate research more broadly.

Data availability

Underlying data

NCBI BioProject: Context-aware genomic surveillance reveals hidden transmission of a carbapenemase-producing Klebsiella pneumoniae. Accession number PRJNA742413. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA742413.

Software availability

Source code available from: github.com/phiweger/darkq.

Archived source code at time of publication: https://doi.org/10.5281/zenodo.5503447.13

License: BSD-2 license.

how to cite this article
Viehweger A, Brandt C and Hölzer M. DarkQ: continuous genomic monitoring using message queues [version 1; peer review: 2 approved with reservations]. F1000Research 2021, 10:998 (https://doi.org/10.12688/f1000research.54255.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 29 Nov 2021
Rayan Chikhi, Department of Computational Biology, C3BI USR 3756 IP CNRS, Institut Pasteur, Paris, France 
Approved with Reservations
This article presents DarkQ, a proof of concept software for an architecture to perform microbial surveillance. The concept is interesting and original, and appears to be well-engineered. Monitoring microbes is of high interest scientifically. However, as I mentioned, DarkQ is ...
HOW TO CITE THIS REPORT
Chikhi R. Reviewer Report For: DarkQ: continuous genomic monitoring using message queues [version 1; peer review: 2 approved with reservations]. F1000Research 2021, 10:998 (https://doi.org/10.5256/f1000research.57722.r96160)
Reviewer Report 08 Nov 2021
Tessa Pierce-Ward, Department of Population Health and Reproduction, University of California, Davis, CA, USA 
Approved with Reservations
The authors present “DarkQ,” a peer-to-peer network and workflow for pathogen genome monitoring. The COVID-19 pandemic has underlined the need and utility for quickly sharing pathogen genomes across both hospitals and researchers. The workflow and file sharing protocol described here ...
HOW TO CITE THIS REPORT
Pierce-Ward T. Reviewer Report For: DarkQ: continuous genomic monitoring using message queues [version 1; peer review: 2 approved with reservations]. F1000Research 2021, 10:998 (https://doi.org/10.5256/f1000research.57722.r96159)