Improving communication for interdisciplinary teams working on storage of digital information in DNA

Emily E. Hesketh; Jossy Sayir; Nick Goldman

doi:10.12688/f1000research.13482.1

Research Note

[version 1; peer review: 2 approved]

Emily E. Hesketh¹, Jossy Sayir², Nick Goldman²

PUBLISHED 10 Jan 2018

Author details

¹ Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK
² European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK

Emily E. Hesketh
Roles: Conceptualization, Writing – Original Draft Preparation, Writing – Review & Editing

Jossy Sayir
Roles: Writing – Review & Editing

Nick Goldman
Roles: Funding Acquisition, Supervision, Writing – Review & Editing

This article is included in the EMBL-EBI collection.

Abstract

Close collaboration between specialists from diverse backgrounds and working in different scientific domains is an effective strategy to overcome challenges in areas that interface between biology, chemistry, physics and engineering. Communication in such collaborations can itself be challenging. Even when projects are successfully concluded, resulting publications, necessarily multi-authored, have the potential to be disjointed. Few, both in the field and outside, may be able to fully understand the work as a whole. This needs to be addressed to facilitate efficient working, peer review, accessibility and impact to larger audiences. We are an interdisciplinary team working in a nascent scientific area, the repurposing of DNA as a storage medium for digital information. In this note, we highlight some of the difficulties that arise from such collaborations and outline our efforts to improve communication through a glossary and a controlled vocabulary and accessibility via short plain-language summaries. We hope to stimulate early discussion within this emerging field of how our community might improve the description and presentation of our work to facilitate clear communication within and between research groups and increase accessibility to those not familiar with our respective fields, be it molecular biology, computer science, information theory or others that might become relevant in future. To enable an open and inclusive discussion we have created a glossary and controlled vocabulary as a cloud-based shared document and we invite other scientists to critique our suggestions and contribute their own ideas.

Keywords

DNA-storage, digital information storage in DNA, synthetic biology, glossary, communication, controlled vocabulary, short plain-language summaries, interdisciplinary collaboration

Corresponding author: Nick Goldman

Competing interests: No competing interests were disclosed.

Grant information: EEH and JS are supported by the UK's Biotechnology and Biological Sciences Research Council (BBSRC grants BB/L023741/1 and BB/L021994/1). NG is supported by the European Molecular Biology Laboratory.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2018 Hesketh EE et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Hesketh EE, Sayir J and Goldman N. Improving communication for interdisciplinary teams working on storage of digital information in DNA [version 1; peer review: 2 approved]. F1000Research 2018, 7:39 (https://doi.org/10.12688/f1000research.13482.1) First published: 10 Jan 2018, 7:39 (https://doi.org/10.12688/f1000research.13482.1) Latest published: 10 Jan 2018, 7:39 (https://doi.org/10.12688/f1000research.13482.1)

Introduction

As we tackle increasingly complex issues throughout science, a breadth of knowledge is often necessary to devise novel solutions — something frequently achieved through interdisciplinary collaborations. The inherent diversity within interdisciplinary teams stimulates knowledge exchange, creativity or even a change in perspective; however, it can be very challenging. We work within an emerging field in synthetic biology, repurposing DNA as a storage medium for digital information. Advancing from early proof-of-principle studies in the high-throughput era^1,2 (see references therein for historical perspective) towards a more reliable, refined and functional large-scale DNA storage system^3,4 raises unique challenges that can only be resolved through a broad collaborative effort between biochemical and DNA sequencing specialists, computer and molecular scientists, information theorists and others. This body of research has gained considerable interest both within the research community and with the public, and this has further emphasised the need to address our communication and the presentation of our work.

Interdisciplinary teams make significant advances in life sciences

Intersection between these fields is clearly beneficial. Information theory has already underpinned many advances in life sciences, from adapting Levenshtein coding to create error-correcting molecular barcodes used in multiplexed DNA sequencing⁵ to Burrows-Wheeler transformation of reference genomes implemented in several short read aligners^6–8. A molecular biologist may see the process of storing information in DNA as a very physical process, progressing from DNA synthesis (writing) to amplification (copying) to sequencing (reading). To an information theorist, this is a noisy channel: a series of transformations through which information is transmitted and the outputs observed. Differences in the way experts in these different fields describe their data and results can hinder collaboration and restrict impact. As a result, publications have the potential to be an ineffective hybrid of accepted nomenclature and data presentation within the intersecting fields with few readers, both in the team and outside, able to fully understand the publication as a whole.

Unambiguous communication can be challenging and misunderstandings can pass unnoticed

Unsurprisingly, terms shared between the intersecting disciplines can have disparate meanings. The word ‘qubit’ may suggest that some DNA needs quantifying⁹, or that the discussion concerns quantum information or quantum field theory¹⁰. This complicates communication; misunderstandings have the potential to pass unnoticed, only becoming apparent downstream. Examples of such misunderstandings arise from the words ‘error’, ‘erasure’ and ‘substitution’ when retrieving data through DNA sequencing. To an information theorist, an ‘error’ refers to a falsely read symbol, for example when an A in the DNA sequence is falsely read as a C, distinct from an insertion or deletion. An ‘erasure’ is a read so uncertain that it is not called as an A, C, G or T; it is distinct from a ‘deletion’ in that the symbol is not simply missed: we are made aware that there is a missing symbol at this position in the DNA string. An ‘insertion’ is a symbol read where no symbol should exist. To a molecular biologist or DNA sequencing expert, all of these would be described as read ‘errors’. To them, errors in the information-theoretic sense would be called substitutions.
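The distinction between these four events can be made concrete with a minimal Python sketch. It assumes the written and read sequences have already been aligned, and it uses two conventions that are assumptions of this example rather than anything defined in the text: `-` marks an alignment gap and `N` marks a position the sequencer declined to call. The function names are hypothetical.

```python
from collections import Counter


def classify_column(written: str, read: str) -> str:
    """Name the event at one aligned position, in information-theoretic terms."""
    if written == "-":
        return "insertion"  # a symbol was read where none was written
    if read == "-":
        return "deletion"   # a written symbol is silently missing from the read
    if read == "N":
        return "erasure"    # the position is known, but the symbol is not
    if read != written:
        return "error"      # a molecular biologist would say "substitution"
    return "match"


def summarise(written_aln: str, read_aln: str) -> Counter:
    """Count each event type over a pair of equal-length aligned strings."""
    if len(written_aln) != len(read_aln):
        raise ValueError("aligned sequences must have the same length")
    return Counter(classify_column(w, r) for w, r in zip(written_aln, read_aln))


if __name__ == "__main__":
    written = "ACGT-ACGT"
    read = "ACCTGAN-T"
    for event, count in sorted(summarise(written, read).items()):
        print(f"{event}: {count}")
```

In the sequencing sense, every non-match counted here would simply be called a read ‘error’, which is exactly the ambiguity described above.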

A glossary and controlled vocabulary for DNA-storage

DNA-storage has become a popular research field, with a number of interdisciplinary teams forming and collaborating in an attempt to make viable information storage systems that capitalise on DNA’s numerous advantages¹¹. To alleviate confusion and improve daily communication within and between these groups we propose, and have begun to implement, two measures: a glossary and a controlled vocabulary.

Glossary

We have created a glossary defining basic terms in molecular biology, information theory, computer science and related disciplines that are relevant to DNA-storage, for those unfamiliar with one or more of these disciplines. This proved to be a useful aid in early discussions within our team and helped to identify areas of nomenclature ambiguity which, if not addressed, might have complicated communication downstream. We have already experienced the advantages of sharing this within our team and with collaborators to facilitate exchange of ideas with them.

Our glossary is held on a cloud storage system, and can be found at https://goo.gl/x6B73Q or https://rebrand.ly/dna-storage-glossary. To allow an open and inclusive discussion of how we might improve communication within this emerging community, we encourage others to critique and contribute to the glossary. The document permits “Suggestions” (proposed edits) and “Comments” to be added, and we will review these regularly and update the document as a resource for our research community.

Controlled vocabulary

Leading on from this, we are developing an evolving controlled vocabulary allowing team members to communicate precisely. This has been particularly beneficial during technical discussions. For instance, to us, ‘data packet’ refers to the part of a DNA sequence that decodes to digital information, and excludes parts that are designed to facilitate DNA sequencing or indexing.

Use of a controlled vocabulary is something that the community may wish to agree upon. For example, one question we pose is: what should we name these DNA sequences that encode digital information? Following the practice of genome scientists, we initially called collections of such DNA sequences ‘libraries’. However, working with such samples caused confusion with our colleagues in a molecular biology laboratory: in a Next Generation Sequencing context, the term ‘library’ is commonly used to describe DNA fragments that have been prepared for DNA sequencing. We now propose to refer to DNA sequences that store digital information as inDNA (for ‘information-carrying DNA’). To refer to inDNA prepared for DNA sequencing, we can now unambiguously talk about a library of inDNA.
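One way to picture a controlled vocabulary is as a small lookup table that records what a term means in each discipline and flags terms with more than one sense. The sketch below is purely illustrative: the class, the function and the wording of each sense are assumptions of this example, paraphrased from the ‘library’ and ‘qubit’ cases above, and do not reproduce entries of the shared glossary document.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """One vocabulary entry: a term and what it means in each discipline."""
    name: str
    senses: Dict[str, str] = field(default_factory=dict)  # discipline -> meaning
    preferred: Optional[str] = None  # agreed unambiguous wording, if any

    @property
    def ambiguous(self) -> bool:
        return len(self.senses) > 1


# Hypothetical entries, keyed by lower-case term.
VOCABULARY: Dict[str, Term] = {
    "library": Term(
        "library",
        {
            "genome science": "a collection of DNA sequences",
            "next generation sequencing": "DNA fragments prepared for sequencing",
        },
        preferred="inDNA, or 'library of inDNA' once prepared for sequencing",
    ),
    "qubit": Term(
        "qubit",
        {
            "molecular biology": "DNA quantification",
            "information theory": "quantum information",
        },
    ),
    "indna": Term("inDNA", {"DNA-storage": "DNA that stores digital information"}),
}


def flag_ambiguous(text: str, vocabulary: Dict[str, Term]) -> List[str]:
    """Return ambiguous vocabulary terms in order of first appearance in text."""
    flagged: List[str] = []
    for word in re.findall(r"[a-z]+", text.lower()):
        term = vocabulary.get(word)
        if term is not None and term.ambiguous and term.name not in flagged:
            flagged.append(term.name)
    return flagged


if __name__ == "__main__":
    sentence = "We quantified the library with a Qubit before sequencing."
    for name in flag_ambiguous(sentence, VOCABULARY):
        print(name, "->", VOCABULARY[name].senses)
```

The matching here is deliberately naive (exact, single-word, singular forms only); the point is the data structure, not the text processing.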

We would like to invite others to contribute to the development of a controlled vocabulary so that we might be able to communicate more precisely. We have included a few entries within our glossary document.

Improving review, accessibility and impact of interdisciplinary publications

We now pose another question — how might we improve data description and presentation to increase accessibility and facilitate peer review and reproducibility? Peer review is crucial within the scientific community, but this quality improvement process may not be fully realised in interdisciplinary publications. We have experienced difficulties with peer review of publications related to DNA-storage applications, as authors of work under review, as reviewers ourselves, in our assessment of others’ reviews, and in dealings with journal editors. Often the expertise is not available, or reviewers may only evaluate limited aspects of the paper. The body of work may not be effectively reviewed as a whole, leaving authors without vital feedback and potentially leading to publication of flawed work.

Presentation can be improved by including a short plain-language summary

The concept of standardising presentation of data and methods is not a novel idea in the life sciences, with ‘minimum information’ standards ensuring that publications contain the information necessary to interpret the experimental data. These are typically technique- or study-specific, e.g. MIAME (microarray experiments)¹², MIQE (quantitative polymerase chain reaction)¹³ and MIFlowCyt (flow cytometry)¹⁴. Such an approach may not be appropriate to publications relating to DNA-storage applications for some time, as these typically encompass a number of disciplines, each with its own established data description standards and many of which use rapidly changing technologies. It is not appropriate or practical to standardise such a diverse range of technologies and disciplines. Rather we should respect the accepted discipline norms, blending these together to permit DNA-storage standards to evolve.

Even publications that sit predominantly within a single discipline may be of interest to those unfamiliar with that discipline and benefit from the inclusion of a whole-paper plain-language summary. As is standard with plain-language summaries, this should simply report the basic rationale, methodology and main findings. Box 1 is a whole-publication plain-language summary of reference 2 that we have written as an example.

Box 1. Plain-language summary of reference 2.

With the amount of digital information that needs to be stored growing exponentially, there is a need to develop new ways of storing information. High information capacity, longevity and constant improvements in technologies that allow writing, copying and reading make DNA an attractive medium for storing digital information. Here we present a scalable, reliable method for storing digital information in DNA.

The original bytes of several computer files in various formats were encoded into DNA as follows. A Huffman code was used to compress each byte, depending upon occurrence frequency, into a block of 5–6 trits, which are the characters 0, 1 or 2 (just as bits are 0 or 1). A reference table of these blocks and corresponding nucleotide sequences was created, with each block having four possible nucleotide combination representations. Nucleotide combinations were selected depending also upon the previous block, in a manner that prevented the occurrence of any repeating nucleotides (e.g. AA), as these are known to cause downstream copying and reading problems. Following encoding the digital information was represented as 153,335 DNA sequences of length 117 nucleotides, each containing an index and a simple error checkpoint in addition to encoding part of the original digital information. These DNA sequences were printed as a pool of DNA, containing ~1.2 × 10⁷ copies of each sequence, which was copied via PCR and prepared for reading via DNA sequencing before being decoded (encoding strategy reversed).

Data totalling 739 kilobytes was successfully encoded into DNA, printed, copied, read and decoded with 100% accuracy. A storage density of ~2.2 PB g⁻¹ DNA was achieved.
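The idea in Box 1 of choosing each nucleotide according to what came before, so that no nucleotide is ever repeated, can be illustrated with a minimal Python sketch. This is a simplification under stated assumptions: it maps one trit at a time to one of the three nucleotides that differ from the previous one, rather than using the block-level reference table described in the box, and the virtual starting nucleotide `A` and the function names are choices made for this example only.

```python
from typing import Iterable, List

NUCLEOTIDES = "ACGT"


def trits_to_dna(trits: Iterable[int], start: str = "A") -> str:
    """Encode trits (0, 1 or 2) so that no two adjacent nucleotides are equal."""
    prev = NUCLEOTIDES.index(start)  # virtual predecessor of the first nucleotide
    out: List[str] = []
    for trit in trits:
        if trit not in (0, 1, 2):
            raise ValueError(f"not a trit: {trit!r}")
        # Step 1, 2 or 3 places round the cycle A -> C -> G -> T -> A,
        # so the new nucleotide can never equal the previous one.
        prev = (prev + trit + 1) % 4
        out.append(NUCLEOTIDES[prev])
    return "".join(out)


def dna_to_trits(dna: str, start: str = "A") -> List[int]:
    """Reverse trits_to_dna; a repeated nucleotide signals a corrupted read."""
    prev = NUCLEOTIDES.index(start)
    trits: List[int] = []
    for nucleotide in dna:
        cur = NUCLEOTIDES.index(nucleotide)
        if cur == prev:
            raise ValueError("repeated nucleotide cannot occur in a valid encoding")
        trits.append((cur - prev - 1) % 4)
        prev = cur
    return trits


if __name__ == "__main__":
    message = [0, 1, 2, 0, 0]
    encoded = trits_to_dna(message)
    print(encoded)                # CTGTA
    print(dna_to_trits(encoded))  # [0, 1, 2, 0, 0]
```

Because the output never contains a doubled nucleotide such as AA, a decoder that encounters one knows that something went wrong in writing, copying or reading.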

It may also be useful to provide a plain-language summary of a specific technical aspect of a publication. For example, a molecular scientist may not understand the details of a complex mathematical algorithm (nor should the description be altered specifically to allow them to), but an appreciation of how the output impacts aspects of the project relevant to them may be sufficient. We illustrate this using a paragraph from reference 4 (p. 5, Methods: Address Design and Encoding). This was read and discussed by the first two co-authors of the present paper, EEH and JS. Figure 1 highlights terms that either EEH, a molecular biologist (purple shading), or JS, an information theorist (yellow shading), found difficult to understand. Joining forces and explaining all terms to each other, they were able to understand the paragraph in depth.

Figure 1. Sample paragraphs from reference 4.

Terms that may not be clear to non-specialists in particular fields are highlighted in purple and yellow, corresponding to those causing problems for a molecular biologist and an information theorist, respectively. (Used under the Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0).

As the interdisciplinary field of DNA-storage evolves towards maturity, there will be an increasing requirement for researchers from different backgrounds to understand publications without having access to colleagues from unfamiliar subject areas. This can be achieved in part by including brief summaries, which may make use of our glossary document, in specialised sections of a publication such that they become accessible for researchers from all disciplines.

Conclusions

We promote the value of interdisciplinary, collaborative science to solve complex problems, including in our field of digital information storage in DNA which combines molecular biology, information theory and computer science. We note the problems that this approach can generate in communication within and between research teams, and propose to reduce these in the DNA-storage area by initiating a glossary and controlled vocabulary. These have been made available to the research community for reference and critique, and we invite contributions to extend their scope.

Competing interests

No competing interests were disclosed.

Grant information

EEH and JS are supported by the UK’s Biotechnology and Biological Sciences Research Council (BBSRC grants BB/L023741/1 and BB/L021994/1). NG is supported by the European Molecular Biology Laboratory.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Acknowledgements

We would like to thank all participants at the IARPA meeting in Washington D.C. on 27–28 April 2016 (https://www.src.org/calendar/e006043/) for an interdisciplinary discussion, during which the need for a unified vocabulary to foster understanding within this new field was in evidence. We thank in particular Luis Ceze who chaired this discussion. This provided additional motivation for continuing and extending the glossary we had already put together, as reported during the meeting, and for writing this paper.

References

1. Church GM, Gao Y, Kosuri S: Next-generation digital information storage in DNA. Science. 2012; 337(6102): 1628.
2. Goldman N, Bertone P, Chen S, et al.: Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature. 2013; 494(7435): 77–80.
3. Bornholt J, Lopez R, Carmean DM, et al.: A DNA-based archival storage system. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS ’16, New York, NY, USA, ACM. 2016; 44(2): 637–649.
4. Yazdi SM, Yuan Y, Ma J, et al.: A Rewritable, Random-Access DNA-Based Storage System. Sci Rep. 2015; 5: 14138.
5. Buschmann T, Bystrykh LV: Levenshtein error-correcting barcodes for multiplexed DNA sequencing. BMC Bioinformatics. 2013; 14: 272.
6. Langmead B, Trapnell C, Pop M, et al.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009; 10(3): R25.
7. Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009; 25(14): 1754–1760.
8. Li R, Yu C, Li Y, et al.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009; 25(15): 1966–1967.
9. Mardis E, McCombie WR: Library Quantification: Fluorometric Quantitation of Double-Stranded or Single-Stranded DNA Samples Using the Qubit System. Cold Spring Harb Protoc. 2017; 2017(6): pdb.prot094730.
10. Schumacher B: Quantum coding. Phys Rev A. 1995; 51(4): 2738–2747.
11. Zhirnov V, Zadegan RM, Sandhu GS, et al.: Nucleic acid memory. Nat Mater. 2016; 15(4): 366–370.
12. Brazma A, Hingamp P, Quackenbush J, et al.: Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet. 2001; 29(4): 365–371.
13. Bustin SA, Benes V, Garson JA, et al.: The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem. 2009; 55(4): 611–622.
14. Lee JA, Spidlen J, Boyce K, et al.: MIFlowCyt: the minimum information about a Flow Cytometry Experiment. Cytometry A. 2008; 73(10): 926–930.

Comments on this article (1)

Reader Comment 22 May 2018

Bart Penders, Maastricht University, The Netherlands

Facilitating productive exchanges is an important requirement for any interdisciplinary endeavour.

The approach to limit a vocabulary to allow specific types of communication across epistemic and cultural boundaries would benefit from a quick visit to some older literature recommending, describing and theorising similar processes.

The controlled vocabulary proposed in this paper strongly resembles discussions in social theory and social studies of science. For instance, in 1997, Galison proposed the notion of *Trading zones* and described how they work [1]. They are areas where technical or scientific practices can become collective by allowing practitioners to use so-called pidgins. A pidgin is a simplified language, one that can be used by a diverse array of practitioners and which does not require full assimilation into a knowledge culture.

Trading zones host objects or elements that matter to many (disciplines). These elements may not be seen, described, conceptualised or understood in the same way. They can be described as boundary objects [2] occupying unique spaces on the boundary between disciplines allowing some form of communication to exist through them.

[1] Galison, Peter (1997) Image and Logic: A Material Culture of Microphysics. Chicago: University of Chicago Press.

[2] Star, Susan Leigh and James R. Griesemer (1989) “Institutional Ecology, ‘Translations’ and Boundary Objects: Amateurs and Professionals in Berkeley’s Museum of Vertebrate Zoology, 1907–39.” Social Studies of Science 19: 387–420.

Competing Interests: No competing interests were disclosed.

Open Peer Review

Reviewer Report 29 Mar 2018

Jeffrey R. Sampson, Agilent Research Laboratories, Santa Clara, CA, USA

Approved

https://doi.org/10.5256/f1000research.14640.r31971

The paper by Hesketh et al. addresses the very important issue of facilitating productive communication among highly interdisciplinary teams. This impacts not only verbal communication among interdisciplinary members but also written communications in the form of simple messages and publications. It is also well noted that peer review of publications often lacks a single person with the necessary vocabulary and domain knowledge to fully understand, evaluate and communicate a review of the work. The method of Hesketh et al. will clearly aid in this important process. Importantly, they have developed a smart approach to the problem that can be applied more broadly to other interdisciplinary teams that require the integration of disparate fields of science and technology such as life sciences and engineering. For example, the synthetic biology community has experienced this issue as it has developed and evolved over the past 15 or so years.

More specifically, Hesketh et al. not only set out a good structure and context for the problems that the interdisciplinary team developing DNA as a digital information storage medium faces, but also provide some solutions to critical problems. The first is creating a glossary of terms so that all disciplines involved can communicate with a common and known set of terms. Second, they have put forward the use of a “controlled vocabulary” where terms that are particular to the emerging interdisciplinary field are defined so as to enable all members to communicate precisely and thus reduce confusion that often occurs when terms have multiple meanings and/or field-dependent meanings. Perhaps most importantly, Hesketh et al. have built their approach as a “living document” where the vocabulary and common vocabulary can be continuously updated by the interdisciplinary community as the community grows and evolves.

With respect to any additional comments or edits, I suggest that the authors consider adding “Chemistry Terminology” to their glossary with specific attention to the chemical synthesis of DNA since this is the current method for DNA synthesis. Such terms could include: phosphoramidite, cycle yield, coupling efficiency, de-block step, oxidation step.

Given the importance, clarity and potential for broad applicability, I strongly recommend the paper for indexing.

Is the work clearly and accurately presented and does it cite the current literature?
Yes

Is the study design appropriate and is the work technically sound?
Yes

Are sufficient details of methods and analysis provided to allow replication by others?
Yes

If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable

Are all the source data underlying the results available to ensure full reproducibility?
No source data required

Are the conclusions drawn adequately supported by the results?
Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Nucleic acid synthesis and measurement technologies, technology development and business strategy.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report 26 Feb 2018

Robert Grass, Institute for Chemical and Bioengineering (ICB), Department of Chemistry and Applied Biosciences (D-CHAB), ETH Zurich (Swiss Federal Institute of Technology in Zurich), Zürich, Switzerland

Approved

https://doi.org/10.5256/f1000research.14640.r29657

The paper by E. Hesketh addresses very important problems of our current scientific landscape, and the ongoing movement to more interdisciplinary approaches:

Communication between scientists in a team
Peer Review

The authors discuss these two topics using a currently evolving research topic: the storage of digital information in DNA; but the addressed problems have a significantly broader applicability, as individual research topics spread over more and more scientific disciplines, and especially because data and computer sciences are having a major impact on science (and the corresponding high-level mathematics are currently not integrated into e.g. life-science curricula).

For communication between scientists within a team, the authors present an excellent glossary of terms for the scientific fields involved in DNA data storage, and the development and open publication/distribution of such glossaries would bring benefit to many interdisciplinary projects. Instead of a locally managed glossary (as proposed), a more open approach (e.g. as an open Wikipedia) would be even more beneficial and would further motivate others to participate more strongly in updating the glossary. Some referencing within the glossary would also be valuable, as background is often required to understand an individual term (as is standard within Wikipedia). If the authors have good reasons for a non-public (i.e. wiki) approach, these should be discussed in the article; if not, the generation of a corresponding wiki would certainly be highly appreciated by the research community.

However, to completely solve the communication problems and misunderstandings in such projects, the authors touch a point of even higher importance: “misunderstandings can pass unnoticed”. The question is therefore what solutions are available to make team members aware of the danger of miscommunication and to ensure that every individual in a given project invests sufficient effort to learn the details, wordings and backgrounds of the neighboring fields. The authors may want to build further on this observation, and potentially present approaches to ensure such awareness and openness (especially in teams involving specialists).

The second problem of interdisciplinary projects addressed is peer review. The more detailed the background from different scientific fields that is required to judge the correctness of the scientific work performed, the more difficult it is to find referees who cover all of this knowledge. A plain-language summary, presented by the authors as part of a solution, is certainly a good start but probably does not go far enough. In contrast to individuals working on an interdisciplinary project (as above), a journal referee does not have enough time to learn the details and terminology of the other fields, and the review process becomes somewhat superficial. A general understanding of the overall goals of a given paper (as per the plain-language summary) may help referees to understand the article's scope, but it will not help them to judge the scientific validity of the methods applied. The authors of the present manuscript touch on this, and a more explicit depiction of the problem may be valuable for further discussion of future publishing and peer-review modes (e.g. post-publication review, open review, or several referees each reviewing only part of an article).

Is the work clearly and accurately presented and does it cite the current literature?
Yes

Is the study design appropriate and is the work technically sound?
Yes

Are sufficient details of methods and analysis provided to allow replication by others?
Yes

If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable

Are all the source data underlying the results available to ensure full reproducibility?
No source data required

Are the conclusions drawn adequately supported by the results?
Yes

Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Comments on this article (1)

Reader Comment 22 May 2018

Bart Penders, Maastricht University, The Netherlands

Facilitating productive exchanges is an important requirement for any interdisciplinary endeavour.

The approach to limit a vocabulary to allow specific types of communication across epistemic and cultural boundaries would benefit from a quick visit to some older literature recommending, describing and theorising similar processes.

The controlled vocabulary proposed in this paper strongly resembles discussions in social theory and social studies of science. For instance, in 1997, Galison proposed the notion of *Trading zones* and described how they work [1]. They are areas where technical or scientific practices can become collective by allowing practitioners to use so-called pidgins. A pidgin is a simplified language, one that can be used by a diverse array of practitioners and which does not require full assimilation into a knowledge culture.

Trading zones host objects or elements that matter to many (disciplines). These elements may not be seen, described, conceptualised or understood in the same way. They can be described as boundary objects [2] occupying unique spaces on the boundary between disciplines allowing some form of communication to exist through them.

[1] Galison, Peter (1997) Image and Logic: A Material Culture of Microphysics. Chicago: University of Chicago Press.

[2] Star, Susan Leigh and James R. Griesemer (1989) “Institutional Ecology, ‘Translations’ and Boundary Objects: Amateurs and Professionals in Berkeley’s Museum of Vertebrate Zoology, 1907–39.” Social Studies of Science 19: 387–420.

Competing Interests: No competing interests were disclosed.

Open Peer Review

Robert Grass, ETH Zurich (Swiss Federal Institute of Technology in Zurich), Zürich, Switzerland
Jeffrey R. Sampson, Agilent Research Laboratories, Santa Clara, USA

Reviewer Report

29 Mar 2018 | for Version 1

Jeffrey R. Sampson, Agilent Research Laboratories, Santa Clara, CA, USA

Approved

The paper by Hesketh et al. addresses the very important issue of facilitating productive communication within highly interdisciplinary teams. This affects not only verbal communication among team members but also written communication, from simple messages to publications. It is also well noted that peer review of such publications often lacks a single person with the vocabulary and domain knowledge needed to fully understand, evaluate and communicate a review of the work. The method of Hesketh et al. will clearly aid in this important process. Importantly, they have developed a smart approach to the problem that can be applied more broadly to other interdisciplinary teams that require the integration of disparate fields of science and technology, such as life sciences and engineering. For example, the synthetic biology community has experienced this issue as it has developed and evolved over the past 15 or so years.

More specifically, Hesketh et al. not only set out a good structure and context for the challenges faced by the interdisciplinary team developing DNA as a digital information storage medium, but also provide some solutions to critical problems. The first is a glossary of terms, so that all disciplines involved can communicate with a common and known set of terms. Second, they propose a “controlled vocabulary” in which terms particular to the emerging interdisciplinary field are defined, enabling all members to communicate precisely and reducing the confusion that often occurs when terms have multiple or field-dependent meanings. Perhaps most importantly, Hesketh et al. have built their approach as a “living document” in which the glossary and controlled vocabulary can be continuously updated by the interdisciplinary community as it grows and evolves.

With respect to additional comments or edits, I suggest that the authors consider adding “Chemistry Terminology” to their glossary, with specific attention to the chemical synthesis of DNA, since this is the current method for DNA synthesis. Such terms could include: phosphoramidite, cycle yield, coupling efficiency, de-block step, oxidation step.

Given its importance, clarity and potential for broad applicability, I strongly recommend the paper for indexing.

Is the work clearly and accurately presented and does it cite the current literature?
Yes

Is the study design appropriate and is the work technically sound?
Yes

Are sufficient details of methods and analysis provided to allow replication by others?
Yes

If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable

Are all the source data underlying the results available to ensure full reproducibility?
No source data required

Are the conclusions drawn adequately supported by the results?
Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Nucleic acid synthesis and measurement technologies, technology development and business strategy.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

References

1. Church GM, Gao Y, Kosuri S: Next-generation digital information storage in DNA. Science. 2012; 337(6102): 1628.

2. Goldman N, Bertone P, Chen S, et al.: Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature. 2013; 494(7435): 77–80.

3. Bornholt J, Lopez R, Carmean DM, et al.: A DNA-based archival storage system. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS ’16, New York, NY, USA, ACM. 2016; 44(2): 637–649.

4. Yazdi SM, Yuan Y, Ma J, et al.: A Rewritable, Random-Access DNA-Based Storage System. Sci Rep. 2015; 5: 14138.

5. Buschmann T, Bystrykh LV: Levenshtein error-correcting barcodes for multiplexed DNA sequencing. BMC Bioinformatics. 2013; 14: 272.

6. Langmead B, Trapnell C, Pop M, et al.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009; 10(3): R25.

7. Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009; 25(14): 1754–1760.

8. Li R, Yu C, Li Y, et al.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009; 25(15): 1966–1967.

9. Mardis E, McCombie WR: Library Quantification: Fluorometric Quantitation of Double-Stranded or Single-Stranded DNA Samples Using the Qubit System. Cold Spring Harb Protoc. 2017; 2017(6): pdb.prot094730.

10. Schumacher B: Quantum coding. Phys Rev A. 1995; 51(4): 2738–2747.

11. Zhirnov V, Zadegan RM, Sandhu GS, et al.: Nucleic acid memory. Nat Mater. 2016; 15(4): 366–370.

12. Brazma A, Hingamp P, Quackenbush J, et al.: Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet. 2001; 29(4): 365–371.

13. Bustin SA, Benes V, Garson JA, et al.: The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem. 2009; 55(4): 611–622.

14. Lee JA, Spidlen J, Boyce K, et al.: MIFlowCyt: the minimum information about a Flow Cytometry Experiment. Cytometry A. 2008; 73(10): 926–930.
