Keywords
Barcode, DNA taxonomy, FISH-BOL, peer review, scientific publishing
This article is included in the Phylogenetics collection.
DNA sequence data have become widely accepted as a useful tool for taxonomic determination and discovery¹–³. But warnings about the potential pitfalls of DNA taxonomy in practice have been sounded for some time⁴–¹⁰.
The DNA barcode itself is simply a standard region selected to facilitate comparison¹¹. A library built of many such sequences and based on a gene evolving at a rate that minimizes variation within and maximizes variation between species becomes a powerful taxonomic resource⁵. But the journey from DNA barcode sequence to species determination still requires critical judgment, particularly for taxa or regions that are not currently well represented in sequence databases.
Falade et al.¹² obtained DNA sequences for sixteen individual fish from Southwest Nigeria, a region with relatively sparse coverage in sequence databases. Such data are valuable because broad geographic and taxonomic representation provides insight into genetic diversity within taxonomic groups and helps us to refine hypotheses of species circumscription and phylogenetic relationships.
Falade et al.¹² sequenced each specimen for the standard animal DNA barcode region, cytochrome oxidase I (COI), and for a region of the mitochondrial 16S ribosomal RNA gene. The authors queried their sequences against both the BOLD Systems (RRID: SCR_004278; boldsystems.org/index.php/IDS_OpenIdEngine) and NCBI GenBank (RRID: SCR_004860; BLASTN, RRID: SCR_001598; blast.ncbi.nlm.nih.gov/Blast.cgi) databases (because BOLD does not include 16S, these sequences were compared only to GenBank). Although the authors claim that “this resulted in straightforward identification”, we take a more nuanced view of their results.
The BOLD identification engine and BLASTN comparison with GenBank work differently and were created for different purposes¹³–¹⁵; only BOLD is specifically intended to be used as a taxonomic identification tool, while BLASTN assesses sequence similarity. BLASTN will always return the most similar sequences in GenBank, however weak the similarity. BOLD is more discriminating, since it is limited to a handful of specific loci and uses similarity thresholds to assess whether or not a query sequence can be matched with high confidence to identified sequences in the database. BOLD will alert the user when it determines that no confident identification can be made. DNA-based identification is further complicated by the fact that both BOLD and GenBank include misidentified sequences¹⁶.
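The contrast between a best-hit search and a thresholded identification can be illustrated with a minimal sketch. This is not the actual BOLD or BLASTN logic: the function names, the 99% threshold, and the example hit lists (including their identity values) are hypothetical and chosen only to show why one approach always yields a name while the other can decline to give one.

```python
from typing import List, Optional, Tuple

# A hit is (species name, percent identity). All values below are hypothetical.
Hit = Tuple[str, float]


def best_hit(hits: List[Hit]) -> Optional[str]:
    """Best-hit behaviour: return the most similar name, however weak the match."""
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[1])[0]


def confident_id(hits: List[Hit], threshold: float = 99.0) -> Optional[str]:
    """Thresholded behaviour: name a species only if exactly one species
    reaches the (assumed) identity threshold; otherwise return None."""
    passing = {species for species, identity in hits if identity >= threshold}
    if len(passing) == 1:
        return next(iter(passing))
    return None


# Hypothetical hit lists for three kinds of query.
weak = [("Tilapia zillii", 86.0), ("Oreochromis niloticus", 84.5)]
ambiguous = [("Clarias gariepinus", 99.8), ("Clarias sp.", 99.6)]
clear = [("Sarotherodon melanotheron", 99.7), ("Oreochromis niloticus", 91.0)]

for label, hits in [("weak", weak), ("ambiguous", ambiguous), ("clear", clear)]:
    print(label, "| best hit:", best_hit(hits), "| confident:", confident_id(hits))
```

In this sketch the best-hit function returns a name for all three queries, whereas the thresholded function returns a name only for the unambiguous, high-identity case.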
BOLD failed to identify with confidence any of the sixteen COI sequences. Eight were classified as probably belonging to one of a handful of possible species, while the rest received no hit. From this, we infer that Falade et al. made their taxonomic determinations based almost entirely on BLASTN results. As reported (Table 1), all but one of these were scored as 98–99% identical to their top GenBank hit with the remaining sequence (KX231778; Coptodon_zilli_odooba_1) scoring 86% identical.
Table 1. Top BOLD hit and BOLD identification note summarize results from BOLD. Top Blast hit and Sequence name specify the best match in GenBank (excluding the Falade et al. sequences) according to BLASTN, with the Blast metrics Query cover and Ident. See also Table 2 in Falade et al. Note that BOLD contains no 16S data, so these sequences are listed as NA (not applicable).
Accession no. | Locus | Specimen voucher no. | Top BOLD hit | BOLD identification note | Top Blast hit | Sequence name | Query cover (%) | Ident (%) | Blast note |
---|---|---|---|---|---|---|---|---|---|
KX231778 | COI | Coptodon_zilli_odooba_1 | No Hit | | JX173760.1 | Tilapia zillii isolate MAB08 cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial | 76 | 86 | |
KX231779 | COI | Coptodon_zilli_odooba_2 | Top Hit: Chordata - Cichliformes - Tilapia zillii (99.65%) | A species level match could not be made, the queried specimen is likely to be one of the following: Tilapia zillii, Coptodon zillii, Coptodon sp., Oreochromis mossambicus, Coptodon rendalli, Tilapia guineensis | KM658974.1 | Coptodon zillii mitochondrion, complete genome | 87 | 99 | |
KX231780 | COI | Coptodon_zilli_odooba_3 | Top Hit: Chordata - Cichliformes - Tilapia zillii (99.44%) | A species level match could not be made, the queried specimen is likely to be one of the following: Tilapia zillii, Coptodon zillii, Coptodon sp., Oreochromis mossambicus, Coptodon rendalli | KM658974.1 | Coptodon zillii mitochondrion, complete genome | 88 | 99 | |
KX231781 | COI | Sarotherodon_melanotheron_odooba_4 | No Hit | | JF894132.1 | Sarotherodon melanotheron mitochondrion, complete genome | 92 | 98 | |
KX231782 | COI | Sarotherodon_melanotheron_odooba_5 | No Hit | | JF894132.1 | Sarotherodon melanotheron mitochondrion, complete genome | 92 | 98 | |
KX231783 | COI | Clarias_gariepinus_odooba_6 | No Hit | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 92 | 99 | |
KX231784 | COI | Clarias_gariepinus_odooba_7 | Top Hit: Chordata - Siluriformes - Clarias gariepinus (99.62%) | A species level match could not be made, the queried specimen is likely to be one of the following: Clarias gariepinus, Clarias sp. NM-2010 | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 90 | 99 | |
KX231785 | COI | Clarias_gariepinus_odooba_8 | No Hit | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 92 | 99 | |
KX231786 | COI | Clarias_gariepinus_odooba_9 | No Hit | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 92 | 99 | |
KX231787 | COI | Clarias_gariepinus_odooba_10 | Top Hit: Chordata - Siluriformes - Clarias gariepinus (100%) | A species level match could not be made, the queried specimen is likely to be one of the following: Clarias gariepinus, Clarias sp. NM-2010 | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 92 | 99 | |
KX231788 | COI | Clarias_gariepinus_odooba_11 | No Hit | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 90 | 98 | |
KX231789 | COI | Clarias_gariepinus_asejire_12 | No Hit | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 90 | 98 | |
KX231790 | COI | Clarias_gariepinus_asejire_13 | Top Hit: Chordata - Siluriformes - Clarias gariepinus (99.84%) | A species level match could not be made, the queried specimen is likely to be one of the following: Clarias gariepinus, Clarias sp., Clarias magur, Clarias cf. stappersii, Clarias ngamensis | JQ699203.1 | Clarias gariepinus isolate CLGP5 cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial | 93 | 99 | |
KX231791 | COI | Clarias_gariepinus_asejire_14 | Top Hit: Chordata - Siluriformes - Clarias gariepinus (100%) | A species level match could not be made, the queried specimen is likely to be one of the following: Clarias gariepinus, Clarias sp., Clarias magur, Clarias cf. stappersii, Clarias ngamensis | KX619412.1 | Clarias gariepinus cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial | 93 | 99 | |
KX231792 | COI | Clarias_gariepinus_asejire_15 | Top Hit: Chordata - Siluriformes - Clarias gariepinus (100%) | A species level match could not be made, the queried specimen is likely to be one of the following: Clarias gariepinus, Clarias sp., Clarias magur, Clarias cf. stappersii, Clarias ngamensis | JQ699201.1 | Clarias gariepinus isolate CLGP3 cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial | 93 | 99 | |
KX231793 | COI | Clarias_gariepinus_asejire_16 | No Hit | | KX619412.1 | Clarias gariepinus cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial | 92 | 99 | |
KX243276 | 16S | Coptodon_zilli_odooba_1 | NA | | KM658974.1 | Coptodon zillii mitochondrion, complete genome | 93 | 99 | Top three hits from source publication, fourth hit reported |
KX243277 | 16S | Coptodon_zilli_odooba_2 | NA | | KM658974.1 | Coptodon zillii mitochondrion, complete genome | 93 | 99 | Top hit this sequence, second hit reported |
KX243278 | 16S | Coptodon_zilli_odooba_3 | NA | | GQ168017.1 | Tilapia aff. zillii 'Kisangani' isolate J72 16S ribosomal RNA gene, partial sequence; mitochondrial | 90 | 99 | Top three hits from source publication, fourth hit reported |
KX243279 | 16S | Sarotherodon_melanotheron_odooba_4 | NA | | JF894132.1 | Sarotherodon melanotheron mitochondrion, complete genome | 93 | 99 | Top two hits from source publication, third hit reported |
KX243280 | 16S | Sarotherodon_melanotheron_odooba_5 | NA | | JF894132.1 | Sarotherodon melanotheron mitochondrion, complete genome | 89 | 99 | Top two hits from source publication, third hit reported |
KX243281 | 16S | Clarias_gariepinus_odooba_6 | NA | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 97 | 99 | Top three hits from source publication, fourth hit reported |
KX243282 | 16S | Clarias_gariepinus_odooba_7 | NA | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 94 | 99 | Top six hits from source publication, seventh hit reported |
KX243283 | 16S | Clarias_gariepinus_odooba_8 | NA | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 94 | 99 | Top four hits from source publication, fifth hit reported |
KX243284 | 16S | Clarias_gariepinus_odooba_9 | NA | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 94 | 99 | Top six hits from source publication, seventh hit reported |
KX243285 | 16S | Clarias_gariepinus_odooba_10 | NA | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 84 | 99 | Top four hits from source publication, fifth hit reported |
KX243286 | 16S | Clarias_gariepinus_odooba_11 | NA | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 92 | 93 | Top two hits from source publication, third hit reported |
KX243287 | 16S | Clarias_gariepinus_asejire_12 | NA | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 51 | 95 | Top six hits from source publication, seventh hit reported |
KX243288 | 16S | Clarias_gariepinus_asejire_13 | NA | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 94 | 99 | Top four hits from source publication, fifth hit reported |
KX243289 | 16S | Clarias_gariepinus_asejire_14 | NA | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 94 | 99 | Top five hits from source publication, sixth hit reported |
KX243290 | 16S | Clarias_gariepinus_asejire_15 | NA | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 84 | 99 | Top four hits from source publication, fifth hit reported |
KX243291 | 16S | Clarias_gariepinus_asejire_16 | NA | | KT001082.1 | Clarias gariepinus mitochondrion, complete genome | 92 | 93 | Top two hits from source publication, third hit reported |
To view the results in context, we downloaded from BOLD all COI sequences identified as one of the three species specified by Falade et al. [search ‘Taxonomy’ for Clarias gariepinus, Sarotherodon melanotheron, and Coptodon zillii (the latter also under the synonym Tilapia zillii)]. These sequences were combined with the Falade et al. data and initially aligned using MAFFT version 7.187¹⁷, with manual adjustments made using Mesquite version 3.10¹⁸ (mesquiteproject.wikispaces.com/). A phylogenetic analysis was performed using RAxML version 8.2.8¹⁹. Initial alignment and phylogenetic analysis were performed through the CIPRES Science Gateway version 3.3²⁰ (RRID: SCR_008439; phylo.org/). Alignment required reversing or reverse-complementing some of the sequences from Falade et al. The problematic sequence KX231778 could not be satisfactorily aligned with the others and had to be excluded from the tree. The remaining COI sequences did cluster with other GenBank sequences in such a way as to suggest the remaining taxonomic determinations reported by Falade et al. are credible.
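The need to reverse or reverse-complement some sequences before alignment can be illustrated with a small sketch. It is not the procedure we used (orientation was handled during alignment and manual adjustment); it is a simplified, hypothetical check that picks the orientation of a query sharing the most k-mers with a reference. The function names, the value of k, and the toy reference string are assumptions made for illustration, and the toy string is not a real COI sequence.

```python
from typing import Dict, Set, Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Set of all overlapping substrings of length k (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_orientation(query: str, reference: str, k: int = 8) -> Tuple[str, str]:
    """Return (orientation name, oriented sequence) for the orientation
    of `query` that shares the most k-mers with `reference`."""
    candidates: Dict[str, str] = {
        "forward": query,
        "reverse": query[::-1],
        "reverse_complement": reverse_complement(query),
    }
    ref_kmers = kmers(reference, k)
    scores = {name: len(kmers(seq, k) & ref_kmers) for name, seq in candidates.items()}
    name = max(scores, key=scores.get)
    return name, candidates[name]


# Toy reference (made up for illustration) and a query submitted "backwards".
reference = "ATGGCACTCAGCCAACCAGGCGCCCTTCTAGGAGATGACCAA"
segment = reference[5:35]
name, oriented = best_orientation(reverse_complement(segment), reference)
print(name, oriented == segment)
```

A sequence that shares few k-mers with the reference in every orientation, as one might expect for a sequence like KX231778, would still receive a "best" orientation from this sketch, so the scores themselves should be inspected before the result is trusted.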
Another anomalous sequence is KX243287 (Clarias_gariepinus_asejire_12), a 16S sequence approximately twice the length of the others. We have no explanation for this.
The evidence presented by Falade et al. is not sufficient to identify at least one of their sequences, the COI sequence KX231778. The method applied by Falade et al. made it nearly impossible to fail to obtain a taxonomic name for each sequence, because BLASTN returns a most-similar sequence regardless of how weak the match is. This is a scientific flaw, and an example of the uncritical application of DNA taxonomy.
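One way to make it possible for an identification to fail is to screen BLASTN results before accepting a name. The sketch below applies cut-offs to a few Query cover and Ident values taken from Table 1. The cut-offs (80% cover, 97% identity) and the names `BlastRow` and `needs_review` are hypothetical choices for illustration, not recommended standards and not part of the analysis reported here.

```python
from typing import List, NamedTuple


class BlastRow(NamedTuple):
    accession: str
    query_cover: int  # percent, from Table 1
    ident: int        # percent, from Table 1


def needs_review(row: BlastRow, min_cover: int = 80, min_ident: int = 97) -> bool:
    """Flag a result whose cover or identity falls below the assumed cut-offs."""
    return row.query_cover < min_cover or row.ident < min_ident


# A subset of rows from Table 1 (accession, Query cover, Ident).
rows: List[BlastRow] = [
    BlastRow("KX231778", 76, 86),
    BlastRow("KX231779", 87, 99),
    BlastRow("KX231781", 92, 98),
    BlastRow("KX243286", 92, 93),
    BlastRow("KX243287", 51, 95),
    BlastRow("KX243288", 94, 99),
]

flagged = [row.accession for row in rows if needs_review(row)]
print(flagged)
```

With these assumed cut-offs the two sequences discussed above, KX231778 and KX243287, are among those set aside for closer inspection rather than being given a name automatically.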
This paper was discussed as part of a regular journal discussion group offered by the Endless Forms research group at Naturalis Biodiversity Center, which involves students in the Evolution, Biodiversity, and Conservation program at Leiden University. Similar journal-article-based discussion groups can be found at many universities and natural history museums. We support the rationale behind open review journals (blog.f1000research.com/2014/05/21/what-is-open-peer-review/) and therefore decided to share the sense of our discussion with the broader community. We would like to encourage other journal discussion groups to include open review articles in their literature discussions, and to consider sharing summaries of their discussions as article comments. Healthy science literature depends on a robust pool of potential reviewers²¹. We see journal discussion groups as an untapped resource for providing feedback on scientific literature, and also as incubators for developing student-scientists into constructive and rigorous peer reviewers.
F1000Research: Dataset 1. Aligned COI sequence data, 10.5256/f1000research.9829.d141383²²
F1000Research: Dataset 2. Phylogenetic tree, 10.5256/f1000research.9829.d141384²³
MS, JM, IvR, MZK, and DS conceived the study and outlined major points. IvR and JM analyzed the data and wrote initial drafts of the manuscript. All authors were involved in the revision of the draft manuscript and have agreed to the final content.
The Endless Forms Research Group (Naturalis) budget footed the bill for the journal club drinks at Meneer Jansen in Leiden. IvR is supported by the 'Nederlandse organisatie voor Wetenschappelijk Onderzoek' (NWO Open Programme 824.14.014).
Competing Interests: No competing interests were disclosed.