Bergman CM and Haddrill PR. Strain-specific and pooled genome sequences for populations of Drosophila melanogaster from three continents. [version 1; peer review: 3 approved]. F1000Research 2015, 4:31 (https://doi.org/10.12688/f1000research.6090.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
1 Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
2 Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3FL, UK
To contribute to our general understanding of the evolutionary forces that shape variation in genome sequences in nature, we have sequenced genomes from 50 isofemale lines and six pooled samples from populations of Drosophila melanogaster on three continents. Analysis of raw and reference-mapped reads indicates the quality of these genomic sequence data is very high. Comparison of the predicted and experimentally-determined Wolbachia infection status of these samples suggests that strain or sample swaps are unlikely to have occurred in the generation of these data. Genome sequences are freely available in the European Nucleotide Archive under accession ERP009059. Isofemale lines can be obtained from the Drosophila Species Stock Center.
Keywords
Drosophila melanogaster, Wolbachia pipientis, population genomics, population genetics, pool-seq, DNA-seq
Corresponding author:
Casey M. Bergman
Competing interests:
No competing interests were disclosed.
Grant information:
This work was supported by Human Frontier Science Program Young Investigator grant RGY0093/2012 to CMB and Natural Environment Research Council grant NE/G013195/1 to PRH.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Whole genome shotgun sequences can now be generated easily using short-read sequencing technology for most organisms. Hundreds of resequenced genomes now exist for Drosophila melanogaster that can be used for population and genomic analysis in this model insect species (Lack et al., 2015). To contribute to the worldwide sampling of population genomic data in D. melanogaster, we have sequenced genomes of multiple isofemale lines from three populations collected on different continents and reported in Verspoor & Haddrill (2011): Montpellier, France (FR, n=20), Athens, Georgia, USA (GA, n=15) and Accra, Ghana (GH, n=15). Pools of these same isofemale lines were also sequenced so that results based on strain-specific sequencing can be compared with results based on pooled sequencing. The strains sequenced here were chosen because the isofemale lines exist in the Drosophila Species Stock Center and because their infection status for the Wolbachia pipientis bacterial endosymbiont had previously been determined (Verspoor & Haddrill, 2011).
Materials and methods
Isofemale strains were selected randomly from the full population samples reported in Verspoor & Haddrill (2011). Genomic DNA for isofemale lines was prepared by snap freezing females in liquid nitrogen, then extracting DNA using a standard phenol-chloroform extraction protocol with ethanol and ammonium acetate precipitation. DNA samples were generated for each isofemale line using 50, 25, and 25 adult females for the FR, GA and GH populations, respectively.
For pooled samples, single adult females from each isofemale line were used to construct two samples for each population. The first pooled sample contains one fly from each of the same strains that were sequenced as isofemale lines (FR_pool_20, GA_pool_15, GH_pool_15). The second pooled sample contains one fly from all isofemale lines sampled for each population reported in Verspoor & Haddrill (2011) (FR_pool_39, GA_pool_30, GH_pool_32).
Short-insert libraries (500 bp) were constructed using the Illumina Paired-End Sample Prep Kit (Part # 1005063), and 90 bp paired-end reads were generated on an Illumina HiSeq 2000 by BGI-Hong Kong to an estimated coverage of ~50× per strain. Forty-one samples were each sequenced in a single lane, typically shared with two other samples, on a single run; the remaining 15 samples were sequenced using the same layout on two runs. This generated 71 pairs of fastq files for the 56 samples (41 + 2 × 15). Data were generated over a total of seven sequencing runs. Raw data were filtered by BGI to remove read pairs in which either read contained adapters or had more than 50% of bases with a quality value ≤ 5. No other trimming or filtering of the raw data was performed prior to submission to the European Nucleotide Archive, for which the original filenames provided by BGI were used.
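The read-pair filter described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the software BGI actually ran: the function names, the representation of a read as a list of integer quality values plus an adapter flag, and the toy reads are all assumptions made for the example.

```python
def read_fails(qualities, has_adapter, low_quality=5, max_low_fraction=0.5):
    """Return True if one read would cause its pair to be discarded.

    A read fails if it contains adapter sequence, or if more than
    max_low_fraction of its bases have a quality value <= low_quality.
    """
    if has_adapter:
        return True
    low = sum(1 for q in qualities if q <= low_quality)
    return low > max_low_fraction * len(qualities)


def keep_pair(read1, read2):
    """Keep a pair only if neither read fails the filter.

    Each read is a (qualities, has_adapter) tuple (hypothetical layout).
    """
    return not (read_fails(*read1) or read_fails(*read2))


# Toy 90 bp reads (made-up values, for illustration only).
good = ([30] * 90, False)
mostly_low = ([2] * 46 + [30] * 44, False)   # 46 of 90 bases <= 5
with_adapter = ([30] * 90, True)

print(keep_pair(good, good))          # True
print(keep_pair(good, mostly_low))    # False
print(keep_pair(with_adapter, good))  # False

# File-pair count implied by the run layout: 41 samples on one run,
# 15 samples on two runs.
print(41 * 1 + 15 * 2)                # 71
```

Because the rule acts on pairs, a single poor read removes its mate as well, which keeps the forward and reverse files in step.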
Dataset validation
To validate the quality of the raw sequence data, forward and reverse reads were analyzed using fastQC (version 0.11.2) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Forward and reverse read files for all runs had PASS status for most fastQC statistics. Per base sequence quality gave FAIL status for forward or reverse read files for all of the GA samples (which were sequenced together on one run) because of poor quality scores in the terminal 1–5 bp of the read. These poor quality termini can be easily trimmed and do not affect mappability, as the percent of reads mapped for these runs is very high (see Dataset 1).
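As a rough illustration of how low-quality termini could be trimmed before downstream analysis, the sketch below removes a fixed number of bases from either end of a read. The function name and parameters are hypothetical, and the choice of how many bases to trim from which end is left to the user; no trimming was applied to the submitted data.

```python
def trim_read(seq, qual, trim5=0, trim3=5):
    """Remove trim5 bases from the start and trim3 bases from the end
    of a read, applying the same cut to its quality string."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    # Never let the end position fall before the start position.
    end = max(trim5, len(seq) - trim3)
    return seq[trim5:end], qual[trim5:end]


# Toy 10 bp read with two low-quality bases at its end.
seq, qual = trim_read("ACGTACGTAC", "IIIIIIII##", trim3=2)
print(seq)   # ACGTACGT
print(qual)  # IIIIIIII
```

Dedicated read-trimming tools offer quality-aware trimming; a fixed-length cut like this one is the simplest possible variant.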
Run,SampleName,PercentMapped,WolbachiaDepth,WolbachiaBreadth,PredictedInfectionStatus,ExperimentalInfectionStatus,NumRead1,NumBadRead1,LengthRead1,BasicStatisticsRead1,PerBaseSequenceQualityRead1,PerTileSequenceQualityRead1,PerSequenceQualityScoresRead1,PerBaseSequenceContentRead1,PerSequenceGCContentRead1,PerBaseNContentRead1,SequenceLengthDistributionRead1,SequenceDuplicationLevelsRead1,OverrepresentedSequencesRead1,AdapterContentRead1,KmerContentRead1,NumRead2,NumBadRead2,LengthRead2,BasicStatisticsRead2,PerBaseSequenceQualityRead2,PerTileSequenceQualityRead2,PerSequenceQualityScoresRead2,PerBaseSequenceContentRead2,PerSequenceGCContentRead2,PerBaseNContentRead2,SequenceLengthDistributionRead2,SequenceDuplicationLevelsRead2,OverrepresentedSequencesRead2,AdapterContentRead2,KmerContentRead2
ERR705945,FR23,98.23,157.2661506,1,infected,y,43632996,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43632996,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705946,FR24,98.26,253.4116299,1,infected,y,43346106,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43346106,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705947,FR25,98.03,0.153247956,0.136415409,uninfected,n,43752189,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43752189,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705948,FR26,98.07,0.089912146,0.080452318,uninfected,n,44941659,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,44941659,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705949,FR28,97.99,243.9421572,1,infected,y,43685595,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43685595,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705950,FR29,97.87,0.081407529,0.07185147,uninfected,n,42950484,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,42950484,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705951,FR30,98.23,0.002058714,0.002058714,uninfected,n,42649081,0,90,PASS,PASS,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,42649081,0,90,PASS,PASS,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705952,FR31,98.02,0.009130119,0.003469051,uninfected,n,42446188,0,90,PASS,PASS,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,42446188,0,90,PASS,PASS,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705953,FR32,98.15,0.011783572,0.010895406,uninfected,n,43804532,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43804532,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705954,FR33,98.19,0.032124608,0.030857829,uninfected,n,44688030,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,44688030,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705955,FR34,98,0.077131557,0.067549468,uninfected,n,43183556,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43183556,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705956,FR35,98.3,315.1657004,1,infected,y,42738121,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,42738121,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705957,FR37,98.09,0.005362121,0.003086493,uninfected,n,43621725,0,90,PASS,PASS,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43621725,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705958,FR38,98.11,0.00454179,0.003613397,uninfected,n,42989610,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,42989610,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705959,FR39,97.53,0.02486153,0.002581674,uninfected,n,43653460,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43653460,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705960,FR42,98.13,0.149636136,0.13461226,uninfected,n,43513827,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43513827,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705961,FR44,98.25,176.3625032,1,infected,y,43999930,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43999930,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705962,FR45,98.08,231.0189465,1,infected,y,43534574,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43534574,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705963,FR46,98.25,194.166927,1,infected,y,43032033,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43032033,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705964,FR48,98.31,183.7333295,1,infected,y,44173007,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,44173007,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705965,FRpool20,98.06,98.83144421,1,infected,na,44255580,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,44255580,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705966,FRpool39,97.86,111.5827327,1,infected,na,43945723,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,FAIL,43945723,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,FAIL
ERR705967,GA01,98.18,150.1729217,0.999719195,infected,y,44032050,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,44032050,0,90,PASS,FAIL,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705968,GA02,97.95,215.2846917,1,infected,y,43935045,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43935045,0,90,PASS,FAIL,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705969,GA03,97.97,209.1012232,0.999730238,infected,y,43868793,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43868793,0,90,PASS,FAIL,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705970,GA04,98.01,147.6178483,0.999727082,infected,y,43288985,0,90,PASS,PASS,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43288985,0,90,PASS,FAIL,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705971,GA05,97.77,0.101011057,0.090340453,uninfected,n,43348545,0,90,PASS,PASS,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43348545,0,90,PASS,FAIL,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705972,GA06,97.98,0.007450808,0.007387705,uninfected,n,43688990,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43688990,0,90,PASS,PASS,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,FAIL
ERR705973,GA07,97.94,0.074818068,0.071163654,uninfected,n,43503291,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43503291,0,90,PASS,FAIL,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705974,GA08,98.1,182.8769394,0.999798861,infected,n,36815786,0,90,PASS,PASS,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,36815786,0,90,PASS,FAIL,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705975,GA08,97.9,14.52127022,0.999612709,infected,n,3132551,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,3132551,0,90,PASS,PASS,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705976,GA09,98.07,188.2177543,0.999725505,infected,y,42739085,0,90,PASS,FAIL,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,42739085,0,90,PASS,FAIL,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705977,GA10,97.85,122.3655108,1,infected,y,42490121,0,90,PASS,FAIL,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,42490121,0,90,PASS,FAIL,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705978,GA11,97.95,246.9037461,0.999725505,infected,y,42911088,0,90,PASS,FAIL,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,42911088,0,90,PASS,FAIL,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705979,GA12,97.96,197.0479893,0.999730238,infected,y,41533830,0,90,PASS,FAIL,FAIL,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,41533830,0,90,PASS,FAIL,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705980,GA13,98.15,168.4276705,0.999719983,infected,y,43339064,0,90,PASS,FAIL,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43339064,0,90,PASS,FAIL,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705981,GA15,98,0.11221882,0.105207362,uninfected,n,43191678,0,90,PASS,FAIL,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43191678,0,90,PASS,FAIL,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705982,GA17,98.02,0.116029412,0.108342759,uninfected,n,43375446,0,90,PASS,FAIL,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,43375446,0,90,PASS,FAIL,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705983,GApool15,98.1,60.85875647,1,infected,na,22168996,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22168996,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705984,GApool15,98.1,60.83926259,1,infected,na,22205575,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22205575,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705985,GApool30,98.08,52.71559543,1,infected,na,22154884,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22154884,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705986,GApool30,98.08,52.71422611,1,infected,na,22185750,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22185750,0,90,PASS,PASS,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705987,GH01,97.49,0.025129715,0.023159344,uninfected,n,22063039,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22063039,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705988,GH01,97.49,0.022915612,0.021481611,uninfected,n,22097255,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22097255,0,90,PASS,PASS,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705989,GH02,97.24,0.014733606,0.009604175,uninfected,n,22077514,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22077514,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705990,GH02,97.24,0.014497761,0.008524336,uninfected,n,22109134,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22109134,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705991,GH04,97.44,0.022074773,0.020114657,uninfected,n,22221140,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22221140,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705992,GH04,97.44,0.021787657,0.019542003,uninfected,n,22253515,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22253515,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705993,GH05,96.84,0.013619061,0.009383317,uninfected,n,22143765,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22143765,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705994,GH05,96.85,0.014468576,0.00899366,uninfected,n,22170215,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22170215,0,90,PASS,PASS,WARN,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705995,GH06,97.24,0.014600302,0.009973323,uninfected,n,22259211,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22259211,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705996,GH06,97.23,0.011142294,0.00801794,uninfected,n,22287021,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22287021,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705997,GH08,97.22,0.010342472,0.006264484,uninfected,n,22223535,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22223535,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705998,GH08,97.22,0.008726264,0.006128814,uninfected,n,22160305,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22160305,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR705999,GH09,97.36,0.005674477,0.004113483,uninfected,n,22217543,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22217543,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706000,GH09,97.35,0.00659025,0.00413478,uninfected,n,22150945,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22150945,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706001,GH10,97.35,0.009387261,0.004999282,uninfected,n,22258559,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22258559,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706002,GH10,97.34,0.008257729,0.003450909,uninfected,n,22189787,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22189787,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706003,GH11,97.49,0.007382184,0.007095068,uninfected,n,22202208,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22202208,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706004,GH11,97.48,0.007166847,0.00646089,uninfected,n,22122521,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22122521,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706005,GH12,97.49,0.006886042,0.006856857,uninfected,n,22103525,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22103525,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706006,GH12,97.48,0.006885253,0.006671494,uninfected,n,22030025,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22030025,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706007,GH14,97.3,0.008649752,0.005751778,uninfected,n,22276314,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22276314,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706008,GH14,97.28,0.010572796,0.008571663,uninfected,n,22209150,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22209150,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706009,GH15,97.48,59.98279515,0.999712884,infected,y,22233167,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22233167,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706010,GH15,97.48,60.15407617,0.999720772,infected,y,22163376,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,22163376,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706011,GH16,97.11,0.012396453,0.006327586,uninfected,n,44096323,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,44096323,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706012,GH17,97.55,0.003690698,0.003533731,uninfected,n,44519157,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,44519157,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706013,GH18,97.24,0.025455481,0.016151831,uninfected,n,44409433,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,44409433,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
ERR706014,GHpool15,97.35,9.322955366,0.99934768,infected,na,44828941,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS,44828941,0,90,PASS,PASS,PASS,PASS,WARN,PASS,PASS,PASS,PASS,PASS,PASS,PASS
Dataset 1. Descriptive statistics for validation of Drosophila melanogaster genome sequence data.
The PercentMapped column is obtained from the output of samtools flagstat using BAM files of mapped reads generated by bowtie2. The WolbachiaDepth, WolbachiaBreadth and PredictedInfectionStatus columns are obtained from the output of bedtools genomecov using BAM files of mapped reads generated by bowtie2. The ExperimentalInfectionStatus column is obtained from the results of Verspoor & Haddrill (2011). All other columns are obtained from the output of fastQC on the raw, unmapped reads.
To validate that the majority of the DNA sequenced is from the focal organism(s), untrimmed reads for each sample were mapped in paired-end mode using Bowtie 2 (version 2.2.4) (Langmead & Salzberg, 2012) with default options to a “hologenome” reference generated by concatenating genome sequences for D. melanogaster (GenBank accession GCA_000001215.4) (Hoskins et al., 2015) and W. pipientis (GenBank accession AE017196) (Wu et al., 2004). Mapping to a hologenome was performed because many of these strains are known to be infected with Wolbachia (Verspoor & Haddrill, 2011). Unfiltered BAM files were used to estimate the proportion of reads in each sample that mapped to the expected target organisms using samtools flagstat (version 0.1.19-44428cd) (Li et al., 2009). Greater than 96.8% of all reads in each run were mapped to the hologenome reference, indicating low levels of contaminating DNA in these data (Dataset 1).
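The percentage of mapped reads can be recovered from a flagstat report with a short parser. The sketch below is not the script used for this dataset. It assumes the report contains lines of the form "N + M in total ..." and "N + M mapped ...", where N and M are the QC-passed and QC-failed read counts; the function name and the example report are made up for illustration.

```python
import re


def percent_mapped(flagstat_text):
    """Compute the percentage of mapped reads from a flagstat-style report.

    Assumes lines of the form 'N + M in total ...' and 'N + M mapped ...'.
    QC-passed (N) and QC-failed (M) counts are summed.
    """
    def count(label):
        # Anchoring at line start avoids matching e.g. 'primary mapped'.
        m = re.search(r"^(\d+) \+ (\d+) " + label, flagstat_text, re.M)
        if m is None:
            raise ValueError("no '%s' line found" % label)
        return int(m.group(1)) + int(m.group(2))

    total = count("in total")
    if total == 0:
        raise ValueError("report contains zero reads")
    return 100.0 * count("mapped") / total


# Made-up report with round numbers, for illustration only.
example = """1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
982 + 0 mapped (98.20%:-nan%)
1000 + 0 paired in sequencing
"""

print("%.2f" % percent_mapped(example))  # 98.20
```

Values computed this way are comparable to the PercentMapped column of Dataset 1 only if the same unfiltered BAM files are used as input.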
Mapping to a hologenome also allowed us to check whether strain or sample swaps occurred in the process of producing these genome sequences by comparing predicted Wolbachia infection status with previously determined PCR-based infection status (Verspoor & Haddrill, 2011). Wolbachia infection status was predicted from genome sequences for each strain following a modified protocol from Richardson et al. (2012). Briefly, strains were predicted as "infected" when breadth of mapped read coverage was greater than 90% of the Wolbachia genome and mean depth of coverage was greater than one. Here, we compute breadth of coverage directly from the bedtools genomecov (version v2.22.0) (Quinlan & Hall, 2010) output rather than from a consensus sequence, as was done previously by Richardson et al. (2012). Predicted Wolbachia infection status matched experimentally determined infection status for 55/56 samples (98.2% concordance), indicating that strain or sample swaps are unlikely to have occurred during the generation of this dataset (Dataset 1). The only exception observed was line GA08 from the Georgia population, which the WGS data indicate is infected while the PCR data indicate it is uninfected. This observation can be explained either by PCR amplification failure for the GA08 stock in Verspoor & Haddrill (2011) or by infection of the GA08 stock after data collection for Verspoor & Haddrill (2011). Further analysis of the Wolbachia infection status of this stock is warranted prior to use.
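The prediction rule can be illustrated with a minimal Python sketch. It assumes the default histogram output of bedtools genomecov, with tab-separated columns: sequence name, depth, number of bases at that depth, sequence length, and fraction of bases. It also assumes that mean depth is total depth summed over all positions divided by sequence length. The contig name "wMel_toy", the function names and the toy histogram are hypothetical; this is not the authors' script.

```python
def coverage_stats(hist_lines, contig):
    """Return (breadth, mean_depth) for one sequence from genomecov-style
    histogram lines: name, depth, bases_at_depth, length, fraction."""
    length = None
    covered = 0
    total_depth = 0
    for line in hist_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[0] != contig:
            continue
        depth, bases, length = int(fields[1]), int(fields[2]), int(fields[3])
        if depth > 0:
            covered += bases
        total_depth += depth * bases
    if length is None:
        raise ValueError("contig %r not found" % contig)
    return covered / length, total_depth / length


def predict_status(breadth, mean_depth, min_breadth=0.9, min_depth=1.0):
    """Apply the rule from the text: breadth > 90% and mean depth > 1."""
    if breadth > min_breadth and mean_depth > min_depth:
        return "infected"
    return "uninfected"


# Toy histogram for a 100 bp contig (made-up numbers).
toy = ["wMel_toy\t0\t5\t100\t0.05", "wMel_toy\t10\t95\t100\t0.95"]
breadth, depth = coverage_stats(toy, "wMel_toy")
print(predict_status(breadth, depth))              # infected

# Depth and breadth values as reported in Dataset 1.
print(predict_status(1.0, 157.2661506))            # FR23: infected
print(predict_status(0.136415409, 0.153247956))    # FR25: uninfected
```

Both conditions must hold at once, so a sample with a few high-depth regions but low breadth would still be called uninfected.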
Data availability
Raw sequence data for the 56 samples reported here can be found in the European Nucleotide Archive (http://www.ebi.ac.uk/ena) under accession ERP009059. Isofemale lines can be obtained from the Drosophila Species Stock Center (https://stockcenter.ucsd.edu) under accessions 14021-0231.139, 14021-0231.140, 14021-0231.141, 14021-0231.142, 14021-0231.143, 14021-0231.144, 14021-0231.145, 14021-0231.146, 14021-0231.147, 14021-0231.148, 14021-0231.149, 14021-0231.150, 14021-0231.151, 14021-0231.152, 14021-0231.153, 14021-0231.154, 14021-0231.155, 14021-0231.156, 14021-0231.157, 14021-0231.158, 14021-0231.183, 14021-0231.184, 14021-0231.185, 14021-0231.186, 14021-0231.187, 14021-0231.188, 14021-0231.189, 14021-0231.190, 14021-0231.191, 14021-0231.192, 14021-0231.193, 14021-0231.194, 14021-0231.195, 14021-0231.196, 14021-0231.197, 14021-0231.163, 14021-0231.164, 14021-0231.165, 14021-0231.166, 14021-0231.167, 14021-0231.168, 14021-0231.170, 14021-0231.172, 14021-0231.174, 14021-0231.176, 14021-0231.177, 14021-0231.178, 14021-0231.180, 14021-0231.181 and 14021-0231.182.
Author contributions
CMB and PRH conceived the study. CMB and PRH designed the experiments. PRH conducted the experiments. CMB conducted the data analysis. CMB prepared the first draft of the manuscript. All authors were involved in the revision of the draft manuscript and have agreed to the final content.
Acknowledgments
We thank BGI-Hong Kong for assistance with genome sequencing and initial data quality control analysis and Daniel Halligan for assistance with data management.
References
Bergman CM, Haddrill PR: Dataset 1 in “Strain-specific and pooled genome sequences for populations of Drosophila melanogaster from three continents”. F1000Research. 2014.
Hoskins RA, Carlson JW, Wan KH, et al.: The Release 6 reference sequence of the Drosophila melanogaster genome. Genome Res. 2015.
Lack J, Cardeno C, Crepeau M, et al.: The Drosophila Genome Nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population. Genetics. 2015.
Richardson MF, Weinert LA, Welch JJ, et al.: Population genomics of the Wolbachia endosymbiont in Drosophila melanogaster. PLoS Genet. 2012; 8(12): e1003129.
Verspoor RL, Haddrill PR: Genetic diversity, population structure and Wolbachia infection status in a worldwide sample of Drosophila melanogaster and D. simulans populations. PLoS One. 2011; 6(10): e26318.
Wu M, Sun LV, Vamathevan J, et al.: Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined genome overrun by mobile genetic elements. PLoS Biol. 2004; 2(3): E69.
Open Peer Review
Nuzhdin S and Kao J. Reviewer Report For: Strain-specific and pooled genome sequences for populations of Drosophila melanogaster from three continents. [version 1; peer review: 3 approved]. F1000Research 2015, 4:31 (https://doi.org/10.5256/f1000research.6521.r7868)
The authors have presented a succinct, but detailed description of sequence data for several populations that will certainly be useful to the Drosophila community. I see no major flaws in the manuscript. However, as a minor suggestion, it may be useful for readers if the authors update their Introduction to not only place the populations in the context of migration history, but perhaps to also briefly list the geographical areas covered by other sequence resources to clearly illustrate how their dataset adds onto the currently available resources.
Competing Interests: No competing interests were disclosed.
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Dworkin I. Reviewer Report For: Strain-specific and pooled genome sequences for populations of Drosophila melanogaster from three continents. [version 1; peer review: 3 approved]. F1000Research 2015, 4:31 (https://doi.org/10.5256/f1000research.6521.r7634)
This article primarily summarises the generation of a large set of resequenced Drosophila strains from three populations (Ghana, France and the US). Sequencing was done both individually for each isofemale strain and in pools for each of the three populations. While the primary goal of this research appears to be to provide the community with these additional genomic resources, the researchers were also particularly interested in examining Wolbachia infection status in the strains. Given that all raw data have been made available, it is likely that this will provide an important and useful resource for genomic analyses.
A few minor comments: Some comparison of mapping quality for the pooled sequences (as compared to the individual isofemale strains) would have been useful.
Some explanation as to why the number of individuals used for the three different sequencing pools differed would have also been helpful to understand the provenance of the data.
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Pool JE. Reviewer Report For: Strain-specific and pooled genome sequences for populations of Drosophila melanogaster from three continents. [version 1; peer review: 3 approved]. F1000Research 2015, 4:31 (https://doi.org/10.5256/f1000research.6521.r7534)
The authors' data will add value to Drosophila population genomic resources. I see no technical flaws in the manuscript. If the authors see fit, they could add a bit more context to the data. For example, they could note that a mosaic of homozygous and heterozygous regions may be expected from the isofemale line genomes. Optionally, they could also briefly put these three populations in historical context (i.e. that the species originated from sub-Saharan Africa but perhaps not western Africa specifically, that it expanded out of sub-Saharan Africa with a population bottleneck, and that North American populations are thought to have both European and African ancestry). The France and Ghana samples sequenced here may prove useful for identifying population ancestry in North American and other admixed populations.
Trivial edits:
Methods paragraph 1: “each isofemale lines” (delete final “s”)
References - from title of Lack et al. 2015, delete second “genomes”. Update precise author information.
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Alongside their report, reviewers assign a status to the article:
Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations - a number of small changes, or sometimes more significant revisions, are required to address specific details and improve the paper's academic merit
Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions
Bergman CM and Haddrill PR. Dataset 1 in: Strain-specific and pooled genome sequences for populations of Drosophila melanogaster from three continents. F1000Research 2015, 4:31 (https://doi.org/10.5256/f1000research.6090.d42636)