Malaghan missions for munching MinIONs
[version 1; not peer reviewed] No competing interests were disclosed.
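Tēnā koutou i tēnei ata (Greetings to you all this morning)
Ko Kurahaupo te waka (Kurahaupo is my canoe)
Ko Tapuae-o-uenuku te maunga (Tapuae-o-uenuku is my mountain)
Ko Wairau te awa (Wairau is my river)
Ko Rangitāne te iwi (Rangitāne is my tribe)
Ko Moa tōku whanau (Moa is my family)
Nō Pōneke ahau (I am from Wellington)
Kei Karori tōku kāinga (My home is in Karori)
He kai pūtaiao ahau (I am a scientist)
Ko Rawiri tōku ingoa (My name is Rawiri)
No Reira (And so)
Tēnā koutou, tēnā koutou, tēnā koutou katoa (Greetings, greetings, greetings to you all)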
For those who don't know me, my name is David Eccles, and for almost the last five years, I've been doing DNA sequencing on the Oxford Nanopore Technologies' MinION sequencer here at the Malaghan Institute of Medical Research.
Nanopore Overview
I'd like to start off by giving an overview of what the nanopore sequencer is. In a physical sense, the active part of the sequencer is a consumable flow cell, shown here on the left. On the bottom of the flow cell (hidden in this picture) is a chunk of silicon that contains electrical current sensing circuits and circuits to enable the transfer of those signals over a USB connection. On the top of the flow cell is an array of 2048 sequencing wells. 512 of these wells can be hooked up to the sensing circuits at any one time, and those circuits will take snapshots of the electrical state of the wells at 4 kHz, or 4 thousand snapshots per second.
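To put those numbers in perspective, the short Python sketch below simply multiplies them out. It uses only the figures quoted above (512 connected wells, 4 kHz sampling) and ignores file formats and compression, so treat it as back-of-the-envelope arithmetic rather than a description of how the sequencing software handles the data.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the talk:
# 512 of the 2048 wells are connected to sensing circuits at any one time,
# and each connected well is sampled 4,000 times per second (4 kHz).
ACTIVE_CHANNELS = 512
SAMPLE_RATE_HZ = 4_000
SECONDS_PER_HOUR = 60 * 60


def samples_per_second(channels: int = ACTIVE_CHANNELS,
                       rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Total current measurements taken across the flow cell each second."""
    return channels * rate_hz


def samples_per_hour(channels: int = ACTIVE_CHANNELS,
                     rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """The same total, scaled up to one hour of sequencing."""
    return samples_per_second(channels, rate_hz) * SECONDS_PER_HOUR


if __name__ == "__main__":
    print(f"{samples_per_second():,} samples per second")  # 2,048,000
    print(f"{samples_per_hour():,} samples per hour")      # 7,372,800,000
```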
Each well is covered with a delicate, fluid-impermeable polymer membrane. Embedded in the membrane are purpose-built (or, more correctly, purpose-mutated) protein pores that are a few tens of nanometres across. An electrical potential is set up across the membrane, which drives a flow of ions through the nanopores. If the sequencer is working well, then most of the time that flow will be interrupted by the polymer templates that you're trying to sequence.
Hydroelectric Sequencing
After Doctor Divya Mirrington gave her presentations during the Nanopore workshop at the end of February this year, I realised that my own conceptual model of how sequencing works needed a bit of fixing. The sequencer isn't measuring a shadow of whatever is stuck inside the nanopore; it's measuring the flow of current through the nanopore.
Imagine the flow cell as a bit like a small hydro dam with a reservoir behind it. This reservoir represents the potential sequencing energy stored by ions on the input side of the flow cell membrane. The sequencer measures the flow of ions through holes in the membrane, so a baseline flow of current is necessary to carry out sequencing. Polymers like DNA interrupt this flow in a complex way, producing a characteristic change in current that depends on the physical and electrochemical structure of the polymer as it moves through the sequencing pore.
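Because the signal is a flow of current, a strand passing through a pore shows up as a drop below the open-pore level. A minimal sketch of spotting those drops is to flag stretches where the current stays below some fraction of the open-pore current. The 80% threshold, the 3-sample minimum, and the picoamp values are all hypothetical, and real signal segmentation in the sequencing and basecalling software is far more sophisticated than this.

```python
from typing import List, Tuple


def find_blockades(current_pa: List[float], open_pore_pa: float,
                   blocked_fraction: float = 0.8,
                   min_samples: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices (end exclusive) of stretches where the
    current stays below blocked_fraction * open_pore_pa for at least
    min_samples consecutive samples."""
    threshold = blocked_fraction * open_pore_pa
    events: List[Tuple[int, int]] = []
    start = None  # index where the current dip began, if we're inside one
    for i, value in enumerate(current_pa):
        if value < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_samples:
                events.append((start, i))
            start = None
    # close an event that runs to the end of the trace
    if start is not None and len(current_pa) - start >= min_samples:
        events.append((start, len(current_pa)))
    return events


# Hypothetical trace in picoamps: open pore, a strand passing through, open
# pore again, then a one-sample glitch that is too short to count.
trace = [220, 221, 219, 95, 90, 110, 102, 98, 218, 222, 100, 221]
print(find_blockades(trace, open_pore_pa=220))  # [(3, 8)]
```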
Over the course of a sequencing run, the ions on the input side are depleted, and the voltage applied to the flow cell needs to be adjusted to compensate and maintain the same baseline flow rate. Eventually, the input side will be so depleted that no amount of voltage compensation can keep ions flowing fast enough, and sequencing stops.
If you want to get the most out of a nanopore flow cell, then this flow should be maintained for as long as possible without the pores becoming completely blocked. One of the best ways to do this is to keep feeding the pores with long polymers. If a flow cell is sequencing a large number of long DNA strands at the start of a sequencing run, it will produce a higher yield over the course of the run than one that's sequencing short DNA strands. This is because there's some switching time involved between sequencing different strands, and that gives smaller molecules an opportunity to sneak through. Unfortunately, really long DNA tends to knot up and block the pores, and this can be sample-dependent, so there's a bit of tweaking involved in finding a happy medium that optimises sequencing yield.
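The switching-time argument can be made concrete with a toy duty-cycle model: each pore alternates between reading a strand and a fixed gap while it releases that strand and captures the next. The 400 bases/s speed and 2-second gap below are hypothetical values chosen for illustration, not measured figures.

```python
# Toy model: each pore alternates between reading one strand and a
# "capture gap" (releasing that strand and capturing the next one).
# Speeds, gap lengths and read lengths here are hypothetical examples.
SECONDS_PER_HOUR = 3_600


def sequencing_fraction(read_length: float, speed_bps: float, gap_s: float) -> float:
    """Fraction of a pore's time spent actually reading DNA."""
    read_time_s = read_length / speed_bps
    return read_time_s / (read_time_s + gap_s)


def bases_per_pore_hour(read_length: float, speed_bps: float, gap_s: float) -> float:
    """Expected bases read by one pore in an hour under the toy model."""
    return SECONDS_PER_HOUR * speed_bps * sequencing_fraction(read_length, speed_bps, gap_s)


if __name__ == "__main__":
    for length in (1_000, 20_000):
        # assumed: 400 bases/s translocation speed, 2 s between strands
        print(length, round(bases_per_pore_hour(length, 400, 2.0)))
```

With these assumed numbers a pore fed 1,000-base strands spends just over half its time reading, while one fed 20,000-base strands reads more than 95% of the time. The model deliberately ignores pore blocking by knotted long DNA, which pushes in the other direction.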
Energy (ATP) Depletion
Going back to the physical model, you might notice that there's a helicase protein attached to the DNA strand. The act of unwinding and ratcheting the DNA through the nanopore is an ATP-dependent process, so in addition to a supply of ions, the input side of the flow cell membrane also needs a supply of energy in the form of ATP. Adapter sequences carrying this helicase are joined onto the DNA during sample preparation, and (at the moment) consume energy even when they're not bound to the sequencing pores. This means that it's actually possible to overload a flow cell and deplete it of energy before it has been depleted of the ions required to maintain the flow of current for sequencing.
London Calling 2019
For those who don't know, I went to a conference a couple of weeks ago. This was London Calling, the yearly Nanopore sequencing conference in London, which doubles as a trade show for ONT to demonstrate all their newest gadgets.
I took a picture of ONT's upcoming Mk1C sequencer, which combines a screen, sequencer, and basecaller in one package for $5,000 USD. Also shown are the Flongle adapter and the Flongle flow cells, which allow cheaper sequencing at about $100 USD per sequencing run. These products were talked about at previous ONT events, and are available to order now on the Nanopore website, but might take a bit more time to get into customers' hands.
This year was a little different from previous years, in that there weren't really any huge surprises in the presentations that Oxford Nanopore staff gave. Probably the closest thing to a surprise was the unveiling of a 96-well sequencer concept, allowing low-volume sequencing in a highly multiplexed fashion.
A common theme of the research talks at the conference was looking for needles in haystacks: assembling endogenous viruses out of whole-genome reads, forensics, cell-free DNA, targeted sequencing, and sequencing the dark genome that other sequencers can't touch.
MinION Experts
Here's our nanopore sequencing team as it stands at the moment. As I mentioned previously, Doctor Mirrington led a Nanopore workshop earlier this year, and has trained up a few people to train others. If you want to get started with nanopore sequencing, ask one of these people to help you out. A few others attended Divya's one-day rapid start workshop, and participated in their own sequencing run. There was also a lot of interest from SBS, so one person from SBS took part in the training days, and a few more joined in for the one-day workshop.
I'd like to also put forward a couple of honourable mentions in Olivier and Jodie, who braved the Oxford Nanopore sample preparation process at a time before there were any opportunities for training.
Research Groups
We have a few different research groups here at the Malaghan Institute, and the nanopore sequencer has been used, or will shortly be used, by all of them. Graham's group used nanopore sequencing for creating high-contiguity mitochondrial and nuclear genome assemblies for Nippostrongylus brasiliensis. Ian's group has used nanopore sequencing to look at plasmid sequences as a quality-control component of their CAR-T cell project. Franca's group has used nanopore sequencing in the past for looking at interferon alpha gene expression, and more recently for making sure that ordered plasmids are what they say on the tin. Mike's group are in the middle of a project investigating how mitochondria control cell function, particularly as it relates to tumour growth. The Hugh Green Cytometry Centre will be collaborating with a national team to try to assemble flow-sorted bird Z chromosomes. And Olivier's group has used nanopore sequencing for investigating metagenomic and 16S amplicon diversity in faecal samples.
Viral Sequencing (2015)
Complaints about the error rate frequently come up whenever nanopore sequencing is mentioned. In 2015, I worked with some researchers at ESR: Jing Wang, Nicole Moore and Richard Hall. They were in the MinION Early Access Program, and enlisted my help in adding the final touches to their paper on sequencing an influenza genome. The different components were PCR-amplified from cDNA transcripts, and it was nice to see that proportional coverage was a lot more consistent across the length of each transcript than for a similar extraction followed by Illumina sequencing. Another interesting result was the high consensus concordance with Illumina sequencing: over 99% for all transcripts, and 100% agreement for two of the eight transcripts. I'd like to point out that this was in a paper published in 2015 using a now-obsolete version of the flow cells, with about 85% per-base average sequencing accuracy (that is, roughly 15% error).
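Why can reads with roughly 15% per-base error still give a near-perfect consensus? A simple binomial model shows the effect of depth: the probability that a strict majority of reads carry the correct base. This is an illustration under stated assumptions, not the method used in the paper. It assumes errors are independent between reads (optimistic, since nanopore errors are partly systematic) and requires a strict majority rather than a plurality (pessimistic, since errors are spread over several alternatives).

```python
from math import comb


def majority_correct(per_read_accuracy: float, depth: int) -> float:
    """Probability that a strict majority of `depth` reads show the correct base.

    Assumes every read is independently correct with probability
    `per_read_accuracy`; real nanopore errors are partly systematic, so this
    independence assumption is an optimistic simplification.
    """
    p = per_read_accuracy
    needed = depth // 2 + 1  # smallest strict majority
    return sum(comb(depth, k) * p**k * (1 - p) ** (depth - k)
               for k in range(needed, depth + 1))


if __name__ == "__main__":
    for depth in (1, 5, 15, 31):
        print(depth, round(majority_correct(0.85, depth), 6))
```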
Phylogenetic Analysis
Jumping forward to this year, David Greig gave a talk at the London Calling conference about the use of nanopore sequencing in a clinical outbreak setting, based on results published in a preprint. Using a combination of Illumina and nanopore sequencing, his team sequenced faecal samples from two cases: two children who were suffering from something called haemolytic uraemic syndrome.
The results of their investigation demonstrated that the two cases were not linked, and were associated with two different outbreaks.
Nanopore sequences did initially appear to have more variation from the reference strains than Illumina sequences did, but the team was sequencing unamplified DNA, and almost all of these variants turned out to lie in methylated regions. In other words, the modified bases were changing the electrochemical structure of the DNA and affecting the base calling. After masking out methylated variants in the nanopore data, each nanopore assembly had only one position that differed from the previously assembled reference isolate.
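One simple way to mask out methylated variants is to discard candidate variant positions that fall inside known methylation motifs. The sketch below does that with regular expressions. The motif list (GATC and CCWGG, the standard Dam and Dcm methylation sites in E. coli) is an assumption chosen for illustration, and this is not necessarily how Greig and colleagues did their masking.

```python
import re
from typing import Iterable, List, Set

# Assumed motif list for illustration: GATC (Dam) and CCWGG (Dcm), where
# W means A or T. Real analyses would use the motifs relevant to the organism.
MOTIFS = {"GATC": "GATC", "CCWGG": "CC[AT]GG"}


def motif_positions(reference: str, patterns: Iterable[str]) -> Set[int]:
    """0-based reference positions covered by any motif occurrence."""
    covered: Set[int] = set()
    for pattern in patterns:
        # the lookahead lets overlapping occurrences all be found
        for match in re.finditer(f"(?=({pattern}))", reference):
            start = match.start()
            covered.update(range(start, start + len(match.group(1))))
    return covered


def mask_methylated_variants(reference: str, variant_positions: Iterable[int]) -> List[int]:
    """Keep only variant positions that do not fall inside a motif occurrence."""
    masked = motif_positions(reference.upper(), MOTIFS.values())
    return [pos for pos in variant_positions if pos not in masked]


# Hypothetical reference with one GATC (positions 2-5) and one CCAGG (8-12).
print(mask_methylated_variants("TTGATCAACCAGGTTACGT", [3, 10, 15]))  # [15]
```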
The plasmids contained some repetitive sequence, and all the variants identified by Illumina sequencing sat within these repetitive regions. The Illumina reads were too short to be uniquely assigned a genomic context within these repeats, which probably led to consensus errors in the final variant calls.
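The read-length argument can be illustrated with a toy exact-match search: a read that sits entirely inside a repeated sequence matches in more than one place, while a longer read that spans the repeat plus some unique flanking sequence has only one possible placement. The sequences are made up, and real aligners tolerate mismatches and report mapping quality rather than doing exact string matching.

```python
def placements(reference: str, read: str) -> list:
    """All 0-based positions where `read` matches `reference` exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


# Made-up reference containing the same 10-base repeat twice.
REPEAT = "ACGTTGCAAC"
reference = "GGGCCC" + REPEAT + "TTTAAA" + REPEAT + "CAGCAG"

short_read = REPEAT[2:8]               # lies entirely inside the repeat
long_read = "TTTAAA" + REPEAT + "CAG"  # spans the repeat plus unique flanks

print(placements(reference, short_read))  # ambiguous: two positions
print(placements(reference, long_read))   # unique: one position
```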
In summary, David Greig found that nanopore sequencing is sufficient for fine-scale phylogenetic classification, and also seems to be more accurate than Illumina sequencing in some situations.
Novel Variant Discovery
In the same vein, I'd like to highlight some work that was recently published by Melissa Leija-Salazar in the area of variant analysis.
She used the MinION to detect clinically relevant heterozygous variants in the GBA gene. Defects in this gene have an established clinical link to a lysosomal storage disorder called Gaucher disease. GBA variants have also been found to be a significant risk factor for Parkinson's disease.
Starting with an amplified 9 kb genomic region, Melissa's research group detected a 55 bp deletion in a recombinant allele from a Parkinson's patient, which had been missed by other sequencing methods. Using MinION sequences, they were able to detect all previously known coding missense mutations at the correct zygosity, and with some help from filtering software, were able to exclude most false variants that appeared to result from systematic base-calling errors. In total, they ran 95 samples on MinION flow cells and encountered one false negative, where a recombinant allele wasn't picked up by nanopore sequencing (possibly because the PCR primers failed to bind to the recombinant region).
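Calling zygosity from read data often comes down to the fraction of reads supporting the alternate allele: roughly half for a heterozygous variant, nearly all for a homozygous one. The sketch below uses hypothetical thresholds and a hypothetical minimum depth, and leaves out the filtering of systematic base-calling errors that the study relied on; it is an illustration of the idea, not the study's pipeline.

```python
def call_zygosity(ref_reads: int, alt_reads: int, min_depth: int = 20,
                  het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Classify a site by the fraction of reads supporting the alternate allele.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "no call"  # too few reads to judge
    alt_fraction = alt_reads / depth
    if alt_fraction < het_low:
        return "homozygous reference"
    if alt_fraction <= het_high:
        return "heterozygous"
    return "homozygous alternate"


print(call_zygosity(52, 48))  # heterozygous
print(call_zygosity(3, 97))   # homozygous alternate
```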
cDNA Mapping
So that's variant calling. For cDNA mapping, however, we need single-molecule accuracy that's good enough to find a genomic context for individual sequences (or reads). Here, we can take advantage of the long reads that nanopore sequencing provides.
I'm showing here a few of the nanopore reads that were mapped to the reverse strand of the mitochondrial genome, with one line per read. Anything that exactly matches the reference sequence is shown in a dark colour (red, green, blue, or yellow), and any differences are shown in lighter pastel colours. The vertical line of bright yellow in the middle shows a single-nucleotide variant in an unannotated region near the end of the mitochondrial genome.
You can see that there is a bit of noise in the reads, but I think it's reasonable to conclude that all these reads match the mitochondrial reference.
Of course, this only shows the specificity of nanopore sequencing for the reverse strand of the mitochondrial genome; in other words, how likely it is that reads mapped to the mitochondrial genome actually come from mitochondrial transcripts. This picture doesn't help much with demonstrating the sensitivity of nanopore sequencing. Perhaps there are other reads from mitochondrial genes that have so much base-calling error that they don't map here, or that map to another gene.
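It may help to pin down the two quantities with confusion-matrix counts. What the talk describes as specificity (how likely it is that reads mapped to the mitochondrial genome really are mitochondrial) is what is usually called precision, or positive predictive value; strict specificity instead counts non-mitochondrial reads correctly kept off the mitochondrial genome. The counts below are hypothetical, and in a real run the true origin of each read is unknown, which is exactly the difficulty raised above.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly mitochondrial reads that map to the mitochondrial genome."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Share of reads mapped to the mitochondrial genome that really are mitochondrial."""
    return true_pos / (true_pos + false_pos)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-mitochondrial reads that are kept off the mitochondrial genome."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts; in a real run the true origin of each read is unknown.
tp, fn, fp, tn = 900, 100, 10, 8_990
print(sensitivity(tp, fn), precision(tp, fp), specificity(tn, fp))
```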
I don't yet have a complete answer for that, so if you've got an idea about what it might take to convince you that either the sensitivity or specificity of nanopore sequencing is good enough, please let me know.
In order to move ahead with generating and interpreting nanopore sequencing results, I've chosen to assume that any sensitivity issues are distributed randomly throughout the genome. With that assumption, I can concentrate on the proportion of reads mapping to each gene without getting too hung up on the actual counts.
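The reasoning behind that assumption can be checked with a few lines of arithmetic: if every gene loses reads at the same rate, the counts shrink but each gene's share of the total does not change, whereas a gene-specific loss distorts the proportions. The gene names and counts are made up for illustration.

```python
from typing import Dict


def proportions(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert per-gene read counts into the proportion of all mapped reads."""
    total = sum(counts.values())
    return {gene: count / total for gene, count in counts.items()}


def apply_sensitivity(counts: Dict[str, float],
                      sensitivity: Dict[str, float]) -> Dict[str, float]:
    """Expected counts after each gene's reads are detected at its own rate."""
    return {gene: count * sensitivity[gene] for gene, count in counts.items()}


# Made-up counts for three hypothetical genes.
true_counts = {"geneA": 6_000, "geneB": 3_000, "geneC": 1_000}

uniform = apply_sensitivity(true_counts, {"geneA": 0.7, "geneB": 0.7, "geneC": 0.7})
skewed = apply_sensitivity(true_counts, {"geneA": 0.7, "geneB": 0.7, "geneC": 0.2})

print(proportions(true_counts))
print(proportions(uniform))  # same shares as the true counts
print(proportions(skewed))   # geneC's share drops
```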
cDNA Sequencing (4T1 Project)
I want to expand a bit more on the mitochondrial project, because it's something that's taken a reasonable chunk of my time over the last couple of years. One of the main projects I've been working with Carole on has been cDNA sequencing from 4T1 breast cancer cell lines with and without mitochondrial DNA.
I talked in detail about methods behind this in a presentation last year, and we've had another couple of presentations about this project from other group members. For those who want a refresher on my previous presentation, there's a link on this slide:
https://doi.org/10.7490/f1000research.1115283.1
Since then, we've been firming up our numbers by doing additional replicate experiments, and I'm pleased to say that the results confirm the general idea of what we found out: if you knock out mitochondrial DNA, then you also knock out the expression of genes encoded on the mitochondrial genome. This is hopefully an observation that doesn't surprise too many of you.
Gene Expression Table
But for our 4T1 mitochondrial genome project, we didn't just sequence the mitochondrial transcripts; we did a survey of the entire 4T1 transcriptome, and those results have been a bit more surprising. In short, we found a number of transcripts that seem to be under mitochondrial control, or at least their expression is substantially increased or decreased when mitochondria are removed from our 4T1 cell lines. Here's a count table for a few genes, showing some cherry-picked results from that project.
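Statements like "substantially increased or decreased" are usually expressed as a log2 fold change between conditions. A minimal sketch, assuming read proportions as the measure (in line with the proportion-based approach described earlier) and a small pseudocount so that genes with zero reads in one condition stay finite. The function name, pseudocount and example counts are hypothetical; this is not the project's actual pipeline, and dedicated tools such as DESeq2 or edgeR additionally model variability between replicates.

```python
from math import log2


def log2_fold_change(count_wt: int, total_wt: int,
                     count_rho0: int, total_rho0: int,
                     pseudocount: float = 0.5) -> float:
    """log2 of a gene's read proportion in rho0 cells relative to wildtype.

    The pseudocount keeps the ratio finite when a gene has no reads in one
    sample (as nearly happens for mitochondrial genes in rho0 cells).
    """
    prop_wt = (count_wt + pseudocount) / total_wt
    prop_rho0 = (count_rho0 + pseudocount) / total_rho0
    return log2(prop_rho0 / prop_wt)


# Hypothetical counts out of one million mapped reads per sample.
print(log2_fold_change(100, 1_000_000, 400, 1_000_000))  # about +2: up in rho0
print(log2_fold_change(1_000, 1_000_000, 0, 1_000_000))  # large negative value
```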
We've now had a bit of time to think about these results, and have come up with a few ideas.
If you think about it a little bit, you might expect that transcripts of genes involved in energy production would be altered. After all, if there are no mitochondrially encoded proteins, it doesn't make much sense to create the proteins that combine with them in the same energy complexes. This turns out to be true to some extent, but the expression of some transcripts doesn't change, or doesn't change as much as we expect it should. I haven't done a full survey of nuclear-encoded mitochondrial protein genes, but one such gene we've found is Atp5l (ATP synthase, mitochondrial F0 complex, subunit G), and we expect we'll find more if we look deeper.
One interesting discovery is that there are at least two cluster of differentiation (CD) markers that are differentially expressed between our ρ0 lines and our wildtype lines: Cd9, also known as Tetraspanin 29 according to the Jax database; and Cd81, also known as Tapa1, or Tetraspanin 28.
While I'm on the topic of Cd9, I should probably point out that we've got a few more immune-related genes in our differential expression list: Ccl2, Ccl5, Tslp, and Psmb8. We've followed up these results with qPCR, and that supports the observation of a substantial expression difference between wildtype and ρ0 cells.
I've picked out a few other genes here which have also been validated by qPCR: Gstk1, Mgp, Sumo3, and Serpinf1. We're still in the process of sorting out where they all fit in the grand scheme of things.
Transcript Table (mtDNA)
I want to finish up with a couple of slides on something that was a little bit of a puzzle for me, bringing me almost full circle back to the gene counts on the mitochondrial chromosome, which I presented on last year.
Ordinarily, mitochondrial transcripts are among the most highly expressed in the entire mouse transcriptome. Those transcripts are almost undetectable in our ρ0 samples, but not completely. If I assume that the read mapping process I've used is 100% specific, then I'm led to conclude that there is a sliver of mitochondrial gene expression in our ρ0 cell line. We did actually see this in our first comparison, where the ρ0-A sample showed reads mapping to the Nd1 gene, but I'd basically ignored them because it was only a couple of reads from one gene out of the whole mitochondrial genome. I figured that maybe there was something like barcode spillover: a technical error introduced at an early stage of sample preparation, possibly even before we touched the sequencing kits. I would easily dismiss something like this with Illumina sequencing, because the occasional random read mapping is not unexpected.
But seeing these low counts within mitochondrial transcripts consistently over three separate runs has made me wonder again whether they are actually valid mappings.
A Sticky Situation
The thing that I really love about nanopore sequencing is that it allows you to really get your hands dirty and investigate possible reasons why odd things are happening. I talked about this a little bit in relation to Melissa Leija-Salazar and David Greig's variant investigations.
As it turns out, the Nd1 hits I was most worried about are most likely not valid hits to the mitochondrial genome, and this is probably the case with the other odd hits we've seen. To be more specific (or, I guess, more sensitive), these sequences appear to be chimeric reads composed of multiple sequences that slipped through the cracks of my chimeric read filter. Because a part of the read hit the mitochondrial genome, that entire read counted as a hit to the mitochondrial genome.
Here's one of those sequences, which I think fairly cleanly demonstrates the issue. The barcode sequences (in green here) are meant to flank the cDNA sequence, and sit just outside the strand switch primer and the polyA primer sequences. The bits of this sequence that are mapping to the mitochondrial genome are outside those flanking barcode regions, suggesting they're from a random chunk of DNA sequence that decided to hitch along for a ride through the nanopore.
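A check of the kind described here can be sketched as: locate the two flanking barcodes in the read, then only count alignments that fall between them, flagging anything outside the flanks as a sign of a chimeric read. The sketch assumes barcode positions and alignment coordinates on the read have already been found by other tools; the Interval type, function names and coordinates are hypothetical, and this is not the author's actual chimeric read filter.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A span of read coordinates: 0-based start (inclusive), end (exclusive)."""
    start: int
    end: int


def within_insert(aln: Interval, left_bc: Interval, right_bc: Interval) -> bool:
    """True if an aligned segment lies between the two flanking barcodes."""
    return aln.start >= left_bc.end and aln.end <= right_bc.start


def split_alignments(alignments: List[Interval], left_bc: Interval,
                     right_bc: Interval) -> Tuple[List[Interval], List[Interval]]:
    """Separate alignments inside the barcode-flanked cDNA insert from those
    outside it; any alignment outside the flanks suggests a chimeric read."""
    inside: List[Interval] = []
    outside: List[Interval] = []
    for aln in alignments:
        (inside if within_insert(aln, left_bc, right_bc) else outside).append(aln)
    return inside, outside


# Hypothetical read: barcodes near both ends, one alignment inside the insert,
# and one (say, to the mitochondrial genome) beyond the right-hand barcode.
left_barcode = Interval(0, 24)
right_barcode = Interval(900, 924)
alignments = [Interval(60, 850), Interval(930, 1_100)]
inside, outside = split_alignments(alignments, left_barcode, right_barcode)
print("count:", inside, "flag as chimeric:", outside)
```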
So it looks like I've got a bit more cleaning to do, but I'm now more confident that our results are pretty robust, and we should be able to work through an observational / methods paper about our differential expression results over the next few months.
Acknowledgements
I'd like to thank all the people who have helped me through my exploration of nanopore sequencing, in particular Carole and Mike from the Berridge group, and the Twitter and Nanopore communities. I'd also like to put in a little plug for Darrell's team for helping to keep our lab in tip-top shape.
And, of course, thank you for sitting here and being a wonderfully attentive audience.
Tēnā koutou i tēnei ata
Ko Kurahaupo te waka
Ko Tapuae-o-uenuku te maunga
Ko Wairau te awa
Ko Rangitāne te iwi
Ko Moa tōku whanau
Nō Pōneke ahau
Kei Karori tōku kāinga
He kai pūtaiao ahau
Ko Rawiri tōku ingoa
No Reira
Tēnā koutou, tēnā koutou, tēnā koutou... READ MORE
Tēnā koutou i tēnei ata
Ko Kurahaupo te waka
Ko Tapuae-o-uenuku te maunga
Ko Wairau te awa
Ko Rangitāne te iwi
Ko Moa tōku whanau
Nō Pōneke ahau
Kei Karori tōku kāinga
He kai pūtaiao ahau
Ko Rawiri tōku ingoa
No Reira
Tēnā koutou, tēnā koutou, tēnā koutou kotou katoa
For those who don't know me, my name is David Eccles, and for almost the last five years, I've been doing DNA sequencing on the Oxford Nanopore Technologies' MinION sequencer here at the Malaghan Institute of Medical Research.
Nanopore Overview
I'd like to start off by giving an overview of what the nanopore sequencer is. In a physical sense, the active part of the sequencer is a consumable flow cell, shown here on the left. On the bottom of the flow cell (hidden in this picture) is a chunk of silicon that contains electrical current sensing circuits and circuits to enable the transfer of those signals over a USB connection. On the top of the flow cell is an array of 2048 sequencing wells. 512 of these wells can be hooked up to the sensing circuits at any one time, and those circuits will take snapshots of the electrical state of the wells at 4 kHz, or 4 thousand snapshots per second.
Each well is covered with a delicate, fluid-impermeable polymer membrane, and embedded into the membrane are purpose-built (or more correctly, purpose-mutated) protein pores that have a size of a few tens of nanometres across. An electrical potential is set up across either side of the membrane, which encourages a flow of ions through the nanopore. If the sequencer is working well, then most the ions flowing through are the polymer templates that you're trying to sequence.
Hydroelectric Sequencing
After Doctor Divya Mirrington gave her presentations during the Nanopore workshop at the end of February this year, I realised that my own conceptual model of how sequencing works was in need of a bit of fixing. The sequencer isn't measuring a shadow of the things that are stuck inside the nanopore; it's measuring the flow of current through
the nanopore.
Imagine the flow cell as a bit like a small hydro dam with a reservoir at the back of it. This reservoir represents the potential sequencing energy stored by ions on the input side of the flow cell membrane. The sequencer measures the flow of ions through holes in the membrane, so a base-level flow of current is necessary to carry out sequencing. Complex polymers (like DNA) interrupt this flow in a complex way, and produce a characteristic change in current that is dependent on the physical and electrochemical structure of the polymer as it moves through the sequencing pore.
Over the course of a sequencing run, the ions on the input side deplete, and the voltage applied to the flow cell needs to be adjusted to compensate in order to maintain the same base flow rate. Eventually, the input side will be depleted to such a degree that no amount of voltage compensation will be sufficient to keep ions flowing fast enough, which stops the sequencing process.
If you want to get the most out of a nanopore flow cell, then this flow should be maintained for as long as possible without being blocked completely. One of the best ways to do this is to keep feeding the pores with long polymers. If a flow cell is sequencing a large number of long DNA strands at the start of a sequencing run, it will produce a higher yield over the course of a run compared to one that's sequencing short DNA strands. This is because there's some switching time involved between sequencing different strands, and that gives smaller molecules an opportunity to sneak through. Unfortunately, really long DNA tends to knot up and block the pores, and this can be sample-dependent, so there's a bit of tweaking involved in trying to find a happy medium to optimise sequencing yield.
Energy (ATP) Depletion
Going back to the physical model, you might notice that there's a helicase protein that is attached to the DNA sequence. The act of unwinding and ratcheting the DNA through the nanopore is an ATP-dependant process, so in addition to a supply of ions, the input side of the flow cell membrane also needs a supply of energy in the form of ATP. Adapter sequences containing this helicase are joined onto the DNA during sample preparation, and (at the moment) consume energy even when they're not bound to the sequencing pores. This means that it's actually possible to overload a flow cell and deplete it of energy, even before it's been depleted of ions required to maintain the flow of current for sequencing.
London Calling 2019
For those who don't know, I went on a conference a couple of weeks ago. This was the yearly Nanopore sequencing conference in London, and doubles as a trade show for ONT to demonstrate all their newest gadgets.
I took a picture of ONT's upcoming Mk1c sequencer, which is a screen, sequencer, and basecaller all built into one package - $5,000 USD. Also shown is their flongle adapter, and the flongle flow cells, which allow cheaper sequencing, about $100 USD per sequencing run. These things were talked about at previous ONT events, and are
available for ordering now on the Nanopore website, but might take a bit more time to get into customer's hands.
This year was a little different from previous years, in that weren't really any huge surprises in the presentations that Oxford Nanopore staff gave. Probably the closest to that was an unveiling of a 96-well sequencer concept, allowing for low-volume sequencing in a highly-multiplexed fashion.
A common theme of the research talks at the conference was looking for needles in haystacks. Things like assembling endogenous viruses out of whole-genomic reads, forensics, cell-free DNA, targeted sequencing, and sequencing the dark genome that other sequencers can't touch.
MinION Experts
Here's our nanopore sequencing team as it stands at the moment. As I mentioned previously, Doctor Mirrington led a Nanopore workshop earlier this year, and has trained up a few people to train others. If you want to get started with nanopore sequencing, ask one of these people to help you out. A few others attended Divya's one-day rapid start workshop, and participated in their own sequencing run. There was also a lot of interest from SBS, so one took part in the training days, and a few more of them joined in for the one-day workshop.
I'd like to also put forward a couple of honourable mentions in Olivier and Jodie, who braved the Oxford Nanopore sample preparation process at a time before there were any opportunities for training.
Research Groups
We have a few different research groups here at the Malaghan Institute, and the nanopore sequencer has been used, or will shortly be used, by all of them. Graham's group used nanopore sequencing for creating high-contiguity mitochondria and genome assemblies for Nippostrongylus brasiliensis. Ian's group has used nanopore sequencing
to look at plasmid sequences as a quality-control component of their CAR-T cell project. Franca's group has used nanopore sequencing in the past for looking at interferon alpha gene expression, and more recently for making sure that ordered plasmids are what they say on the tin. Mike's group are in the middle of a project investigating how mitochondria control cell function, particularly as it relates to tumour growth. The Hugh Green Cytometry Centre will be collaborating with a national team to try to assemble flow-sorted bird Z chromosomes. And Olivier's group has used nanopore sequencing for investigating metagenomic and 16s amplicon diversity in faecal samples.
Viral Sequencing (2015)
Something that frequently comes up when nanopore sequencing is mentioned are complaints about the error rate. In 2015, I worked with some researchers at ESR: Jing Wang, Nicole Moore and Richard Hall. They were in the MinION Early Access Program, and enlisted my help in adding the final touches to their paper on sequencing an Influenza genome. The different components were PCR-amplified from cDNA transcripts, and it was nice to see that proportional transcript coverage was a lot more consistent across the length of transcripts when compared to a similar extraction followed by Illumina sequencing. Another interesting thing was that we had reasonably high consensus concordance with Illumina sequencing; over 99% for all transcripts, and 100% agreement for two of the eight transcripts. I'd like to point out that this was in a paper published in 2015 using a now-obsolete version of the flow cells, with about 85% per-base average sequencing error.
Phylogenetic Analysis
Jumping forward to results published in a pre-print this year, David Greig gave a talk at the London Calling conference about the use of nanopore sequencing in a clinical outbreak setting. Using a combination of Illumina and Nanopore sequencing, they sequenced faecal samples from two cases: two children who were suffering from something called Haemolytic Uremic Syndrome.
The results of their investigation demonstrated that the two cases were not linked, and were associated with two different outbreaks.
Nanopore sequences did initially appear to have more variation from the reference strains than Illumina sequences, but they were sequencing unamplified DNA, and almost all of these variants were found to be methylated regions. In other words, the electrochemical
structure was affecting the base calling. After masking out methylated variants in the nanopore data, each Nanopore assembly had only 1 location that was different to the previously-assembled reference isolate.
The plasmids had some repetitive sequence, and all the variants identified by Illumina sequencing sat within these repetitive regions. These sequences couldn't be uniquely assigned to a genomic context because the Illumina sequences were too short, probably leading to a consensus error in the final variant calls.
In summary, David Greig found that nanopore sequencing is sufficient for fine-scale phylogenetic classification, and also seems to be more accurate than Illumina sequencing in some situations.
Novel Variant Discovery
Along the same vein, I'd like to highlight some work that was recently published by Melissa Leija-Salazar in the area of variant analysis.
She used the MinION to detect clinically-relevant heterozygous variants in the GBA gene. Defects in this gene have an established clinical link to a lysosomal storage disorder called Gaucher disease. GBA variants have also been found to be a significant risk factor for Parkinson's disease.
Starting with an amplified 9kb genomic region, Melissa's research group was able to detect a 55bp deletion in a recombinant allele from a Parkinson’s patient, which was missed by other sequencing methods. Using MinION sequences, they were able to detect all previously known coding missense mutations at the correct zygosity, and with some help from filtering software, were able to exclude most false variants that appeared to be the result of systematic base calling errors. In total, they ran 95 samples on MinION flow cells, and encountered one instance of a false negative, where a recombinant allele wasn't picked up by nanopore sequencing (possibly as a result of failed PCR primer binding to the recombinant region).
cDNA Mapping
So that's variant calling, but for cDNA mapping, we need single-molecule accuracy that's good enough to find a genomic context for individual sequences (or reads). But here, we can take advantage of the long reads that nanopore sequencing provides.
I'm showing here a few of the nanopore reads that were mapped to the reverse strand of the mitochondrial genome, with one line per read. Bases that exactly match the reference sequence are shown in a dark colour (red, green, blue, or yellow), and any differences are shown in lighter pastel colours. The vertical line of bright yellow in the middle marks a single-nucleotide variant in an unannotated region near the end of the mitochondrial genome.
You can see that there is a bit of noise in the reads, but I think it's reasonable to conclude that all these reads match the mitochondrial reference.
Of course, this only shows the specificity of nanopore sequencing for the reverse strand of the mitochondrial genome: that is, how likely it is that reads mapped to the mitochondrial genome actually come from mitochondrial transcripts. This picture doesn't help much with demonstrating the sensitivity of nanopore sequencing. Perhaps other reads that have come off the sequencer are from mitochondrial genes, but have so many base-calling errors that they're not mapping here, or are mapping to another gene.
I don't yet have a complete answer for that, so if you've got an idea about what it might take to convince you that either the sensitivity or specificity of nanopore sequencing is good enough, please let me know.
In order to move ahead with generating and interpreting nanopore sequencing results, I've chosen to assume that any sensitivity issues are distributed randomly throughout the genome. Under this assumption, I can concentrate on the proportion of reads mapping to each gene, without getting too hung up on the absolute counts.
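Working with proportions means comparing genes by their share of mapped reads rather than by raw counts, so that samples sequenced to different depths can still be compared. Below is a minimal sketch of that idea, assuming per-gene read counts are already available as a dictionary. The function names, the log2 ratio, and the default pseudocount of 0.5 are illustrative choices, not the analysis used in this project.

```python
import math


def gene_proportions(counts: dict[str, float]) -> dict[str, float]:
    """Convert per-gene read counts into proportions of all counted reads."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no reads to normalise")
    return {gene: n / total for gene, n in counts.items()}


def log2_proportion_ratio(
    counts_a: dict[str, float], counts_b: dict[str, float], pseudocount: float = 0.5
) -> dict[str, float]:
    """Log2 of (proportion in sample B / proportion in sample A) for each gene.

    The pseudocount (an arbitrary choice here) stops genes with zero reads
    in one sample from producing an infinite or undefined ratio.
    """
    genes = sorted(set(counts_a) | set(counts_b))
    prop_a = gene_proportions({g: counts_a.get(g, 0) + pseudocount for g in genes})
    prop_b = gene_proportions({g: counts_b.get(g, 0) + pseudocount for g in genes})
    return {g: math.log2(prop_b[g] / prop_a[g]) for g in genes}


if __name__ == "__main__":
    # Toy counts: sample B was sequenced twice as deeply as sample A.
    sample_a = {"geneA": 10, "geneB": 30}
    sample_b = {"geneA": 20, "geneB": 60}
    print(log2_proportion_ratio(sample_a, sample_b, pseudocount=0))
    # {'geneA': 0.0, 'geneB': 0.0}
```

In the toy example, the raw counts double between samples but the proportions are unchanged, so every log2 ratio is zero. This is the sense in which the absolute counts matter less than each gene's share of the reads.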
cDNA Sequencing (4T1 Project)
I want to expand a bit more on the mitochondrial project, because it's something that's taken a reasonable chunk of my time over the last couple of years. One of the main projects I've been working with Carole on has been cDNA sequencing from 4T1 breast cancer cell lines with and without mitochondrial DNA.
I talked in detail about methods behind this in a presentation last year, and we've had another couple of presentations about this project from other group members. For those who want a refresher on my previous presentation, there's a link on this slide:
https://doi.org/10.7490/f1000research.1115283.1
Since then, we've been firming up our numbers by doing additional replicate experiments, and I'm pleased to say that the results confirm the general idea of what we found out: if you knock out mitochondrial DNA, then you also knock out the expression of genes encoded on the mitochondrial genome. This is hopefully an observation that doesn't surprise too many of you.
Gene Expression Table
But for our 4T1 mitochondrial genome project, we didn't just sequence the mitochondrial transcripts; we did a survey of the entire 4T1 transcriptome, and those results have been a bit more surprising. In short, we found a number of transcripts that seem to be under mitochondrial control, or at least their expression is substantially increased or decreased when mitochondria are removed from our 4T1 cell lines. Here's a count table for a few genes, showing some cherry-picked results from that project.
We've now had a bit of time to think about these results, and have come up with a few ideas.
If you think about it a little, you might expect transcripts of genes involved in energy production to be altered. After all, if there are no mitochondrially encoded proteins, it doesn't make much sense to produce the proteins that combine with them in the same energy complexes. This turns out to be true to some extent, but the expression of some transcripts doesn't change, or doesn't change as much as we'd expect. I haven't done a full survey of nuclear-encoded mitochondrial protein genes, but one such gene we've found is Atp5l (ATP synthase, mitochondrial F0 complex, subunit G), and we expect to find more if we look deeper.
One interesting discovery is that at least two cluster of differentiation (CD) markers are differentially expressed between our ρ0 lines and our wildtype lines: Cd9, also known as Tetraspanin 29 according to the Jax database; and Cd81, also known as Tapa1, or Tetraspanin 28.
While I'm on the topic of Cd9, I should probably point out that we've got a few more immune-related genes in our differential expression list: Ccl2, Ccl5, Tslp, and Psmb8. We've followed up these results with qPCR, and that supports the observation of a substantial expression difference between wildtype and ρ0 cells.
I've picked out a few other genes here which have also been validated by qPCR: Gstk1, Mgp, Sumo3, and Serpinf1. We're still in the process of sorting out where they all fit in the grand scheme of things.
Transcript Table (mtDNA)
I want to finish up with a couple of slides on something that was a little bit of a puzzle for me, bringing me almost full circle back to the gene counts on the mitochondrial chromosome, which I presented on last year.
Ordinarily, the mitochondrial transcripts are among the most highly expressed genes in the entire mouse transcriptome. Those transcripts are almost undetectable in our ρ0 samples, but not completely. If I assume that the read-mapping process I've used is 100% specific, then I'm led to conclude that there is a sliver of mitochondrial gene expression in our ρ0 cell line. We did actually see this in our first comparison, where the ρ0-A sample showed reads mapping to the Nd1 gene, but I'd basically ignored it because it was only a couple of reads from one gene out of the whole mitochondrial genome. I figured that there might be something like barcode spillover: a technical error introduced at an early stage of sample preparation, possibly even before we touched the sequencing kits. I would readily dismiss something like this with Illumina sequencing, because the occasional random read mapping is not unexpected.
But seeing these low counts in mitochondrial transcripts consistently across three separate runs has made me wonder again whether this is actually a valid mapping.
A Sticky Situation
The thing that I really love about nanopore sequencing is that it lets you get your hands dirty and investigate why odd things are happening. I talked about this a little bit in relation to Melissa Leija-Salazar's and David Greig's variant investigations.
As it turns out, the Nd1 hits I was most worried about are most likely not valid hits to the mitochondrial genome, and this is probably the case with the other odd hits we've seen. To be more specific (or, I guess, more sensitive), these sequences appear to be chimeric reads composed of multiple sequences that slipped through the cracks of my chimeric read filter. Because a part of the read hit the mitochondrial genome, that entire read counted as a hit to the mitochondrial genome.
Here's one of those sequences, which I think fairly cleanly demonstrates the issue. The barcode sequences (in green here) are meant to flank the cDNA sequence, and sit just outside the strand switch primer and the polyA primer sequences. The bits of this sequence that are mapping to the mitochondrial genome are outside those flanking barcode regions, suggesting they're from a random chunk of DNA sequence that decided to hitch along for a ride through the nanopore.
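One way to catch reads like this is to check, for each hit, how much of the aligned part of the read lies between the two barcodes. The sketch below assumes that barcode and alignment coordinates on the read are already known (for example, from a demultiplexer and a read aligner). The names, the half-open coordinate convention, and the 50% threshold are hypothetical, and this is not the chimeric read filter used in the talk.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """Half-open interval [start, end) in read coordinates."""
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of read positions shared by two half-open intervals."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def hit_outside_barcodes(
    hit: Interval,
    left_barcode: Interval,
    right_barcode: Interval,
    min_fraction_inside: float = 0.5,
) -> bool:
    """Flag a hit whose aligned read segment mostly lies outside the barcoded insert.

    The insert is taken to be the stretch of read between the end of the left
    barcode and the start of the right barcode, where the cDNA should sit.
    """
    if hit.length() <= 0:
        raise ValueError("hit must cover at least one base")
    insert = Interval(left_barcode.end, right_barcode.start)
    inside = overlap_length(hit, insert)
    return inside / hit.length() < min_fraction_inside


if __name__ == "__main__":
    left, right = Interval(0, 24), Interval(1000, 1024)  # toy coordinates
    print(hit_outside_barcodes(Interval(100, 900), left, right))    # False
    print(hit_outside_barcodes(Interval(1100, 1400), left, right))  # True
```

Under this scheme, a read would count towards a mitochondrial gene only if its hit is not flagged. Any real threshold would need tuning against reads whose origin is known.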
So it looks like I've got a bit more cleaning to do, but I'm now more confident that our results are pretty robust, and we should be able to work through an observational / methods paper about our differential expression results over the next few months.
Acknowledgements
I'd like to thank all the people who have helped me through my exploration of nanopore sequencing, in particular Carole and Mike from the Berridge group, and the Twitter and Nanopore communities. I'd also like to put in a little plug for Darrell's team, who help keep our lab in tip-top shape.
And, of course, thank you for sitting here and being a wonderfully attentive audience.