Investigating repetitive sequences in ultra-long nanopore reads
[version 1; not peer reviewed]
No competing interests were disclosed.
Tena koutou i tenei ahiahi. Kia ora.
The image on my title slide has been used frequently by the Malaghan Institute for publicity purposes. It's a panorama photo, and is actually a crop of a full 360° panorama that I made from 16 source images using a program called Hugin.
Panorama Stitching
For those who aren't familiar with photo stitching, in order to assemble the panorama, you first identify control points for matching bits of different images. The ideal points are things that have high contrast and are easy to identify in other images.
Panorama Stitching (Annotated)
There's a bit of an art to getting a good panorama. You need a good eye for the repetitive structures that can cause problems, including the things that look locally unique, but are duplicated in completely different regions of the big picture. If you're lucky and have control of the camera, source images can be taken that make sure no picture sits entirely within repetitive regions, so that there's always a unique bit in each image that can be matched to neighbouring images.
A Stitch in Time Saves Seven
And if you've got a device that can take snapshots at wider angles, then the stitching also gets a bit easier. In the extreme case, you might not even need to do any stitching at all.
Getting Polar
Once you've got a 360° panorama, you can do a whole bunch of interesting things with it. One thing I love doing is mapping the horizontal panorama to a circular image. It's a bit difficult to take pictures of my feet, so there are bits missing from the centre. In any case, this circular mapping is a reasonably good way of visually representing a large horizontal expanse in a smaller area.
Sequencing Using a Nanopore
But I'm not here to talk about panoramas; I'm here to talk about nanopore sequencing, specifically that carried out by Oxford Nanopore Technologies' MinION, GridION and PromethION devices. For those who don't know about the technology, nanopore sequencing involves the translocation of a long polymer through a hole, detecting changes in its shape as it goes through. These changes are represented as change in electric current over time. By understanding how that shape changes with different DNA bases, it's possible to work backwards from the electrical current trace to the bases, resulting in a very fast, observational method of DNA sequencing.
Nanopore sequencing has more than a few differences from other sequencing technologies; there are two that I want to mention briefly in passing. The first is that the sequencing pores are re-usable: soon after one DNA strand has gone through the pore, it can be re-loaded with another one; one pore will sequence thousands of strands over the course of a single run at 450 bases per second.
The second is that it can sequence *really long* strands of DNA, and that's what I'll be discussing in my talk today.
I'm a bioinformatician.
I like data.
I like *lots* of data.
But sometimes, a single DNA sequence can be informative.
Repeats - a Ubiquitous Problem / 1
What I want to demonstrate to you today are some sequence visualisation scripts that I've written, but I need something to hang them on.
To get started along this track of long reads, here's a sequence that came off a MinION sequencing device a couple of weeks ago. It's an amplified cDNA transcript, not to be confused with an RNA sequence, of the sort that was sequenced on the International Space Station a few days ago.
This particular transcript encodes one of the ubiquitin genes in Mus musculus, and I've mapped the roughly 1kb linear sequence to a spiral, starting at the centre and going out to the edge of the image.
Ubiquitin is a protein composed of 76 amino acids. For those of you who are quick at maths, you might realise that this transcript is a lot longer than the 228 nucleotides that are needed to encode the protein. In fact, this transcript is actually a polyubiquitin precursor transcript, but you wouldn't know it from looking at this image.
Repeats - a Ubiquitous Problem / 2
However, if I adjust the number of bases per ring in the spiral to a few more, up to 226, then the repetitive structure of the cDNA transcript becomes a lot more obvious. I notice very obvious bands of the same base sequences radiating out from the centre of the image.
Spiralling Out Of The Void
For those interested in the nuts and bolts of this spiral plot, here's my drawing code as it exists at the moment. I use a bit of calculus to work out how far along the spiral a particular location is, so that I can get a smooth transition between different loops while keeping the angle per base constant. I do most of my work on the command line, and frequently work with sequences from a few tens of bases to a few hundred kilobases, so I've got quite a lot of variables in there to try to make things look at least passable at all scales.
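To give a feel for the geometry, here is a minimal sketch of a constant-angle-per-base layout. It is not my actual drawing code: it assumes a plain Archimedean spiral, and the names `spiral_xy`, `bases_per_ring`, `ring_gap` and `inner_radius` are made up for this illustration.

```python
import math

def spiral_xy(index, bases_per_ring, ring_gap=1.0, inner_radius=1.0):
    """Return (x, y) for base `index` on an Archimedean spiral.

    Every base advances the angle by the same amount, and the radius grows
    by `ring_gap` per full turn, so bases that are exactly `bases_per_ring`
    apart sit on the same ray from the centre.
    """
    turns = index / bases_per_ring          # number of full loops so far
    theta = 2 * math.pi * turns             # constant angle step per base
    radius = inner_radius + ring_gap * turns
    return radius * math.cos(theta), radius * math.sin(theta)

# Bases 0, 226 and 452 share an angle when there are 226 bases per ring.
for i in (0, 226, 452):
    x, y = spiral_xy(i, bases_per_ring=226)
    print(i, round(x, 3), round(y, 3))
```

With this kind of layout, a repeat whose length matches the ring length stacks up along a ray, which is why the bands only appear once the bases-per-ring value is close to the repeat length.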
However, this spiral visualisation code doesn't have any way to guess the number of bases per ring. That's all down to the person creating the image. It's a hard problem, and one that can have multiple solutions.
Repetitive Lyrics
So I'm going to take you through a bit of the process of development that led to me discovering that 226 number. It doesn't quite start with a song and a dance, but that'll do for now.
About a year ago a Reddit user, Frigorifico, had created and reported on a lyric visualisation tool called SongSim. You put in the lyrics of a song, and it will visualise the repetitive elements in the words. This tool was actually based on dotplots that are frequently used for DNA alignment.
Each dot in this matrix represents a comparison between two words in the lyrics of a song. Where there is a black dot, it means that the word is the same. In the example highlighted here, we can see that the 34th word in the lyrics is the same as the fifth word in the lyrics, and that word "lamb" appears in a few other places. There's a solid black line down the diagonal because each word is the same as itself.
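As a rough illustration of the word-level dot plot (a sketch only, not SongSim's code), the matrix can be built by comparing every word with every other word. The function names and the example phrase are invented for this sketch.

```python
def word_dotplot(text):
    """Return the word list and a square matrix of word-equality flags.

    matrix[i][j] is True when word i and word j are the same word.
    """
    words = text.lower().split()
    matrix = [[a == b for b in words] for a in words]
    return words, matrix

def render(matrix):
    """Draw the matrix as text: '#' for a match, '.' for a mismatch."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )

words, matrix = word_dotplot("the cat saw the other cat")
print(render(matrix))
```

The diagonal is always filled because each word matches itself, and the picture is symmetric because comparing word i with word j gives the same answer as comparing j with i.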
The interesting thing I noticed about this approach is that it concentrates on the words of a song, rather than letters.
This multi-base approach was also used in a completely non-visual micro- and minisatellite-finding program called SATFIND, which looks for repetitive motifs in DNA. Taking these things as inspiration, I worked through my own method to numerically describe the nature of repetitiveness in a DNA sequence.
Discovering Repeats in DNA
I wanted to have a go at doing something similar to that in arbitrary DNA sequences, with lengths from a few hundred bases to a few million bases. I needed a method that was fast, but still able to capture essentially every repetitive pattern in a sequence.
This is the method that I've ended up with, where instead of words in English text, I'm picking out overlapping equal-length subsequences of DNA (geneticists usually call these "kmers"). By recording the location of repeated subsequences, I can generate statistics that relate to the size of repetitive regions. This is how I got that 226 number for the ubiquitin transcript. I just looked for the distances between repeated subsequences, and picked the most common distance as my repeat length, or ring length.
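The idea can be sketched in a few lines. This is a simplified illustration under my own assumptions rather than the script itself: it only measures the distance from each kmer to its previous occurrence, and the names `repeat_distances` and `most_common_distance` are invented here.

```python
from collections import Counter

def repeat_distances(seq, k):
    """Count distances between consecutive occurrences of each kmer."""
    last_seen = {}          # kmer -> position of its most recent occurrence
    distances = Counter()   # distance -> number of times it was observed
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        if kmer in last_seen:
            distances[pos - last_seen[kmer]] += 1
        last_seen[kmer] = pos
    return distances

def most_common_distance(seq, k):
    """Return the most frequent repeat distance, or None if nothing repeats."""
    distances = repeat_distances(seq, k)
    if not distances:
        return None
    return distances.most_common(1)[0][0]

# A toy sequence: a 10-base unit repeated five times.
toy = "ACGTTGCAAG" * 5
print(most_common_distance(toy, k=6))
```

On the toy sequence, every repeated 6-mer is found again exactly 10 bases later, so 10 is the suggested ring length. On a real read with sequencing errors the distances spread out a little, which is one way a repeat unit of 228 bases could come out as 226.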
If I'm able to generate statistics, I like to be able to visualise those statistics as well, so to start off, I converted these numbers back into the standard dot-plot format.
Rhyme & Readin'
The mouse cDNA ubiquitin transcript is quite a short sequence, as nanopore reads go. I did mention ultra-long in my title, and I need to shift to genomic DNA for longer sequences. From a brief search on Google Scholar, it doesn't look like researchers have been too interested in doing mouse genomic DNA on the nanopore. However, I was able to find a fairly close match in the nanopore reads I've got lying around from the rodent parasite, Nippostrongylus brasiliensis. I don't need it to be ubiquitin to demonstrate the algorithm, but I figured I might as well add in that thread of continuity.
That's a dot-plot representation, but I found that it was quite difficult to use this to explain to other people what was going on. They kept asking difficult questions about it, like, "Why is there a diagonal line down the middle of the image?", "Why is there a mirror image?", and, "Why are the repeating bits a triangle, or a square?"
There was something about the idea of comparing things to themselves that was unintuitive.
Reading From a Profilic Perspective
So I had another think about the data that I was trying to represent. Was there a better way to represent something that made the locations of repetitive features more obvious? What I came up with was this: the location on a sequence is represented on the X axis, and the distance between features is represented on the Y axis in a log scale. No box shapes to be seen, and it helps to hide the idea that there's a central spine of a reference sequence that represents the ideal path through a sequence. Reverse complement pairs appear as funnel-shaped things, and repetitive blocks appear as sliced hills, or maybe a ripple pattern of a sunset on water.
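Here is a small sketch of how the points for such a profile could be generated. It is an illustration under assumptions, not the plotting script: it only looks back to the most recent occurrence of each kmer, skips kmers that are their own reverse complement, and uses invented names (`profile_points`, `reverse_complement`).

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]

def profile_points(seq, k):
    """Return (position, log10(distance), kind) points for a read.

    kind is "repeat" when the same kmer was seen earlier in the read,
    and "revcomp" when its reverse complement was seen earlier.
    """
    last_seen = {}   # kmer -> most recent position
    points = []
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        rc = reverse_complement(kmer)
        if kmer in last_seen:
            points.append((pos, math.log10(pos - last_seen[kmer]), "repeat"))
        if rc != kmer and rc in last_seen:
            points.append((pos, math.log10(pos - last_seen[rc]), "revcomp"))
        last_seen[kmer] = pos
    return points

# A toy inverted repeat: AAACCC followed by its reverse complement.
for pos, log_dist, kind in profile_points("AAACCCGGGTTT", k=3):
    print(pos, round(10 ** log_dist), kind)
```

In the toy inverted repeat, the distance grows by two for every step away from the junction, which is what opens up into a funnel on the plot. A tandem repeat instead gives a run of points at a constant distance.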
The read shown here is a moderate-length read, within the normal range of what would appear in a genomic DNA sequencing run on a nanopore MinION. However, this read is only a small part of a bigger picture. The read was actually used as evidence to assemble a 400kb contig from about 500 source sequences using a genome assembly program called Canu.
Overlap Consensus Assembly
For those who aren't familiar with the overlap-consensus method of genome assembly, in order to assemble the genome, Canu first identifies seeds for matching short subsequences in different sequenced reads. The ideal seeds are ones that have high complexity and are easy to identify in other sequences. There's a bit of an art to getting a good assembly. You need a good eye for the repetitive structures that can cause problems, including the things that look locally unique, but are duplicated in completely different regions of the genome. If you're lucky and have control of the sample preparation, long reads can be sequenced to make sure no reads sit entirely within repetitive regions, so that there's always a unique bit in each read that can be matched to neighbouring reads.
People have frequently compared genome assembly to jigsaw puzzles, and photo stitching is basically a jigsaw puzzle where you get to decide what the pieces look like. Even though genome assembly is one dimensional rather than two dimensional, it still has similar issues. Sometimes it's hard to know which way round a sequence goes, like with ropes and poles in jigsaw puzzles. Sometimes you get bits which are complex, but highly repetitive, like windows and bricks. And sometimes there are low-complexity regions, where the same thing is repeated in close proximity with itself, just like the sky and grass.
But I will admit that this 400kb sequence is an assembled contig, not a real ultra-long read. It's a good demonstration of why the genome of Nippostrongylus brasiliensis is so hard to assemble with short Illumina reads, but I haven't yet done any ultra-long read sequencing on Nippo.
Nanopore WGS Consortium
For that, we need to move on to the less repetitive human genome, and the Nanopore whole genome sequencing consortium. I haven't participated directly in their projects, but I'm a huge fan of the data they have produced, which is released under a Creative Commons Attribution license. They've got one paper shunted through the peer review process, and it looks like there'll be a few more to follow (both DNA and RNA) in the next year or so.
Human UBC
So there we go. In the whole genome consortium reads, I found one read over 100kb that spanned over the human UBC gene, about 20kb from the end of the read. The current consensus of the nanopore sequencing community is that an ultra-long read sequencing run has a read N50 of over 100 kilobases; in other words, over 50% of the sequenced bases in the run come from reads that are over 100kb in length.
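For anyone who hasn't met the statistic before, a read N50 can be worked out with a short function like the one below. This is a generic sketch with made-up read lengths, not code from the consortium, and the name `read_n50` is my own.

```python
def read_n50(lengths):
    """Return the read N50 for a list of read lengths.

    Reads are taken from longest to shortest; the N50 is the length of the
    read at which the running total first reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no reads at all

# Made-up read lengths, in bases.
print(read_n50([150_000, 120_000, 40_000, 30_000, 10_000]))
```

In the made-up example the two longest reads hold more than half of the 350 kb total, so the N50 is the length of the second-longest read.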
According to the consortium's paper, the longest full-length mapped read in the data set (aligned with GraphMap) was 882 kb, corresponding to a reference span of 993 kb.
Human VDR et al.
That's as far as I've got with finding out where ubiquitin is, but I just want to leave you with one last thing, and that's the longest read that has been sequenced so far on Oxford Nanopore's MinION. This here is a read from chromosome 12 that's almost 2.3 megabases, produced by Alex Payne and Nadine Holmes at the University of Nottingham. The template would have been about 2/3 of a millimetre in length, and it would have taken about an hour and a half to move through the nanopore at 450 bases per second. As a comparison, as far as I know the fastest sequencing speed for PacBio is about 2.5 bases per second, so this sequence would have taken about 10 days to sequence on a PacBio machine, if that were even possible.
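The timing comparison is simple division, sketched here using only the figures quoted above (2.3 megabases, 450 bases per second, and my rough figure of 2.5 bases per second for PacBio). The function name `transit_seconds` is just for this illustration.

```python
def transit_seconds(bases, bases_per_second):
    """Time for a strand of `bases` to be read at a constant speed."""
    return bases / bases_per_second

READ_LENGTH = 2_300_000   # "almost 2.3 megabases", rounded up
NANOPORE_RATE = 450       # bases per second, as quoted in the talk
PACBIO_RATE = 2.5         # bases per second, the speaker's rough figure

nanopore_hours = transit_seconds(READ_LENGTH, NANOPORE_RATE) / 3600
pacbio_days = transit_seconds(READ_LENGTH, PACBIO_RATE) / 86_400

print(f"nanopore: {nanopore_hours:.1f} hours")
print(f"PacBio:   {pacbio_days:.1f} days")
```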
However, this sequence couldn't actually be sequenced in one go by Oxford Nanopore's standard sequencing algorithm. They put in an upper limit in the software of a million electrical samples per read, and this sequence exceeds that about five times over. Matt Loose's team did some post-processing on consecutive raw signal traces in order to create the complete sequence.
Here I've spread the profile plot into a semicircle so that the long range differences have a bit more leg room. The sequence shown here apparently includes the vitamin D receptor, so for those who are researching vitamin D, there's now a single nanopore read available that has a couple of megabases of surrounding context.
If you want to explore this more, feel free to download this code, run it on your own sequences, and give me some feedback. I'm always interested in other improvements that people might have, or things that this could be used for.
Concluding Remarks
I'm a bit of a research butterfly. This is one of many side projects that I've been tinkering away with, and is still at the discovery phase. I have a few theories about why these patterns are present, but at the moment have other priorities that are keeping me away from proper scientific investigations of that. Feel free to chat to me afterwards if you're interested in a collaboration, or contributing towards further research along these lines.
[see https://twitter.com/DamonLisch/status/1028321321279741952]
However, I hope that this won't dissuade others from being separately interested in weird stuff regarding DNA. The first step of scientific discovery is the observation of something that is unexplained, so let's keep up with trying to notice the things that are staring us in the face.
Thanks for listening, and for being a great audience.
Human HLA et al.
[Only if questions get asked about the abstract...]
Just in case you're interested in a more extreme version of this plot, here's a contig assembled by the nanopore WGS consortium that contains the entirety of the MHC region... plus another 12Mb of surrounding sequence. The MHC region is shown at about 2.5 megabases to 6.5 megabases on this plot (give or take half a megabase or so). For those who have read my abstract, this is the source of the two similar 25bp sequences I mentioned, one appearing 41 times (ACTCCAGCCTGGTGACAGAGTGAGA) and one appearing just once (ACTCCAGCTCACAGTCCTGTCGATG) within this region. This long sequence took quite a while to render, so if you've got less than 10 minutes before your talk and realise you forgot to include the sequence that was mentioned in your abstract, it might be better just to apologise for that.