London Calling 2019, Day 2: cancer research, completing human chromosomes, epigenetic signatures of viruses, more
Thu 23rd May 2019
The second day of talks at London Calling saw some phenomenal science - from completing the human genome to using cell-free DNA to investigate residual disease in cancer.
Get the run-down of all the plenary talks here, and check out the Nanopore Community live blog for live updates throughout Day 3, or follow us @nanoporeconf or #nanoporeconf to see what's going on.
Plenary: Karen Miga - Telomere-to-telomere assembly of a complete human X chromosome
Karen opened her plenary talk by stating that we are "entering into a new era" in genetics and genomics which is demanding complete, high-quality assemblies. The current human reference genome (GRCh38) is the most accurate and complete vertebrate genome to date. However, it is incomplete - there are still 368 unresolved issues and 102 gaps. Karen said that it "really drives it home when we look at chromosome 21", which has ~30 Mb of assembled sequence but ~20 Mb of missing sequence - unexplored regions to study that could be linked to disease. These problem regions are associated with segmental duplications, gene families, satellite arrays, centromeres, and rDNAs, as well as uncharacterised sequence variation in the human population. The major challenge is the generation of complete assemblies across repetitive regions that can span up to hundreds of kilobases, or even megabases at centromeres. Karen asked: can high-coverage, ultra-long read sequencing be used to resolve these regions and complete assemblies of the human genome? She stated that this question was what motivated the establishment of the Telomere-to-Telomere (T2T) consortium, of which she is a member, which is an open, community-based effort to generate the first complete assembly of a human genome. The aim of this consortium is to "shift the standards in genomics" to the highest quality.
Karen and her colleagues have sequenced CHM13hTERT, a karyotypically stable haploid cell line, using long-read nanopore sequencing. From the start of May 2018 to January 2019, 94 MinION/GridION flow cells were used for CHM13 sequencing, obtaining 50X depth of coverage from ultra-long nanopore reads. The maximum mapped read length was 1.04 Mb. These nanopore ultra-long read data were used for contig building, along with long-read datasets from other sequencing platforms for polishing and structural validation. The assembler Canu was used for sequence assembly; the final assembly was 2.94 Gbp with an NG50 contig size of 75 Mbp, which exceeds the continuity of GRCh38 (NG50 contig size of 56 Mbp). Moreover, a subset of chromosome assemblies remained broken only at the centromere.
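For readers unfamiliar with the NG50 statistic quoted above, the following is a minimal sketch of how it can be computed: contigs are sorted longest first, and NG50 is the length of the contig at which the running total reaches half of an assumed genome size. The function name and the toy contig lengths are illustrative assumptions, not values from the talk.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    NG50 is the length of the contig at which the running total of
    contig lengths (sorted longest first) reaches half of the assumed
    genome size. Returns None if the assembly covers less than half
    of the genome.
    """
    half = genome_size / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


# Toy example with hypothetical contig lengths (in Mbp).
toy_contigs = [40, 30, 20, 10, 5]
print(ng50(toy_contigs, genome_size=120))  # 30
```

Unlike N50, which uses half of the assembly length, NG50 uses a fixed genome size, so assemblies of different total length can be compared on the same footing.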
Karen stated that the next step was to use this hybrid de novo assembly to assemble a complete human X chromosome. The X chromosome seemed a "natural place to invest time", as it is associated with many Mendelian diseases. The biggest challenge in assembly of this chromosome was at the centromere, which required ultra-long nanopore reads spanning 100 kbp repeat-rich regions. However, she stated that an assembly is only a hypothesis and the manually-finished assembly needed to be validated using other methods such as droplet digital PCR, restriction enzyme pulsed-field gels, and structural validation techniques.
Karen demonstrated how difficult it is to assemble centromeric regions, especially the centromere of the X chromosome where, for example, only 37 structural variants are present to guide assembly, and the majority of these SVs are very small. She stated that the next challenge is determining how to polish the assembly and bring it to high accuracy. How can we create new strategies to deal with tandem repeats? Karen described how they created a polishing strategy using unique k-mers; this firstly involves identifying all unique, single-copy k-mers throughout the genome. These k-mers are used to create a scaffold for anchoring high-confidence, long-read alignments; only those long reads aligning with unique k-mers are retained. Karen described how spacing of single-copy k-mers can be irregular in repeat-dense regions, such as centromeres. For example, the longest distance observed between two k-mers on the X chromosome was 53 kbp; this means that reads of ≥53 kbp are required to span this section of the chromosome.
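The idea of single-copy k-mers and the spacing between them can be illustrated with a small sketch. This is not the consortium's pipeline: it is a simplified, assumption-laden example that counts k-mers on one strand of a toy sequence, reports where the single-copy k-mers start, and measures the largest gap between consecutive ones. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def unique_kmer_positions(sequence, k):
    """Return the start positions of k-mers that occur exactly once.

    Simplification: only the forward strand is considered; a real
    analysis would also account for reverse complements.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(kmers)
    return [i for i, kmer in enumerate(kmers) if counts[kmer] == 1]


def max_gap(positions):
    """Largest distance between consecutive unique k-mer start positions."""
    if len(positions) < 2:
        return 0
    return max(b - a for a, b in zip(positions, positions[1:]))


# Toy sequence: unique flanks around a short tandem repeat.
toy = "ACGT" + "AT" * 6 + "GGCA"
positions = unique_kmer_positions(toy, k=4)
print(positions)           # [0, 1, 2, 13, 14, 15, 16]
print(max_gap(positions))  # 11
```

In this toy case the tandem repeat contains no single-copy 4-mers, so a read would need to cover the 11-base gap plus anchoring k-mers on both sides to be placed confidently, which is the same reasoning behind the ≥53 kbp read requirement described above.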
Two rounds of nanopolish were used for k-mer-based polishing of nanopore reads, along with long read polishing from other sequencing platforms, and HiFi alignments were then used to evaluate the success of polishing. Karen concluded this section by stating that the finished T2T X chromosome had a structurally validated assembly, from telomere-to-telomere, including a problem 2.8 Mb tandem repeat at the X centromere. The novel k-mer based polishing strategy they used improved the assembly quality of large repeat-rich regions. She stated that this demonstration is "really bringing the point home that we are achieving high quality and high continuity".
In the final section of her talk, Karen asked "how do we start to finish the human genome?" Focusing on chromosomes 7 and 9, at D6Z1 and D8Z2 centromeric sites from satellite array predicted regions, Karen explained how we can see the difference in sequence diversity compared to the X chromosome centromere with its 2.8 Mb tandem repeat. At the centromeres on these autosomal chromosomes there is far greater sequence diversity which makes their assembly significantly easier - there is "a lot more information to guide mapping, polishing and assembly". For example, the maximum spacing between k-mers is only 3 kb. Using the k-mer polishing approach greatly improved the assembly.
Karen concluded by stating that the goal of the next two years is to obtain a complete human genome. Challenges facing us include acrocentric regions, large segmental duplications, and classical human satellites, and we need to start thinking about automating repeat assembly. "We keep setting the bar higher and higher" for the genetics community in terms of assembly quality and completeness. Thinking about 2020 and beyond, we need to start thinking about human populations, as opposed to a single human genome. This will require increasingly high-throughput long-read sequencing on the PromethION, and they are now starting to "ramp up the process". It will also require cloud-based assembly and processing; Karen announced that the SHASTA cloud-based assembler is imminently being released by Santa Cruz; this has achieved assembly of 2.8 Gbp of sequence data in only 5.6 hours.
"So I guess that my take home message is...keep calm because everything is awesome"!
Please note that all the CHM13 data is openly available at github.com/nanopore-wgs-consortium/chm13.
Plenary: Christopher Oakes - Discerning the origin of Epstein-Barr virus in patients using nanopore-derived DNA methylation signatures
Christopher Oakes (The Ohio State University) began his plenary talk by highlighting the potential of tumour viruses to improve patient care. He then introduced Epstein-Barr Virus (EBV), the "prototypical tumour virus". A very common virus, EBV infects ~90-95% of humans: after entering the body and infecting cells, cells proliferate and the virus replicates. The human immune system is typically very effective in launching a response: NK cells limit this proliferation, and T cells learn to recognise the viral antigens and kill infected cells. In response, EBV can get around these defences by moving into a latent state to hide from the immune system. In rare cases, the virus is able to survive and proliferate through a number of complex processes, leading to the growth of tumours. Christopher noted that one of the most significant difficulties in treating cancers is that tumours are "self": how can killing of normal cells be avoided when targeting tumours? This is where EBV's properties as a tumour virus can be exploited: EBV is "not self" and provides a potential target.
Christopher explained that there are three ways in which EBV can be useful in detecting and treating cancer. Firstly, the presence of elevated levels of EBV in the blood is used to screen for cancer, enabling its identification in early, asymptomatic stages. Christopher displayed the results of a study by Lo et al. (New England Journal of Medicine, 2017) into the use of EBV DNA detection from plasma for nasopharyngeal cancer screening. Initial identification of elevated levels of EBV led to the identification of 34 undiagnosed cases of nasopharyngeal cancer, with one case remaining undiagnosed after not proceeding to further screening. Whilst this demonstrated its efficacy, Christopher also noted the lack of specificity of the test, given the initial ~1,000 people displaying elevated EBV levels. Secondly, EBV can be used to diagnose and classify the disease present; Christopher listed cancers that his team investigate - several forms of lymphoma, and gastric adenocarcinoma. For example, for extranodal NK/T cell lymphoma (ENKTL), EBV titre is a diagnostic. Lastly, Christopher described how targeting the virus specifically can provide a "trojan horse way" of entering and killing tumour cells. This could be achieved via antiviral therapy or by training T/NK cells to attack EBV-infected tumour cells. However, it is important here to determine the state of the virus when it is detected in the blood; Christopher explained that in the clinic, a PCR test from plasma DNA is used to detect EBV levels, but elevated levels may represent an active infection of mononucleosis, or may be a result of immunosuppression, or may be due to tumour cells. In order to determine the correct course of treatment, it is important to determine which of these the test result represents. How could this be achieved?
Christopher highlighted here that there is "more information in DNA than just the sequence itself." Methylation of EBV DNA, he explained, is intrinsic to the life cycle of the virus. When the virus first infects a host, its DNA is in an unmethylated state; when it passes into the latent phase, it becomes highly methylated, silencing most genes. When the virus is reactivated and passes into its lytic phase, this methylation is again lost. This differential methylation could hold the key to identifying the stage of infection represented by elevated EBV levels. Christopher displayed the results of an investigation of methylation in ~30 regions of the EBV genome in cell lines, using sodium bisulphite conversion and PCR. This showed that methylation was high in tumour cell lines, but low in in-vitro infected samples. Differential methylation was also observed across different regions of the genome. Latent virus can be activated, inducing lytic cells; analysis indicated that methylation decreased when lytic activation was induced. However, Christopher noted that diagnosis using this method is challenging, and cannot determine whether methylation levels represent a heterogeneous population or separate populations with either high or low levels of methylation.
Christopher then introduced his team's work with nanopore sequencing in detecting methylation in EBV. Firstly, samples with known proportions of methylated and unmethylated EBV DNA were produced. Tumour virus DNA was amplified via WGA, removing all methylation. Some of this methylation-free DNA was reserved; the rest was methylated using SssI (CpG) methylase. The resulting samples were mixed in known proportions of methylated and unmethylated sample, for use in producing a standard curve for calculating the percentage of methylation present. Samples were prepared for sequencing with the Ligation Sequencing Kit (SQK-LSK109), sequenced on the MinION device and the data basecalled using Guppy. Reads were mapped to the EBV genome with minimap2. Reads were also re-basecalled with the high-accuracy flip-flop basecaller. Nanopolish was used to index and call methylated CpG sites: Christopher noted the robust, qualitative distinction between methylated and non-methylated DNA, and good concordance between the expected and observed proportions.
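As a rough illustration of how a standard curve built from known mixtures can be used, the sketch below fits a straight line to hypothetical (known %, measured %) pairs by ordinary least squares and then inverts it to estimate the true methylation of a new sample. The numbers and function names are assumptions made for the example; the talk did not specify the fitting method the team used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(measured, slope, intercept):
    """Invert the standard curve to estimate the true methylation (%)."""
    return (measured - intercept) / slope


# Hypothetical standards: known % methylated DNA in each mixture and the
# % methylation called from sequencing (values invented for illustration).
known = [0, 25, 50, 75, 100]
measured = [5, 27, 49, 71, 93]

slope, intercept = fit_line(known, measured)
estimate = calibrate(60.0, slope, intercept)
print(round(slope, 3), round(intercept, 3), round(estimate, 1))
```

A straight line is the simplest choice; if the calls saturate at either extreme, a different curve shape may fit the standards better.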
Tumour cell EBV DNA was then sequenced on the MinION to assess and compare methylation levels, revealing different levels of methylation from different samples; it was possible to detect different populations in some samples. Detection was also possible from primary cells - for example, in a sample representing 100% tumour cells in the latent phase. Christopher demonstrated two examples in which the proportion of methylation clearly dropped after induction of the lytic phase.
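Because each nanopore read carries its own methylation calls, separate populations can in principle be told apart from a uniformly intermediate one. The sketch below shows the idea under simple assumptions: each read is represented as a list of per-CpG calls (1 for methylated, 0 for unmethylated), a per-read fraction is computed, and reads are binned using arbitrary thresholds. The data structure, thresholds and read names are hypothetical and not taken from the talk.

```python
def read_methylation_fractions(reads):
    """Map each read ID to the fraction of its CpG calls that are methylated.

    `reads` maps read ID -> list of per-CpG calls (1 = methylated, 0 = not).
    Reads without any CpG calls are skipped.
    """
    return {rid: sum(calls) / len(calls) for rid, calls in reads.items() if calls}


def summarise(fractions, low=0.2, high=0.8):
    """Count reads in low, intermediate and high methylation bins.

    The thresholds are arbitrary illustrative values.
    """
    summary = {"low": 0, "intermediate": 0, "high": 0}
    for fraction in fractions.values():
        if fraction <= low:
            summary["low"] += 1
        elif fraction >= high:
            summary["high"] += 1
        else:
            summary["intermediate"] += 1
    return summary


# Toy reads: two highly methylated, two barely methylated.
toy_reads = {
    "read1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "read2": [1, 1, 1, 1, 1],
    "read3": [0, 0, 0, 0, 0],
    "read4": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
}
fractions = read_methylation_fractions(toy_reads)
print(summarise(fractions))  # {'low': 2, 'intermediate': 0, 'high': 2}
```

In this toy data the average methylation across reads is about 50%, yet no individual read is intermediate: a bulk measurement would hide the two populations that the per-read view exposes.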
Christopher then asked: can we detect methylation levels of EBV from cell-free (plasma) DNA, and if so, is cell-free DNA methylation representative of tumour DNA methylation? The answer is: yes - it is detectable, and good correlation was seen between methylation in cell-free DNA and tumour DNA. Christopher showed an example in which much higher methylation was detected in a cancer patient than that seen in a mononucleosis patient. In another example, a sample from a patient prior to treatment showed high methylation; analysis of another sample from during cancer treatment displayed a new, low-methylation population indicative of a change in the life cycle of the virus. Christopher pointed out that mapping data demonstrates that tumours feature very complex methylation patterns. He then showed a comparison of gene expression between an ENKTL sample and a Burkitt's lymphoma (BL) sample, investigating the differential expression of three genes (EBNA1, EBNA2 and BART) between the two: methylation analysis across these genes correlated well with this expression. This suggests that it could be possible to use methylation patterns to determine gene expression, without even having to look at the expression itself. In future, the group plan to further investigate the combination of genetic and genomic information, and also intend to investigate how to enrich for EBV DNA, as levels are often still below the threshold of detection.
Panel plenary: Aquatic ecosystems
Shaili Johri - Unraveling shark secrets: sequencing genomes and microbiomes for conservation
Shaili opened her talk by boldly asking: why save sharks and rays? We are typically fearful of them (cue an image of Jaws!). Yet according to Shaili, sharks can be "adorable" and "they are gentle giants... generally".
Jokes aside, Shaili stated that sharks are crucial for the functioning of healthy oceanic ecosystems; they keep mesopredators in check, which is important for maintaining the oceanic food chain at a healthy equilibrium. They are also major contributors to the economy. Sharks have "been around for a really long time, so they are very evolutionarily resilient"; in fact, "everything about sharks is really long!" For example, they have a long life history, a long development time... and therefore require extraordinary genome stability to protect them from a potentially large accumulation of mutations over a long lifetime.
Despite their importance, up to 50% of shark species are threatened by extinction, due to the international demand for shark fins and other shark parts or derivatives. We are "fishing them out", with ~300 million sharks killed by humans per year. To put this into perspective, there have only been 20 shark attacks on humans in the last decade - they should be far more scared of us than we are of them.
Conservation of endangered populations and an understanding of their evolutionary adaptations are both difficult when genomic information is lacking for over 50% of shark species. Globally, the areas where sharks are most endangered also correlate with the areas of highest data deficiency; i.e. we don't know enough about the sharks in these locations. Shaili suggested that this makes it easier to fish for them and exploit them.
The goal of Shaili and her team has been to reduce the data deficiency of shark populations by performing on-site genomic investigations in shark biodiversity hotspots. We need to figure out the population sizes of the species present, including the presence of interbreeding subpopulations; we need to determine the distribution of species and what is threatening them. A lot of species identification is typically based on Cytochrome oxidase I gene amplicon sequencing, which doesn't differentiate between species, or sometimes the gene simply does not amplify. This information therefore does not help advise us in determining the best strategy for species conservation and management. Shaili and her team wanted to come up with a method that could be used by anyone, anywhere to help inform best practices for shark species conservation and management.
Shaili took her research to India - the second largest global shark supplier. Shark meat is often seen in fish markets in India, typically from a range of species; as the shark fins are often separated from the body, the species can't be identified, and so whether the meat is being sold legally cannot be determined. In her research, samples were obtained from shark fin specimens; genomic DNA was then extracted and sequenced using the Oxford Nanopore MinION platform. Shaili applied a "genome skimming" shallow whole-genome sequencing approach using a single MinION flow cell per sample. High copy number mitochondrial sequences were taken and used for taxonomic identification and phylogenetic analysis, and high copy number nuclear sequences were used to study population size and structure. With in-field genome skimming, Shaili has achieved 99.8% sequence accuracy, coverage of a sixth of the shark genome, sequence lengths of up to 100 kbp, and very high GC percentage coverage. One problem that she faced was that contigs aligning to the mitogenome appeared to be longer than the mitochondrial genome. She discussed that these reads were probably coming from replicating mitochondria whose DNA hadn't been cut yet in the cell, "so that was an easy fix".
With nanopore sequencing, fast and accurate species identification was performed, most confidently within 3 hours, but they "could start determining species ID within 2 minutes". She compared this timeline to Sanger sequencing, which would typically take about 24 hours to achieve a similar result. By integrating sequence data acquired in the field with local databases through the Geneious software, which has a "super easy GUI interface" that does not require any bioinformatics, Shaili's data has also contributed to an increased capacity for wildlife forensics. For example, the silky shark species was identified in the fish market - trade of this species is illegal as it is protected. Confirmation of her in-field results was performed back in the lab using Canu assembly. With the higher computing power available in the lab compared to out in the field, more sequence contigs were also obtained and other genes were covered, such as homeobox genes, and genes involved in immune system function and genome stability. This was particularly exciting, as these genes had not been identified before.
Another interesting observation was that the GC percentage of the shark genomes ranged from 29.5-60%. This linked to how important sequencing versatility is in wildlife forensics, as the genomes of endangered species have a range of GC content; for example, elephant genomes have ~39% GC content, sharks and rays ~42% GC content, and rhinos ~51% GC content. Therefore, the methods that Shaili has used for wildlife forensics could be applied to the conservation of a range of species because they have obtained such good efficiency across a range of GC% in sharks.
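For reference, GC percentage is simply the share of G and C bases among the unambiguous bases of a sequence. A minimal sketch, with a hypothetical function name and toy sequences rather than real shark data:

```python
def gc_percent(sequence):
    """Return the GC content of a DNA sequence as a percentage.

    Ambiguous bases such as N are ignored; an input with no A, C, G or T
    bases returns 0.0.
    """
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Toy sequences, invented for illustration.
print(gc_percent("GGCCAATT"))  # 50.0
print(round(gc_percent("ATATNNGC"), 1))  # 33.3
```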
Next, Shaili briefly described how her team has identified that many fish and chip shops in the UK sell shark meat, although this is generally unknown to the shop owners. The method could therefore also be used to detect shark meat in fish and chip shop food.
Lastly, Shaili wanted to focus on the monitoring of shark populations. She stated that it is important to monitor microbial microfauna, as this is important for shark health and environmental monitoring. Therefore, in addition to her investigations into shark population genomics and conservation, Shaili has also been investigating the microbiomes of free-swimming whale sharks (Rhincodon typus) across the globe, in order to identify threats such as disease, pollution and habitat degradation. In practical terms: "how do we sample sharks?" Most often, free-swimming sharks are sampled underwater; this is performed using a device that seals on to the shark skin and flushes microbes off the skin to isolate them. She declared that the main issue with this process is keeping up with a swimming shark! They have not yet tried swimming next to thresher sharks; those sharks which they cannot swim alongside are brought on board for sampling.
This project has also contributed to the research training of undergraduates in the lab. Students have been directly involved in sampling, DNA extraction, and nanopore sequencing and analysis, to investigate the taxonomy and function of the microbiomes detected. One of the key findings that has been made is that the genomes are rich in heavy metal metabolising genes; they are currently investigating why this might be.
Shaili concluded that her team will continue to work with species identification and the sequencing of shark genomes, as well as wildlife forensics, and lastly, she will continue to study population genetics using genome skimming data. This research has demonstrated how the portable sequencing technology of Oxford Nanopore has improved the genetic understanding of shark populations, which will ultimately facilitate the protection of endangered shark species, and potentially other endangered wildlife species.
Cheryl Ames - Field-forward sequencing with Oxford Nanopore technology: a strategy to establish the upside-down mangrove jellyfish Cassiopea xamachana as a bioindicator
Cheryl Ames (US Naval Research Laboratory) kicked off the aquatic ecosystems panel plenary with a reminder of the importance of the ocean, "the source of every breath on Earth." Oceans cover 72% of the Earth's surface and are home to most of the Earth's biodiversity; however, this is under threat from climate change, activities such as over-fishing, and natural disasters. Cheryl stressed the need to study the ocean's biodiversity "before it's too late." She presented the work of her team using eDNA to categorise this diversity, for environmental monitoring, sting prevention (affecting both combat divers in the Navy and recreational divers), to investigate biodiversity in the Gulf Stream, and to aid public aquariums.
Cheryl described how she and her team decided to focus on jellyfish eDNA, which has multiple sources: spawn, gametes and stinging mucus. She showed the upside-down jellyfish (Cassiopea xamachana) in mangrove forests, releasing spawn and mucus, and the spawning of the venomous box jellyfish (Alatina alata) in coastal waters; both events provide good sources of eDNA and enable the detection of these organisms even if the jellyfish themselves are not seen or are no longer present in the area. The team selected these two jellyfish for further study, as they are emerging model organisms; they are about to publish the genomes for both, together with a third genome, adding to the very few jellyfish assemblies currently available for use as reference genomes. Cheryl also noted that only one other paper currently discusses the use of jellyfish eDNA. Showing the different stages of the life cycle of jellyfish, from microscopic to macroscopic, Cheryl further demonstrated how much more there is to jellyfish than is visible to the naked eye.
Cheryl then highlighted how she and her team work on developing tools to ensure the safety of service members, whilst being "conscious and conscientious of the environment", in the locations they visit around the globe. Citing its versatility and portability, she described how "the MinION was really the answer to what we wanted to do." She showed that the team had taken advantage of the portability of the MinION device to conduct sequencing experiments in multiple locations, from on a cliff-face to in a car ("all rental cars should come with a MinION, in case you want to sequence at the airport"). The MinION was implemented in their investigation of the effect of natural disasters on populations of jellyfish. Cheryl showed Buttonwood Sound, Key Largo in the Florida Keys: the bay emptied in the aftermath of Hurricane Irma, and with it disappeared the Cassiopea jellyfish populations. She noted that even eight months after the emptying of the bay, only a few Cassiopea could be found.
To sequence jellyfish eDNA, sampling was performed at 7 collection sites: 2 protected from the hurricane, 4 exposed to the hurricane and 1 positive control from an aquarium. Water was filtered, and the eDNA extracted and pooled. 16S amplification was performed and the samples sequenced in multiplex on the MinION, run from a battery. Metabarcoding and bioinformatic analysis were performed using Guppy basecalling, Porechop and Cutadapt. The NCBI 16S database, proxy sequences and QIIME2 were then used, though Cheryl noted that the latter was designed for short-read microbiome data; Cheryl displayed the full eDNA metabarcoding analysis workflow, featuring taxonomic classification, diversity analysis and phylogenetic analysis. The results demonstrated rarefied alpha biodiversity, with 50 observed taxonomic units representing ~50 species of jellyfish in 4 classes. She noted that she wasn't surprised to detect species in protected areas ("we saw plenty of them there"), but jellyfish species were also detected in exposed oceanic areas, where they had not been visible. Phylogenetic analysis of the Florida Keys jellyfish again showed species representing all four classes; focusing on the Cassiopea taxa, Cheryl showed that C. xamachana, which had been seen, was abundant, but C. andromeda, which had not been spotted, was also detected. The positive control involved sampling of an aquarium housing a C. frondosa jellyfish, and this species was successfully identified; it was also identified far more often at the Buttonwood Sound site than at the other five. Then focusing on the box jellyfish taxa, Cheryl showed that several species were represented quite well; she pointed out that the venomous box jellyfish was identified in sites used for diving training where it had not been seen - important data for avoiding stings.
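To make "rarefied alpha diversity" more concrete: rarefaction subsamples every sample to the same number of reads before counting taxa, so that deeper-sequenced samples do not look more diverse simply because they have more reads. The sketch below is a plain-Python illustration of that idea, not the QIIME2 implementation; the function names and read counts are hypothetical.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a taxon -> count table."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    rarefied = {}
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied


def observed_taxa(counts):
    """Alpha diversity as the number of taxa with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical read counts for one site (numbers invented for illustration).
site_counts = {
    "Cassiopea xamachana": 60,
    "Cassiopea andromeda": 5,
    "Cassiopea frondosa": 35,
}
rarefied = rarefy(site_counts, depth=50)
print(sum(rarefied.values()), observed_taxa(rarefied))
```

Rare taxa, such as the low-count species in this toy table, are the ones most likely to drop out after subsampling, which is why the chosen depth matters when comparing sites.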
Cheryl concluded that field-forward sequencing of eDNA with the MinION, operated via battery, enabled the successful detection of organisms. She highlighted that the method is "approaching real-time assessment" in even austere environments and could be modified to suit any system.
Emma Langan - Ship-Seq: nanopore sequencing of polar microbes onboard research vessels
Emma Langan began her presentation describing her PhD project, which she began in October 2017, spending a year “learning the ways of the nanopore” and trying to extract DNA from phytoplankton. The major part of the presentation would cover Emma’s trip taking the MinION to see what she could do with polar microbes on board a research cruise vessel.
Emma explained that her core interest is in ocean phytoplankton, which are responsible for 50% of primary production across the globe, as well as being responsible for carbon cycling. In particular, a subset called diatoms have silica shells, meaning they are slightly heavier than other plankton and so sink to the bottom of the ocean when they die, taking carbon with them instead of releasing it back into the ecosystem.
Diatoms exist primarily in polar oceans, and they are very adaptable, preferring to live in niches that are not yet occupied by other species. Emma also described them as “very cosmopolitan”, possessing much more diversity between species than you might expect. Polar phytoplankton are the basis of polar food webs, and many organisms, up to large mammals such as polar bears, rely on their existence to survive. Polar phytoplankton also contribute disproportionately to primary production, producing a huge amount of chlorophyll across all seasons.
Despite their importance in the Earth’s ecosystem, very little is known about phytoplankton, with only two genomes having been sequenced and assembled – one alga and one diatom – and previous studies have not covered polar oceans.
Bearing this in mind, Emma posed her research questions: are populations of polar phytoplankton changing with climate change? Can we use genetics, instead of traditional microscopy, to identify which species are present in different locations? And what do polar phytoplankton have that allows them to exist in such extreme environments where other species can’t?
Emma aimed to work towards answering some of these questions on her research cruise, trying to capture species which can’t be cultured or stored to be brought back to institutions for further analyses.
Taking a look at her methods, Emma detailed the reasons why they chose to use MinION to investigate these questions. Some species of phytoplankton simply don’t survive in culture or in storage, and DNA degrades over time, so on-site analysis was a must. In addition, with other methods it can take months to get data back to work with; Emma described how another group on board her vessel in January still did not have actionable data in late May. By contrast, using MinION allowed Emma to have data to work with within hours of beginning sequencing. This meant data could be used to make informed decisions, for example on whether to return to a sampling site if the sample contained species that needed further investigation.
Finally, Emma explained that short reads do not return good metagenomic assemblies, and if their project was to try and increase the number of phytoplankton genomes in reference then long reads would be essential. Summarising, Emma showed a picture of herself on board the Discovery in February, saying “Take your MinION, a couple of computers, and you’re good to go!”.
Moving on to the results of her study so far, Emma described how the team took samples at 12 stations, and she went on to sequence 3 of those samples whilst still on board the ship. Moving past “statutory penguin photos”, she detailed how a CTD was used to obtain 100 litres of water per station, as to obtain a microgram of DNA requires the filtration of tens of litres of water. In fact, this step was a real bottleneck for Emma and the team, as it took some ten hours to filter all the water they required.
DNA extraction was performed with phenol chloroform, which is incredibly toxic, but works effectively to extract DNA from tough polar phytoplankton. The sequencing library preparation was done with the Ligation Sequencing Kit (SQK-LSK109), before analysis with NanoOK-RT, a real-time version of Richard Leggett’s algorithm that has been used everywhere from this expedition to characterising the microbiome of preterm infants.
The average read length obtained was 1 kb, which, considering the best obtained read length in a lab in optimal conditions was 10 kb, was a good achievement in-field on a moving ship. There was enough sample to do later confirmatory sequencing, and 4.63 Gb of data was generated across the sequencing runs. Emma mentioned this was “nothing special”, but the circuitous route to the Falklands may have impacted reagents and performance!
Highlighting some examples in the results, Emma showed their detection of Emiliania huxleyi in their northernmost sample, a microbe that had been reported as encroaching on the Southern Ocean – potentially confirming that report. Detection of Phaeocystis was also exciting to Emma and the team, as it smells like sulphur, so they knew it was there before seeing its appearance in the sequence data! Emma also found some diatoms; lots of different species in low amounts, so not enough to assemble, but enough to confirm their highly diverse presence in the water.
Concluding her presentation, Emma explained the limitations of her work, most prominently taking dangerous phenol chloroform aboard a ship (“not health and safety ideal!”), and the high proportion of unclassified organisms in the study reflecting the lack of sequenced genomes in databases.
The future direction of the project is to continue confirmatory sequencing, to do further analysis using metadata from the ship, for example temperature or nutrient level information, and to go on additional research cruises to continue sampling. Summarising, Emma stated that phytoplankton are important, and polar phytoplankton even more so. We don’t know much about phytoplankton, and even less about polar phytoplankton, so there is much to be done. MinION should be a good way to find out more, coupled with better extraction methods and better reference databases.
Plenary: Jeroen de Ridder - Cyclomics: ultra-sensitive nanopore sequencing of cell free tumour DNA
Watch Jeroen's talk
Jeroen de Ridder took the stage for the evening session, introducing his efforts to create an ultra-sensitive test for cell-free DNA (cfDNA). This test is in development in order to address two very important challenges in cancer diagnostics. The first of these is treatment response monitoring; allowing data to influence decisions on whether to pursue treatment, switch to an alternative, escalate the scale of treatment or bring the levels down. The second challenge is that of recurrence monitoring – being able to get an early indication of disease recurrence would give the opportunity to restart treatment as soon as a recurrence is detected.
For these purposes, Jeroen described how current diagnostic methods, such as needle biopsies and MRI scans, aren’t fit for purpose, so liquid biopsies present the only reasonable avenue for solving these problems. Jeroen and the Cyclomics team have therefore focussed their efforts on cell-free DNA. When cells die from apoptosis, they shed their DNA into the bloodstream. This is not limited to healthy cells, so cell-free tumour DNA contains mutations that can indicate the presence of that tumour.
At its simplest, Jeroen likened this challenge to an extremely complex game of “Where’s Waldo?”, as cfDNA molecules are extremely small, and only a small fraction of the total molecules present in a sample will actually contain the mutation. In one vial of blood, there could be only between 10 and 1000 molecules carrying the mutation. This really highlights the need for something very sensitive and fast to find these mutations.
So, Jeroen asked, how do we solve this very complex game of “Where’s Waldo?”. Introducing his young daughter holding a MinION, Jeroen explained that the solution also needs to be simple and ideally cost effective if it is to be the future of detection. This idea led the Cyclomics team to begin developing an assay based on the MinION to fulfil that purpose, the method for which is called Cyclomics-seq.
Cyclomics-seq works on an enrichment strategy, capturing short molecules into circular molecules via the use of a backbone and rolling circle amplification, giving multiple copies of the original molecule in long stretches. These molecules can then be prepared for sequencing on the MinION, circumventing any random sequencing error by producing a per-molecule consensus sequence. Jeroen explained how this would create a sensitive, fast and flexible workflow that allows him to tell if a mutation is present in the patient molecule in an accurate fashion.
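The per-molecule consensus idea can be illustrated with a minimal sketch. It assumes the repeated copies of one insert have already been split out of the long read and aligned to equal length, and it uses a simple per-position majority vote. This is an illustrative stand-in, not the consensus method used in the Cyclomics-seq pipeline, and the function name and example sequences are hypothetical.

```python
from collections import Counter


def consensus_from_copies(copies):
    """Majority-vote consensus over equal-length, pre-aligned copies of one insert."""
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("copies must be pre-aligned to the same length")
    consensus = []
    # Walk the copies column by column and keep the most frequent base.
    for column in zip(*copies):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Hypothetical copies of one insert read from a single concatemer.
copies = [
    "ACGTACGT",
    "ACGTACGT",
    "ACCTACGT",  # random error at index 2
    "ACGTACGA",  # random error at index 7
    "ACGTACGT",
]
print(consensus_from_copies(copies))  # ACGTACGT
```

Because each random error appears in only one of the five copies, the vote at every position recovers the original base.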
The bioinformatics part of the workflow consists of mapping with LAST-split and consensus calling with DAGCON, followed by mutation detection by mapping with BWA-MEM and allele frequency detection with Sambamba.
In their initial results, the read length distribution showed a high number of reads over the 5 kb mark, indicating the majority of reads contained ten or more copies of the original insert of 400 bp. Some reads though, Jeroen went on to explain, had mapping gaps and didn’t align as well to the reference as the team would like. This meant that the assay underwent several rounds of development, and Jeroen demonstrated the results of this development with progressive graphs of the correlation between backbone and insert length. Data on the diagonal represented usable data, and the accumulation of points got stronger per iteration. Improvements came from several avenues, including new backbone designs and new enzymes between revisions.
To evaluate the performance of the assay, Jeroen and team plotted the median false positive rate versus the number of repeats in the molecule, finding that the false positive rate levelled off after 10 repeats. The false positive rate, although small, remained consistent at the top end of the repeat number distribution, indicating that there were some errors still remaining that couldn’t be eliminated via consensus pile-up.
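As a rough guide to why a plateau is informative, the toy model below computes how often a majority vote would be wrong if errors were purely random and independent between copies. The 10% per-copy error rate is an assumed figure for illustration, not a number from the talk, and the model pessimistically treats all wrong copies as agreeing on the same wrong base.

```python
from math import comb


def majority_error_probability(per_copy_error, n_copies):
    """Probability that more than half of n independent copies are wrong at one base."""
    threshold = n_copies // 2 + 1
    return sum(
        comb(n_copies, k)
        * per_copy_error**k
        * (1 - per_copy_error) ** (n_copies - k)
        for k in range(threshold, n_copies + 1)
    )


# Assumed per-copy error rate of 0.1, evaluated for odd repeat counts.
for n in (1, 3, 5, 11, 21):
    print(n, majority_error_probability(0.1, n))
```

Under these assumptions the error probability keeps falling as repeats are added, so a rate that stops improving at high repeat numbers is consistent with errors that are not independent between copies.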
Taking their developed assay to proof-of-concept stage, Jeroen and the Cyclomics team took pools which were 100% wild type, 100% mutant, and various dilution stages in between, sequenced them, and compared the base called at the known mutant position against the reference. At 100% wild type, only a “handful” of reads displayed the wrong base, and the same was true of the 100% mutant pool. In the diluted samples the ratio of mutant to wild-type base calls diverged, meaning the mutant molecules were still distinguishable against a significant wild-type background.
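The comparison at the known mutant position amounts to counting base calls. The sketch below shows that calculation on made-up data: the pool compositions, the bases "C" and "T", and the function name are all hypothetical and are not results from the talk.

```python
from collections import Counter


def mutant_fraction(base_calls, mutant_base):
    """Fraction of consensus reads carrying the mutant base at one position."""
    counts = Counter(base_calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no base calls at this position")
    return counts[mutant_base] / total


# Hypothetical base calls at the mutant position (wild type "C", mutant "T").
wildtype_pool = ["C"] * 998 + ["T"] * 2   # background errors only
diluted_pool = ["C"] * 950 + ["T"] * 50   # 5% mutant dilution
print(mutant_fraction(wildtype_pool, "T"))  # 0.002
print(mutant_fraction(diluted_pool, "T"))   # 0.05
```

A dilution is detectable in this picture when its mutant fraction sits clearly above the background fraction seen in the pure wild-type pool.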
The dilution series success gave Jeroen the confidence to move on to testing on clinical samples. Recurrence, he described, is a major problem in head and neck cancer where, even after a seemingly successful treatment, 50% of patients will experience a relapse. Head and neck cancers are also often extremely hard to image, meaning by the time a tumour is detected on an MRI scan it is often too late to tackle it effectively. Again, this points to liquid biopsy as the only reasonable way forward for early detection of recurrence.
In terms of the target for the liquid biopsy, Jeroen explained that 72% of all cancer patients possess a mutation in the tumour suppressor gene TP53, and if you look at head and neck cancer patients only this rises to 90%, making it a very logical choice to focus on. The first port of call was to work on addressing the false positive rate mentioned earlier in the more generalised proof of concept. Initial results on TP53 showed that one third of the reads contained no observable error over the target stretches, but rebasecalling with the high accuracy version of Guppy resulted in the number of regions with elevated error dropping significantly, opening up the possibility of including those regions in the final assay.
Addressing this further, the Cyclomics team has received some early access flow cells of the new pore, R10, which they also used to evaluate the false positive rates in TP53. The results, Jeroen said, looked “really quite promising”, as far more areas contained no observable errors, meaning more loci that could be included in the assay. There were still some problematic areas in the R10 data, but interestingly, when Jeroen mapped the erroneous areas for R10 against the tricky areas in R9.4, the error profiles did not overlap, meaning each pore was highly accurate in different areas. This throws up all sorts of interesting possibilities for use of the two pores – be that selecting the right flow cell for the target of the assay, or combining the two types of data, or another avenue.
Following the development and testing stages, the Cyclomics team set up a clinical trial with 30 early stage and 20 recurrence patients. The question they wanted to answer, Jeroen explained, was whether the Cyclomics protocol allowed the team to make better decisions on treatment regime than the normal monitoring process. The trial began late in 2018, so it is still in progress, and only data for the first two patients could be shared.
The first patient example Jeroen gave presented with a stage II tumour and had a mutation in TP53 at 60% prevalence. They had undergone radiation and chemotherapy, and although the MRI scans showed no visible difference in the first few weeks post-treatment, 9 months later the patient displayed residual disease at lymph nodes. The team obtained blood samples pre-treatment, and at weeks 2 and 5 post-treatment. After including controls of healthy subjects, the preliminary results showed that the mutant levels in the blood dropped post-treatment, only to rise again by week 5, indicating there was some residual disease present very early in the process. This example gave what Jeroen described as a “fine-grain look at treatment response” that correlated with the eventual MRI visual phenotype.
The second patient example consisted of a presentation of a stage IV tumour, meaning data could only be obtained at the pretreatment phase. The aim for this data was to demonstrate that the relevant mutation could be clearly identified, and quickly, so Jeroen plotted the sequencing results as a function of sequencing time. This graph showed that within 5-7 hours, the team could see ample evidence for the presence of problematic mutations in the patient’s blood.
Based on these early examples, the Cyclomics team are optimistic that the process they are developing can allow much more rapid interventions into treatment cycles by acting on mutation data with a quick turnaround.
To conclude his talk, Jeroen highlighted some of the methods for improvement he is pursuing right now. Referring back to the improvement in the false positive rate brought about by basecalling with the new Guppy basecaller, Jeroen explained that this gives him a clear conclusion – all the information must be in the signal, and it is the interpretation of that data that makes a crucial difference.
This led the team to develop an approach using dynamic time warping on the backbone signal and insert signals, and applying deep learning to that data to assess whether a mutation is present directly from the signal, instead of passing via a basecalling step. The accuracy of detection across training epochs increased significantly, illustrating clearly that training and optimising for a particular target can massively increase algorithmic capability. The first results of using the algorithm were very promising, clearly showing the difference between wildtype and mutant reads at very high accuracy, and the hope is that these deep learning approaches will be utilised in the Cyclomics-seq pipeline very soon.
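For readers unfamiliar with dynamic time warping, the sketch below shows the textbook form of the algorithm on short numeric lists. The talk did not detail how the team applies it to raw nanopore signal, so the signals, values and function name here are purely illustrative, and the deep learning step is not shown.

```python
def dtw_distance(signal_a, signal_b):
    """Classic O(len_a * len_b) dynamic time warping distance between two 1-D signals."""
    inf = float("inf")
    n, m = len(signal_a), len(signal_b)
    # cost[i][j] holds the best alignment cost of the first i and j samples.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(signal_a[i - 1] - signal_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # repeat a sample of signal_b
                cost[i][j - 1],      # repeat a sample of signal_a
                cost[i - 1][j - 1],  # advance both signals
            )
    return cost[n][m]


# Made-up signals: one time-stretched copy and one with a changed level.
reference = [1.0, 2.0, 3.0, 2.0, 1.0]
stretched = [1.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
altered = [1.0, 2.0, 5.0, 2.0, 1.0]
print(dtw_distance(reference, stretched))  # 0.0
print(dtw_distance(reference, altered))    # 2.0
```

The stretched signal scores zero because warping absorbs differences in timing, while a genuine change in signal level still produces a non-zero distance.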
Finally, Jeroen described his overall vision – that Cyclomics-seq should reach patients – and explained that Cyclomics was now a newly formed spinout company that aims to market an ultra-flexible blood test for TP53. Subsequently, this could be expanded to additional fundamental gene targets, but the team are also actively pursuing widening their approach to a whole-genome method that could incorporate base modification analysis too. Overall, the ambition is to be the first liquid biopsy company based on a third generation sequencing platform.