London Calling 2019 day 3: population-scale SV with PromethION, forensics, direct RNA, more

The final day of London Calling saw some impressive science in compact formats - read the plenary write ups below, or head over to the Nanopore Community to see the full live blog!

Min Sung Park: New revolution in genome science & genome medicine

Opening the third and final day of London Calling 2019, Min Sung Park, from GrandOmics Institute in China, provided a fascinating overview of his team’s high-throughput nanopore sequencing capabilities and, more specifically, their research into human structural variation (SV) as part of the dbSV-1k project.

Min opened his presentation by stating that we are now in the midst of a ‘new revolution in genome sequencing’, a claim he backed up by citing the emergence of nanopore technology, the decrease in genome sequencing costs, and the increase in computational power. Enabling this revolution is the emergence of high-speed data networks such as 5G, big data management initiatives, machine learning, and many more technological advances. Min also cited the key role played by a number of laboratories and, perhaps most importantly, the nanopore user community in making this revolution a reality.

GrandOmics was founded in 2014 by Wonder (Depeng Wang) with a vision of pulling together these technological advances to become an advanced genomics solution provider and research organisation. It now employs over 250 people who, as stated by Min, are ‘working diligently to change the world’. Min explained how the company has rapidly become one of the most experienced, high-throughput users of nanopore technology, having run over 4,100 GridION Flow Cells and, in less than a year, over 3,300 PromethION Flow Cells. Astoundingly, Min revealed that the company has assembled over 600 animal and plant genomes since 2017, in addition to analysing over 100 transcriptomes.

One of the core capabilities of GrandOmics is the generation of ultra-long sequencing reads. Min showed data revealing that their ultra-long read approach is able to routinely deliver large plant genomes with contig N50s in excess of 2 Mb, and even up to 5.72 Mb. To support this work, they developed the genome assembler NextDeNovo, which allows the generation of highly accurate genome assemblies in a fraction of the time required for alternative approaches. In the example of a 166 Mb genome, NextDeNovo delivered a contig N50 of 4.9 Mb, double that achieved using the Canu assembler. They have used this tool to assemble some truly massive genomes, including one which, at 39 Gb, is over ten times the size of the human genome. Min described how running these large genome projects on the PromethION delivers astronomical amounts of data, requiring significant expertise in data management. To manage their data storage demands, they have developed a cloud-based data storage system which allows flawless transfer of data.
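For readers unfamiliar with the contig N50 metric quoted above, the following is a minimal sketch of how it is commonly calculated. The function name and the contig lengths are hypothetical and purely illustrative; they are not GrandOmics data, and this is not a description of how NextDeNovo or Canu report the statistic.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or read) lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total number of bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical contig lengths in bases (illustrative only).
example_contigs = [5_000_000, 3_000_000, 2_000_000, 1_000_000, 500_000]
example_n50 = n50(example_contigs)
print(example_n50)
```

A higher N50 means that more of the assembly sits in long, contiguous pieces, which is why it is often used to compare assemblers on the same dataset.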

Having described the company’s capabilities, Min moved on to discuss some of the research that they have undertaken. He stated that while they are a service provider, research is also at the heart of their values. A key area of research is deciphering structural variation, and Min described how they have just completed the dbSV-1k project. This study, to characterise 1,000 human samples of various phenotypes, was undertaken last year using their resource of over 20 GridION devices and, more recently, their PromethION devices. Examining the data, they found approximately 20-25k new SVs for each individual studied. Furthermore, 2-3k new SVs were identified between different closely related ethnic groups. Looking at the chromosome level, the team also discovered very high SV density at telomeric and centromeric regions.

In closing his presentation, Min mentioned how they are now working with Oxford Nanopore on the dbSV-100k project, an ambitious project with the aim of sequencing 100,000 human genomes by 2021, in order to further elucidate the role of SV in human disease. To support this effort, just last week, they received delivery of another PromethION device. Combining this dbSV-100k project with the 1,106 customer projects currently in progress at the company, 2019 promises to be a busy and exciting year.

Spotlight session: pitches and winner Rebecca Richards

This year, London Calling features its first Spotlight session, providing a platform for early career scientists. Leila Luheshi (Oxford Nanopore Technologies) described how the format works: first, each Spotlight speaker gives a two-minute pitch, providing a sneak peek of their full talk. After this, delegates get the chance to vote for their favourite pitch: all Spotlight speakers will deliver their full talks at the conference, but the winner of the poll delivers theirs straight after voting closes.

The first two-minute pitch came from Irina Chelysheva (University of Hamburg), who is new to the Nanopore community, having started working with nanopore sequencing only a few months ago. She asked: why use nanopore technology to sequence tRNA? Why use a technology capable of generating ultra-long reads to sequence such short molecules? To explain, she asked delegates to imagine a nanopore sequencing device as a cell phone, in which DNA sequencing is equivalent to making a phone call, RNA sequencing is leaving a message - and tRNA sequencing is using Twitter. Whilst some may initially have been unconvinced by such novel applications - like the integration of the first, low-quality cameras into phones - Irina hopes to convince us that such new uses of technology are worth developing and optimising. Quoting Gordon's opening talk of the day, she added that "two heads are better than one", and 600 are better still - she hopes that, as a new nanopore user, she can discuss this exciting application with the many attendees of London Calling.

Following Irina was Rebecca Richards (University of Auckland), who asked us to imagine that we were "from a small town in the middle of New Zealand, also known as Hobbiton." One day, we find a shiny gold ring, which we take home - but one night, it is stolen. However, the thief cut themselves in the process, leaving behind biological evidence. Rebecca then asked: what if we could test the sample to reveal the hair colour, eye colour, age and much more about the thief, in a matter of hours and without having to send off samples for testing? Rebecca's talk discusses the potential of nanopore sequencing for use in forensic DNA profiling. She concluded: "if you want to find out who stole the golden ring, pick me."

Last but not least was Samuel O'Donnell, who noted that he was the second delegate hailing from New Zealand (a cheer from the audience at this point prompted him to up this number to three). Samuel introduced his project investigating structural diversity in budding yeast, a project which he hopes will help to "shift the paradigm" of the use of single, haploid reference sequences to that of reference assembly panels - a process which, he stressed, is uncertain to him too. His work involves investigating structural variation in Saccharomyces cerevisiae through the use of long read nanopore sequencing and de novo assembly. His talk also introduces MUM&Co, a tool for the easy and highly accurate detection of structural variants.

Then, it was time for voting: delegates had 60 seconds to vote for their favourite pitch, and as the votes flew in, one winner emerged.

And the winner was...

... Rebecca Richards! Rebecca returned to the stage to deliver her full Spotlight talk.

Rebecca Richards - Biological evidence of the future: the use of sequencing in forensic DNA analysis

Rebecca began her Spotlight talk by introducing the method currently used routinely for forensic DNA profiling: short tandem repeat (STR) analysis. Traditionally, this is achieved by amplification of specific alleles, then a comparison of their length to a reference DNA profile via capillary electrophoresis. However, this approach has its limitations. Firstly, Rebecca asked: what do we do if no link is found? Secondly, the low resolution of STR analysis means that the data may link biological evidence to multiple individuals. This can occur in the case of identical twins; it may also result from partial crime scene profiles which produce links to many individuals. In addition, Y-STR profiles are the same for all paternal relatives, resulting in matches for all males in a family. Finally, samples containing a mixture of DNA from several individuals are complicated to deconvolute.

Rebecca described how next-generation sequencing, the decreasing costs involved and the commercial availability of forensics-specific kits could tackle these problems. Sequencing of the current STR markers means that analysis is "no longer limited to just the length-based information" for alleles: more differences between individuals can be found from sequence information. Rebecca gave an example in which traditional analysis of STR markers identified the same allele, 10, at the TPOX locus in two individuals, whereas DNA sequencing of the same locus revealed two different subvariants, 10a and 10b: the extra information from sequencing enables increased discrimination between individuals. In a second example, this improved distinction between variants helped to improve deconvolution of DNA from different individuals within a mixture. Rebecca noted the many forms of information possible from sequencing, such as SNPs, RNA and DNA methylation, potentially bringing a "whole new realm" of forensic analysis. SNPs, she described, provide much more phenotypic information for profiling, including hair colour, bio-geographic ancestry and even age. Furthermore, sequencing could help in the identification of other species, which is useful in applications such as wildlife forensics for investigating illegal wildlife trade.
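The distinction between length-based and sequence-based allele calls can be illustrated with a small sketch. The sequences, the four-base repeat unit and the function name below are hypothetical and chosen only for illustration; they are not the actual TPOX 10a and 10b subvariants from the talk.

```python
MOTIF_LENGTH = 4  # assumed tetranucleotide repeat unit (illustrative)


def length_based_call(allele_seq):
    """Mimic a length-only call: the number of repeat units in the allele."""
    return len(allele_seq) // MOTIF_LENGTH


# Two hypothetical alleles of identical length (10 repeat units each).
# The second carries a single-base change inside its fifth repeat unit.
person_1 = "AATG" * 10
person_2 = "AATG" * 4 + "AGTG" + "AATG" * 5

# A length-only comparison calls both alleles "10" and cannot separate them.
same_by_length = length_based_call(person_1) == length_based_call(person_2)

# Comparing the sequences themselves reveals two distinct subvariants.
same_by_sequence = person_1 == person_2

print(same_by_length, same_by_sequence)
```

Under these assumptions, a length-only method reports a match while the sequence comparison tells the two individuals apart, which is the extra discrimination described above.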

Here, Rebecca pointed out that, unlike on CSI, forensic labs tend to be "really slow" at taking up new technology: only a few forensic labs worldwide are currently implementing next-generation sequencing. Rebecca then introduced "what we're all here for: what about the MinION?" Its portability and real-time sequencing and analysis could allow for on-site analysis of biological evidence, and reduce turnaround time and costs; Rebecca noted several studies into how the MinION could be used in forensic analysis. Rebecca and her team are currently researching and optimising a workflow for the use of the MinION in forensic analysis, with the aim of maximising DNA recovery, accuracy and ease-of-use, with minimal time and cost. Rebecca asked: could next-generation sequencing provide the "biological evidence of the future"? She stressed the importance of the ethical and legal considerations of DNA evidence, and of ensuring good data security.

Rebecca concluded by demonstrating how DNA evidence could aid the investigation of "who stole The Ring?": with the help of sequencing analysis indicating phenotype, she ruled out other Middle Earth suspects until one culprit remained. Mystery solved: it was Samwise Gamgee.

Plenary: Yue Wan - Using direct RNA sequencing to detect RNA structures in transcriptomes

Yue Wan, from the Genome Institute of Singapore, gave a plenary talk about using artificial modification of RNA molecules to help determine their structure. Yue began by noting that she had heard a lot over the course of the conference about isoforms and differential expression, but that there is more information to be gained from an RNA molecule. She said she was interested in how RNAs work: an RNA exists beyond its linear form and can fold into secondary and tertiary structures which affect its function. Giving the example of a riboswitch, Yue explained how changes in the shape of an RNA molecule can be used by cells to, for example, measure and respond to changes in the concentration of a substrate.

Traditionally, the structures of RNA molecules have been determined through the cloning of small fragments and cutting with enzymes at single- and double-stranded regions, before running gel-based assays to gain information about the molecule's secondary and tertiary structure. However, these methods are complex and take a long time; furthermore, they typically resolve fragments of around 100 bases, whereas a typical human transcript is around 2 kb.

Moving on, Yue spoke about the advent of structural determination using high-throughput short-read sequencing, which allows the profiling of thousands of transcripts simultaneously. Since then, labs around the world have used the method they developed to profile RNA structures from a wide array of organisms. However, short-read sequencing has some of the same drawbacks as the gel-based assays, namely that structures cannot be determined across long RNA molecules, preventing isoform-specific structures from being elucidated. The issue with this is that RNA does not exist in the same state throughout its life cycle, and bulk sequencing only provides an average structure of the molecule pool. Furthermore, this process takes a long time and there are many known biases.

Yue then explained how direct RNA sequencing could help with this. Using a number of different chemicals, individual bases can be synthetically modified when they are in a single-stranded state. For example, DMS modifies A and C bases, and CMCT modifies U and G bases, but only when these bases are single-stranded.

Yue said that she and her team wanted to screen a number of these chemical modifiers but needed a positive control of known structure to do so. Introducing the highly studied Tetrahymena ribozyme, a group I intron, Yue said that the molecule was folded and run through the nanopore platform in both a modified and an unmodified state. As an initial examination of the data, they hypothesised that, due to the incorporation of modifications, the overall error rate of the sequence should increase compared with the unmodified control. Across all the compounds tested, this was the case. However, when the locations of these errors were investigated, it transpired that they were not localised to the expected places and the best accuracy obtainable was 0.6. Instead, using the mean and standard deviation of the raw current data, a modification profile could be generated along the length of the transcript. When these values were plotted for locations that should be modified, a clear separation could be seen between the modified RNA and the unmodified control. Modification information was plotted along the length of the molecule and clear spikes could be seen in the modified RNA molecules at the expected locations. Furthermore, as an extra control, the RNA molecule was denatured, modified, and sequenced, showing a raised but uniform signature, suggesting that artificial modification occurred across the length of the molecule. Yue went on to describe how this method could be benchmarked against traditional footprinting methods using a real biological system. The 16S rRNA molecule from B. subtilis was used and good concordance was seen with gel-based assays.
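To give a feel for how a per-position profile might be derived from the mean and standard deviation of raw current, here is a minimal sketch. It is not the method presented in the talk: the scoring rule (the shift in mean current, scaled by the standard deviation of the unmodified control), the function name and the current values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def deviation_profile(unmodified, modified):
    """Score each transcript position by how far the modified signal shifts.

    Both arguments are lists with one entry per position; each entry is a
    list of current values observed at that position across reads.
    The score is |mean(modified) - mean(control)| / stdev(control),
    an assumed, simplified scoring rule.
    """
    profile = []
    for control_values, treated_values in zip(unmodified, modified):
        control_mean = mean(control_values)
        control_sd = stdev(control_values)
        shift = abs(mean(treated_values) - control_mean)
        profile.append(shift / control_sd)
    return profile


# Hypothetical current values (arbitrary units) for three positions.
unmodified = [[80.0, 81.0, 79.0], [95.0, 96.0, 94.0], [70.0, 71.0, 69.0]]
modified = [[80.5, 80.0, 79.5], [101.0, 102.0, 100.0], [70.0, 70.5, 69.5]]

profile = deviation_profile(unmodified, modified)
# Index of the position with the largest shift, i.e. the candidate
# single-stranded (modified) site in this toy example.
peak_position = max(range(len(profile)), key=profile.__getitem__)
print(profile, peak_position)
```

In this toy data only the middle position shifts, so it stands out as a spike in the profile, analogous to the spikes described at the expected single-stranded locations.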

Yue then stated that RNA molecules do not exist in static states; for example, a TPP riboswitch changes shape upon binding to a substrate. Using this method, the change in shape could be seen and, furthermore, the strength of the signal was dependent on the concentration of the substrate, suggesting that the method is quantitative.

Looking at whole-transcriptome information from an embryonic stem cell line, Yue sequenced 15 unmodified and 20 modified samples. The reproducibility between technical replicates was high, suggesting that noise in the system was low. Here, known structural features in the human transcriptome could be observed, for example in the 5’ UTR region of specific transcripts.

In the next section of her talk, Yue spoke about using this method to discriminate between different isoforms. As long-read sequencing allows reads to span the entire transcript, structures can be phased to aid in the understanding of connectivity between transcripts. So, how important is structure in relation to function? Finally, Yue explained how cycloheximide could be used to pause ribosomal translation in order to see different translational statuses between isoforms. Here, greater structural differences were related to greater translational differences.

Plenary: Rachael Tarlinton - Retroviral invasion of the koala genome

Rachael opened her talk by outlining koala ecology and natural history. Koalas are restricted to the eastern seaboard of Australia, primarily because this is where their food is located. They spend most of their time up large trees and they are fussy eaters, only eating eucalyptus leaves - which are actually very toxic, and therefore koalas require specialised gut microflora for their digestion. Young koalas are born very immature and, as marsupials, the last two-thirds of their gestation occurs in the mother's pouch. Rachael explained that there are two key considerations about koala ecology and natural history for this talk - firstly, the fact that they spend most of their lives up tall trees means that they are difficult to work with scientifically. Secondly, Southern koalas were almost hunted to extinction in the late 1800s/early 1900s, and were then restocked from small island populations, meaning that the population became, and still is, very genetically restricted.

In terms of koala conservation, Rachael described how we face a number of problems for its successful implementation. For starters, even the number of koalas is disputed, with estimates ranging widely between 40,000 and 400,000; even so, and despite past intensive hunting, these numbers are "really drastically reduced compared to where they were". As a result, there is a big argument over the IUCN protected status. The biggest threat to the population is habitat loss due to deforestation for house building in urban areas. Rachael further explained how the species distribution is unbalanced, with the Northern populations doing less well in terms of numbers compared to the Southern population. With overabundance in some areas, such as Kangaroo Island, culling and desexing of koalas is often performed to manage overbreeding and population numbers. Other major causes of death include cars, dogs, and disease.

Focusing on koala diseases, there is a high incidence of Chlamydia infection in the population. As urogenital tract infections are one result of this disease, this impacts their ability to breed. Another result is eye infection, and if they cannot see then they cannot eat. However, even though the incidence of infection is high, clinical, symptomatic disease is not always present: some animals are asymptomatic carriers. What is the difference between those koalas which contract clinical disease and those that remain asymptomatic?

There is also very high incidence of leukaemia and lymphoma in koalas - up to 40% in captive koalas, compared to an average incidence of 1% in wild koalas. This is typically a retroviral disease, whereby leukaemia/lymphoma is thought to result from retroviral-mediated immunosuppression. Yet Rachael explained how "retroviruses are strange creatures"; the ones that infect koalas are different to the ones that we are more familiar with, like HIV. Retroviruses integrate into the host cell's genome and if these cells are germ cells, then they become inheritable. About 8% of the human genome consists of integrated ancient retroviruses. Rachael said that it is not great having them "hopping around", causing cancer, so the host genome has a lot of mechanisms to inactivate them. In the long term, retrotransposition usually gets shut down by mutation, and the virus no longer functions as an infectious virus.

Koala retrovirus (KoRV) is an unusual endogenous (inherited) retrovirus which is able to integrate into the koala genome. KoRV is still a relatively new virus - it is a new entrant into the koala genome (originally integrating <49,000 years ago), and it is highly polymorphic. It is very similar to pathogenic exogenous viruses that infect gibbons (GaLV); Rachael described how KoRV was transferred to gibbons by accident in the laboratory from infected rats. Similar retroviruses have been found in other animal species, such as flying foxes, and in other animals which are generally unrelated to each other; it is not known how these viruses are able to transfer between species.

Rachael described the varying prevalence of KoRV in koala populations - there is 100% prevalence in Northern koalas, yet not all koalas in Southern Australia are infected. In fact, there is a gradient of decreasing prevalence from the North to South, which looks much like the spread of an infectious disease. Rachael asked - is this variability due to different strains infecting the populations? Also, how do we vaccinate against it? Delivery of a vaccine is "not a very practical solution for a wildlife problem", as the koalas live high up in trees.

There are pathology differences between the Southern Australian and Queensland populations. Much higher KoRV loads are present in the North compared to the South; disease is associated with decreased load. The disease oxalate nephrosis is only detected in Southern koalas, probably due to genetic restriction as opposed to viral loads. On the other hand, neoplasias and chlamydia are far more common in Northern animals. Rachael described how short-read sequencing of koala populations identified the presence of KoRV in the Northern koalas, and showed that Southern koalas also had the virus. However, in the Southern koalas the sequencing reads were only mapping to the viral LTRs, which are present at either end of the viral genome, meaning that the middle section of the viral genome was missing. Rachael explained that this has important implications for viral biology - having a mutated or truncated form of a retrovirus can protect from the infectious form. For example, if the env genes are present and expressed, viral envelope proteins can bind to cell receptors and prevent the infectious form entering the host cell. She suggested that there could therefore be positive selection for the truncated form of KoRV for disease prevention.

Rachael next described her application of nanopore technology to sequence KoRV in the Southern koala population. This arose because sequencing across the entire viral genome was required, according to publication reviewers, to validate her findings and hypothesis, yet short sequencing reads had mapping problems and PCR was unsuccessful in amplifying a genome flanked by problematic repeats. Therefore, Rachael turned to long-read nanopore sequencing for resolution of the entire KoRV genome. Using the PromethION platform, one Southern koala genome has been resequenced; even though this koala had in fact resided in Longleat, UK, it was originally from the Southern Australian population. Sequencing of this koala's genome on the PromethION was only achieved earlier this week! The run produced >3 million reads with a read length N50 of >38 kb; from this, 94 reads matched the KoRV reference sequence AF151794. Such a low read number was expected, as it is known that the retrovirus is present at low copy number in the genome. Rachael stated how, thanks to nanopore long-read sequencing, they managed to achieve their goal of sequencing across the entire retroviral genome. Both the full KoRV genome and combined 5' and 3' LTRs were identified in the sequencing data. Rachael also described how she would like to apply CRISPR targeting for KoRV sequence enrichment, followed by nanopore sequencing, which would substantially reduce the cost and time of sequencing compared to sequencing the whole koala genome.

Rachael concluded her talk by describing the impact of this research on koala conservation. As vaccination against KoRV is impractical, other solutions need to be used; if the truncated form of the virus does protect animals from disease, then selective breeding can be performed. She said that "this will be: watch this space!"