London Calling 2019 day 1: RNA base modification, haematology, mRNA in the human brain, more

It's been a fantastic, packed-out first day in Old Billingsgate for London Calling! Summaries of the key plenary talks, recounting all the action, can be found below; keep track of everything going on tomorrow by following #nanoporeconf on Twitter or checking out the live blog in the Nanopore Community.

Eva Maria Novoa - Accurate detection of m6A RNA modifications in native RNA sequences using third-generation sequencing
Watch Eva Maria's talk

Opening this year’s London Calling conference, Eva Maria Novoa, from the Center for Genomic Regulation in Spain, provided an update on her team’s work utilising nanopore sequencing to directly detect base modifications in native RNA molecules. Eva Maria started her presentation by highlighting how the process of protein generation – from transcription of DNA to RNA, and subsequent translation of RNA to protein – is much more complex than we were taught at school. We now know that many more factors play a role in this process, requiring the study of the epigenome, epitranscriptome, and post-translational protein modifications. Furthermore, these factors are intrinsically linked, interacting with each other to add a further layer of complexity to functional studies.

Eva Maria showed a chart revealing that there have been significantly fewer studies of the translatome (i.e. all translated RNA molecules) than the transcriptome, which she put down to the lack of suitable analysis technologies, rather than a lack of importance. It was originally thought that RNA modifications were a structural feature of tRNA or rRNA; however, in 2011, a publication revealed that the m6A modification, which was known to exist in mRNA, was reversible. This led to the realisation that the modification may have functional properties and, as a result, pushed the development of techniques to better analyse these modifications.

According to Eva Maria, in excess of 170 different RNA modifications have now been identified and, of these, over 70 have already been linked to human diseases, including cancer and neurological disorders. The first genome-wide method for the analysis of the modified base m6A, m6A-Seq, was published in 2012. Since then, there has been an exponential increase in the number of publications using this method, and m6A has been shown to have a pivotal role in a range of cellular functions such as cell differentiation, stress response, mRNA stability, and sex determination. However, Eva Maria described how this method, which relies on traditional sequencing by synthesis (SBS) analysis, has a number of limitations for the detection of base modifications. For example, it requires selective antibodies or chemicals, which only exist for a handful of modification types and which may also exhibit cross-reactivity. Further, the methodology is complex and only provides an indirect measure of modification state. It was her assertion that better methods were required. Fortuitously, a move to the Garvan Institute of Medical Research brought her together with Martin Smith, who was using nanopore sequencing for direct RNA sequencing. It was apparent that this technique might be able to solve many of the challenges faced by current technology. Eva Maria described how she was ‘super excited’ when Oxford Nanopore released the first Direct RNA Sequencing Kit.

Unlike traditional sequencing technologies, nanopore RNA sequencing requires no amplification or reverse transcription steps, allowing the retention and detection of base modifications alongside the nucleotide sequence. As with standard nucleotides, base modifications give a measurable and characteristic disruption to the electrical current passing through the nanopore, allowing their direct detection and identification. However, Eva Maria revealed, it was not always straightforward to associate every single change in current intensity with the presence of a modification. The team quickly realised that they needed a training set to build a specific basecalling algorithm for modified bases. To this end, they designed sequences that covered all possible 5-mers containing the modified base. They sequenced modified and unmodified versions of these sequences to obtain a set of features which, ideally, machine learning could classify by modification state. This was benchmarked using m6A. When they mapped the resulting reads back to the sequence set, they saw a large number of sequencing errors in the m6A-modified sequences. They realised that these basecalling errors could be used, in addition to current intensity, to improve the identification of modifications. The revised algorithm delivered 90% accuracy for calling m6A. The algorithm was then validated on wild type yeast and an Ime4 knockout strain, which lacks m6A. The data showed that, as anticipated, the basecalling features changed in the wild type due to the presence of m6A, but not in the knockout strain – allowing single-molecule resolution detection of the modification. Inspired by these results, they are now building data sets for other base modifications, including 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), and pseudouridine (pU).
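
To give a feel for the size of such a training set, the short Python sketch below enumerates every RNA 5-mer, and the subset with adenine in the central position, where an m6A could sit. Placing the modified base at the centre is an assumption made purely for illustration; the talk did not specify how the team's sequences were laid out, and the function names are hypothetical.

```python
from itertools import product

BASES = "ACGU"  # RNA alphabet


def all_kmers(k=5):
    """Return every possible k-mer over the RNA alphabet."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def kmers_with_central_base(base="A", k=5):
    """Return k-mers whose middle position is `base`.

    Assumption for illustration: the modified base sits at the centre.
    """
    mid = k // 2
    return [kmer for kmer in all_kmers(k) if kmer[mid] == base]


all_5mers = all_kmers()
central_a = kmers_with_central_base("A")
print(len(all_5mers))  # 1024, i.e. 4**5
print(len(central_a))  # 256, i.e. 4**4
```

Even under this simplifying assumption, a few hundred sequence contexts are needed for a single modification, which helps explain why a designed training set was required.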

In collaboration with Martin Smith at the Garvan Institute of Medical Research, the team are also developing a barcoding methodology to further reduce sequencing costs, for which initial results have yielded an accuracy of 98.9% with an 80% recovery. The methodology and data behind this work will be released shortly. According to Eva Maria ‘The establishment of the Oxford Nanopore platform as a tool to map virtually any given modification will allow us to query the epitranscriptome in ways that, until now, had not been possible’.

Nicola Hall - Revealing mRNA alternative splicing complexity in the human brain
Watch Nicola's talk

In the second plenary talk of London Calling 2019, Nicola Hall (Department of Psychiatry, University of Oxford) began: "what's going on in your brain right now?". The activity that interests Nicola is splicing. Studying splicing in the brain sheds light on how genes, and so the whole brain, are functioning, and helps in the investigation of psychiatric disorders.

Calcium signalling, Nicola explained, is essential for neurotransmission; Nicola showed the role of voltage-gated calcium channels in the tightly-regulated process of the passage of calcium ions through membranes. Voltage-gated calcium channels are important in the cardiovascular system, as well as in the brain. Loci within the gene CACNA1C, which encodes this protein, have been robustly linked to psychiatric disorders including bipolar disorder and schizophrenia. There are existing drugs which target the protein: Nicola quoted a study which found that treatment of patients with blockers reduced hospitalisation cases and self-harm. CACNA1C produces multiple protein isoforms; differences in splicing impact the proteins produced, which in turn affect drug binding. To investigate the variation in the isoforms produced, Nicola stressed the importance of determining how different parts of the CACNA1C gene work together. To fully characterise splice isoforms of CACNA1C, the team decided to use long-range, targeted cDNA sequencing of full-length transcripts.

RNA was extracted from human brain samples, reverse transcribed and the long-read cDNA sequenced on the MinION device; six regions of the brain were sampled from each of the three individuals tested. Data analysis was carried out via two pipelines. First, exon-level analysis was performed, in which reads were mapped to known and novel exons and their abundance compared. This analysis revealed that splicing in CACNA1C is "much more diverse than was previously known": 38 novel exons, 7 known, previously-annotated isoforms and 83 high-confidence novel isoforms were detected. 9 out of 10 of the most abundant isoforms identified were novel, and 8 were predicted to encode functional channels.
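
The idea behind exon-level analysis can be sketched in a few lines of Python: each full-length read is reduced to the chain of exons it contains, identical chains are collapsed into one isoform, and isoforms are ranked by read count and flagged as annotated or novel. This is a simplified illustration only, not the team's pipeline; the function name, exon labels and read data are all made up.

```python
from collections import Counter


def rank_isoforms(read_exon_chains, annotated):
    """Collapse reads into isoforms by exon chain and rank them by read count.

    read_exon_chains: iterable of tuples of exon names, one tuple per read.
    annotated: set of exon chains already present in the annotation.
    Returns a list of (exon_chain, read_count, is_novel), most abundant first.
    """
    counts = Counter(tuple(chain) for chain in read_exon_chains)
    return [(chain, n, chain not in annotated) for chain, n in counts.most_common()]


# Toy, made-up data for illustration only.
reads = [
    ("e1", "e2", "e3"),
    ("e1", "e3"),
    ("e1", "e2", "e3"),
    ("e1", "e2", "e2b", "e3"),
    ("e1", "e3"),
    ("e1", "e3"),
]
annotated = {("e1", "e2", "e3")}

for chain, n, is_novel in rank_isoforms(reads, annotated):
    label = "novel" if is_novel else "annotated"
    print("-".join(chain), n, label)
```

Because each long read spans the whole transcript, the exon chain can be read off directly rather than inferred from short fragments, which is what makes this kind of counting possible.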

Nicola showed a principal component analysis plot in which the isoforms detected in each region of the brain sampled were plotted. The results revealed that there is more diversity between splicing in different regions of the brain than there is between individuals, which is a promising result for future potential treatments, and suggests that splicing is regulated according to cell type and function.

Next, splice site-level analysis was performed, identifying canonical splice junctions and mapping reads to these. This method is able to detect small-scale variations, though quantitation is less reliable. This identified 195 high-confidence isoforms, of which only three had been previously annotated. The results again indicated that splicing diversity was much higher than expected in CACNA1C, and that more diversity is seen between different regions of the brain than between individuals. It also showed many "common themes" in splicing events, which frequently occurred in the same regions, though rarely in domains critical to structure, suggesting that the process is not completely random. The analysis was able to detect variations as small as a single amino acid codon, and some of the identified variations were quite abundant - one was seen in almost half of the samples tested.

Nicola then asked, what are the other sources of transcript variation? One source could be alternative start sites. 5'RACE (rapid amplification of cDNA ends) analysis identified two transcription start sites in the human brain: one known, one currently not annotated and predicted to encode a functional channel with a truncated N-terminus.

Nicola described how therapeutics would need to target brain-specific isoforms of CACNA1C to avoid off-target effects. To investigate this, the team are identifying brain-enriched isoforms that are absent in heart samples. As they do not currently have matched human heart and brain samples, their initial investigation uses samples from mice. Again, "huge diversity" was seen in splice isoforms, many of which were novel. Nicola showed data that demonstrated the clear distinction between isoforms present in the brain and cardiovascular system in mice, indicating that "splicing seems to be strongly tissue-specific." An alternative start site was also detected in the heart tissue. Isoforms are not conserved between humans and mice, so many splicing patterns were found in mice but not humans and vice versa; however, the team expect that the principle of tissue-specific splicing will be conserved.

Nicola concluded that the human CACNA1C gene encodes a wide variety of novel putative voltage-gated calcium channels. In mice, the CACNA1C isoform profile differs between the brain and cardiovascular tissue, indicating the potential to identify brain-enriched isoforms whose drug-binding capacity differs from that of cardiovascular isoforms, and which could represent future therapeutic targets. The team now plan to characterise the isoforms of CACNA1C in the brain and cardiovascular tissue of humans, and to carry out functional assays to see how the splicing observed affects protein function.

Anna Schuh: Application of nanopore sequencing in clinical haematology
Watch Anna's talk

"Blood cancers are the fifth most common cancer worldwide"; they are the most common cancer in young people and if left untreated they are fatal. Precision medicine has already been successfully applied in the field of haematology, meaning that we are becoming increasingly able to achieve a cure or long-term disease control. Six of the ten most lucrative drugs are for haematological malignancies, and it is thought that the size of the drug market for blood cancers will reach 55.6 billion USD by 2025.

Risk stratification of blood cancers requires the detection of multiple genomic abnormalities: single nucleotide variants (SNVs), chromosomal gains and losses, and translocations. It isn't a case of identifying a single genetic change, but of detecting a complex combination of changes: "a much more difficult question". With 605 individual tests in the 2018 NHS England Cancer test directory, of which 177 are for haematological cancers, molecular classification of cancers is becoming increasingly relevant for diagnosis and prognosis.

In chronic myeloid leukaemia, characterised by the BCR-ABL gene fusion (the Philadelphia chromosome), precision medicine has helped to increase the long-term survival rate with the development of tyrosine kinase inhibitors against this translocation.

In acute myeloid leukaemia, improvements in patient survival over the last 50 years have been due to better patient stratification for treatment, and supportive care. Our increasing understanding of gene fusions, acquired SNVs, and complex chromosomal abnormalities, involved in the disease course, has advanced our understanding of AML prognosis and helped to direct future therapeutic avenues - "all of these are important in developing therapy". Three drugs have been approved in the last couple of years by NICE; our increased understanding has therefore "revolutionised the therapy for AML".

Anna introduced how, in chronic lymphocytic leukaemia (CLL), current accredited technologies, such as microscopy, are inadequate for precision medicine as they cannot identify all the genetic abnormalities of interest - they have limited sensitivity, slow turnaround time, are labour intensive, and often require additional supportive tests. The advance of whole-genome sequencing (WGS) with Oxford Nanopore technology has enabled the detection of SNVs and large scale abnormalities in the same run. Using the MinION platform, amplicon and shallow WGS libraries can be combined on the same flow cell; IgHV status, TP53 mutations, del(17p) detection, and additional structural variants, can be detected simultaneously. Anna stated that "we love it" because nanopore sequencing has a rapid turnaround time, and requires no centralisation of services. Anna suggested that this approach should be applied to other leukaemias, including in low-to-middle income countries.

Anna next described the case of sickle cell disease - the most common monogenic disease worldwide. In the UK, risk of sickle cell disease is the most common reason for prenatal diagnosis. The disease can be identified using protein-based techniques, but these fail in ~10% of cases in children >6 months old, and second confirmatory tests are required. The solution proposed by Anna is the detection of mutations and/or deletions in the HBA1, HBA2 and HBB haemoglobin genes using the MinION platform - this would be fast, with a simple library preparation protocol and a cloud-based analysis pipeline. CRISPR/Cas9 has been used for targeted mutation detection in the HBA1, HBA2 and HBB loci, with sequencing on the MinION and Flongle. Such targeted sequencing means that read depth is greatly enriched over target regions.

Anna suggested that sequencing of cell-free plasma DNA that circulates in the blood also has great potential in clinical research, and can be used for precise quantification of fetal allele fractions; this would reduce the need for invasive prenatal testing. It is already being used in the clinic for diagnosis of trisomy 21, and the primary aim would be to use cell-free DNA for cancer diagnosis. Anna described the case of Epstein Barr virus (EBV)-driven lymphomas in children; "95% of childhood lymphomas occur in Africa". These lymphomas result from early infection with EBV, in combination with exposure to malaria. Treatment for the cancer is free, yet >90% of children currently die from it in low-to-middle income countries. In comparison, the cure rate in high-income countries is 90%. The main reason for treatment failure is missing or incorrect diagnosis: diagnosis requires technical expertise, yet the training of professionals required for this is lacking. Anna suggested that nanopore sequencing technology could be applied to detect lymphoma-specific mutations in peripheral blood using liquid biopsies, and that nanopore sequencing platforms have great potential to make a significant impact in low-to-middle income countries.


Dan Turner: The real Simon Pure
Watch Dan's talk

Walking on to REM’s “Imitation of Life” with a talk titled “The real Simon Pure” foreshadowed the focus of Dan’s presentation: the sequencing of native DNA on the Oxford Nanopore platform. However, before getting into the meat of his presentation, Dan opened by introducing the Applications team at Oxford Nanopore. Outlining the roles of the different groups across the globe, he said that the Oxford contingent specialised in sample prep technology and kits, whilst the American group were more focused on showcase projects demonstrating what Oxford Nanopore sequencing can do.

Moving on, Dan explained that, much like the play “A Bold Stroke for a Wife” from which his talk title originated, his presentation would focus on fakery, or more importantly, finding the real answers. Elaborating upon this, Dan gave a nod to the recent bioRxiv paper by Ebbert et al. titled “Systematic analysis of dark and camouflaged genes: disease-relevant genes hiding in plain sight” to highlight the fact that many areas of the genome are inaccessible to sequencing by amplification-based approaches. To demonstrate this point, Dan brought up a coverage plot of chromosome 21 as generated by Oxford Nanopore native DNA sequencing. Here, the coverage was relatively even across the whole 1 Mb region displayed. However, when a PCR version was overlaid, huge coverage drops were obvious, with some regions being completely missed. To back up his point further, Dan showed a histogram of read coverage by GC content across a whole human genome. Where PCR had been used, the GC content formed a normal distribution around 40%. However, regions which could only be sequenced using the native DNA, and not through PCR, had a bimodal GC content around the extremes. To give further examples, Dan brought up a wealth of scientific literature highlighting problems with sequencing genomes with known GC biases using amplification-based approaches. Specifically, Dan talked about one example where PCR had significantly shifted the measured GC content of a sequenced organism. Explaining why this happens, Dan said that PCR prefers GC-neutral areas of DNA and preferentially amplifies shorter DNA fragments. Whilst this is fine for targeted approaches, when examining a whole genome via amplification, these factors result in longer or GC-rich PCR templates amplifying less efficiently - or not at all. In order to get as unbiased a representation of a whole genome as possible, sequencing the actual native DNA is required.
As an additional point, any amplification-based approach will lose epigenetic markers as these are not preserved through PCR on to the copied strands. Dan reiterated that the Oxford Nanopore platforms are the only sequencing solution that allows the sequencing of DNA and RNA itself without the need to ever synthesise DNA copies. Therefore, this removes the potential biases introduced through loss of information via PCR drop-out, or removal of methylation signatures. Dan then went on to show examples of why this is such an important option to have in a researcher’s arsenal of molecular tools when attempting to answer an array of different biological questions.
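
The GC-content histograms described above rest on a simple per-sequence calculation, sketched here in Python. This is a minimal illustration with hypothetical function names and toy sequences, not the analysis shown in the talk; in practice the input would be reads or fixed-size genomic windows.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_histogram(seqs, n_bins=10):
    """Count sequences per GC bin; bin i covers [i/n_bins, (i+1)/n_bins)."""
    bins = [0] * n_bins
    for seq in seqs:
        seq = seq.upper()
        if not seq:
            continue  # skip empty sequences
        gc = seq.count("G") + seq.count("C")
        # Integer arithmetic avoids floating-point edge effects;
        # a GC fraction of exactly 1.0 is placed in the last bin.
        idx = min(gc * n_bins // len(seq), n_bins - 1)
        bins[idx] += 1
    return bins


# Toy sequences for illustration only.
print(gc_fraction("ATGC"))
print(gc_histogram(["AAAA", "GGCC", "ATGC"]))
```

Comparing such a histogram for PCR-amplified and native libraries is one way to visualise the drop-out at GC extremes that Dan described.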

DNA modifications, specifically methylation, became a recurring theme throughout Dan’s talk. With a brief introduction stating that “DNA methylation of cytosine residues in eukaryotes alters gene expression patterns” he moved on to show how this may be relevant in the debilitating inherited disease Friedreich’s Ataxia. Friedreich’s Ataxia is one of the most commonly inherited recessive neurodegenerative diseases, affecting 1 in 50,000 people, and resulting in loss of motor skills and eventually death. With approximately 1 in 112 people carrying a single copy of the disease allele, the inheritance patterns follow typical Mendelian genetics, with the offspring of two carriers having a 25% chance of inheriting the terminal disease. Two proposed mechanisms exist that result in loss of function of the gene encoding the frataxin protein, both of which prevent RNA polymerase from processing through the gene, resulting in a loss of transcription. The first involves a GAA repeat expansion between exons 1 and 2 that causes triplex DNA to form; healthy individuals typically have fewer than 20 copies, whereas affected individuals show in excess of 1000. The second involves hypermethylation of the 2 kb running up to the repeat expansion, which further inhibits RNA polymerase procession. For this study, Cas9 was used to excise the frataxin gene, and Oxford Nanopore long-read sequencing of the native DNA was successfully used to find this low-complexity repeat expansion in parental carriers and their affected child. In the parents, the repeat expansion was observed on one allele, identifying them as carriers; the same repeat expansion was detected in the child, but in its homozygous form. When the number of repeats was calculated, simply by taking the number of bases in the low-complexity region and dividing by three, the expansion count mirrored that obtained by Southern blot analysis.
In terms of methylation, the 2 kb upstream of the repeat expansion was analysed for CpG methylation using Nanopolish; hypermethylation was observed, where reads containing longer GAA repeats showed more methylation than the wild type. Dan stressed that the sequence of interest contained long GAA repeats which would be difficult for polymerases to process through. Furthermore, the need to maintain the methylation patterns meant that sequencing the native DNA with Cas9 mediated targeting was the only way to get the results.
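
The repeat-count arithmetic mentioned above (bases in the low-complexity region divided by three) can be sketched as follows. Locating the tract as the longest uninterrupted run of GAA is a simplifying assumption for illustration; real reads contain errors, so the region would more likely be delimited by its flanking sequence. Function names and the toy read are hypothetical.

```python
import re


def longest_repeat_tract(seq, unit="GAA"):
    """Return (start, length_bp) of the longest uninterrupted run of `unit` in `seq`."""
    best = (0, 0)
    for m in re.finditer(f"(?:{re.escape(unit)})+", seq.upper()):
        length = m.end() - m.start()
        if length > best[1]:
            best = (m.start(), length)
    return best


def repeat_count(tract_length_bp, unit_length=3):
    """Estimate repeat number as tract length divided by the repeat unit length."""
    return tract_length_bp // unit_length


# Toy read: 4 bp flank, 25 GAA repeats, 4 bp flank.
read = "ACGT" + "GAA" * 25 + "TTGC"
start, length = longest_repeat_tract(read)
print(start, length, repeat_count(length))  # 4 75 25
```

By the same arithmetic, an expansion of more than 1000 repeats corresponds to a tract of over 3 kb, which is why reads spanning the whole region are needed to size it directly.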

Dan followed this with a methylation story focusing on bacteria. Bacterial DNA methylation is often used as a defence mechanism against invading bacteriophages by protecting the host DNA from restriction endonucleases while allowing the invading phage nucleic acids to be destroyed. However, in this section of his talk, Dan suggested that methylation patterns could be used as a way to cluster genomes of closely related organisms after the DNA has been co-extracted from a metagenomic pool. Here, two strains of E. coli were pooled, one of which had two methyltransferase genes knocked out. This strain was both DAM and DCM deficient, meaning that only the other strain had the ability to methylate DNA in a gAtc and a cCwgg context, respectively. DNA was extracted from this mixed population and Oxford Nanopore’s Tombo methylation caller was used to find methylated bases in the native DNA. Using the median DCM and DAM values for both genomes, each could be distinctly separated across the two dimensions. To confirm this, sequences from each cluster were assembled, and a MUMmer plot of one against the other showed a 2 kb insertion leading to the inactivation of the DAM gene. As an added bonus, plasmids associated with each organism also showed the expected methylation patterns and could therefore be linked to their organism of origin (poster:
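
The two-dimensional separation described above can be illustrated with a small Python sketch: locate Dam (GATC) and Dcm (CCWGG, where W is A or T) sites, summarise each contig by its median methylated fraction at each motif, and label it with a simple cutoff. This is not how Tombo is invoked and not the analysis from the talk; the per-site fractions, the 0.5 threshold and the function names are all hypothetical.

```python
import re
from statistics import median

# Dam methylates the A in GATC; Dcm methylates the second C in CCWGG (W = A or T).
DAM_MOTIF = re.compile(r"GATC")
DCM_MOTIF = re.compile(r"CC[AT]GG")


def motif_positions(seq, pattern):
    """0-based start positions of every motif match in `seq`."""
    return [m.start() for m in pattern.finditer(seq.upper())]


def methylation_profile(dam_fractions, dcm_fractions):
    """Summarise a contig as (median Dam-site, median Dcm-site) methylated fraction."""
    return (median(dam_fractions), median(dcm_fractions))


def assign_cluster(profile, threshold=0.5):
    """Label a contig by whether both medians exceed a hypothetical cutoff."""
    dam, dcm = profile
    return "methylating" if dam > threshold and dcm > threshold else "deficient"


# Made-up per-site methylated fractions for two contigs.
wild_type = methylation_profile([0.9, 0.8, 0.95], [0.7, 0.9, 0.8])
knockout = methylation_profile([0.05, 0.1, 0.0], [0.02, 0.0, 0.1])
print(wild_type, assign_cluster(wild_type))
print(knockout, assign_cluster(knockout))
```

The same two-number summary could in principle be computed for a plasmid, which is the basis for linking plasmids to their host as described above.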

Hinting at spoilers for a future talk, Dan spoke about another way to link plasmids to their host by using the newly developed chromatin capture method called MetaPore-C. Here, DNA is cross-linked with proteins within the cells themselves and the free DNA ends are cut and ligated together. As a result, DNA close together in 3D space becomes ligated together and sequenced in a single concatemer molecule. Here, plasmids and host genome would be physically linked via ligation and sequenced together allowing taxonomic information from the host genome to be assigned to each plasmid. Dan only briefly explained an ongoing project using this method to track antibiotic resistance plasmids through a bacterial population as a teaser before segueing onto how long-read Pore-C has other uses, for example aiding in the de novo assembly of whole genomes.

Using the same idea of cross-linking DNA in the host cells prior to cutting and ligating ends together, this application of Pore-C exploits the distance between reads in 3D space to aid assemblies, find copy number variations, and identify genomic rearrangements. The output of Pore-C is often displayed as a “contact map” where genome position is denoted on the X and Y axes, and pixel intensity represents read depth for a given section of genome. When this was performed on the well-studied human reference genome NA12878, the contact map showed most reads mapping to the locations expected, i.e. along the diagonal. Dan then showed a contact map for a breast cancer cell line. In this case, vertical and horizontal lines could be seen away from the diagonal, and he said these were indicative of copy number changes. Furthermore, darker points located off the centre line suggested rearrangement events. For NA12878, Pore-C was used in conjunction with 44 X coverage of standard Oxford Nanopore reads with a read N50 of 40 kb and 11 X coverage of ultra-long reads in excess of 100 kb. As an example of how Pore-C can be used to correct an assembly, the tool SALSA2 was used to find the optimal assembly based upon the contact map data. Displaying the results for chromosome 8, contigs were merged, inverted or split based upon the Pore-C data, resulting in a 129 Mb scaffold spanning 90% of the whole chromosome. The overall contig N50 of the human genome after correction with SALSA2 was 36.2 Mb, resulting in potentially one of the most contiguous diploid human genomes to date. Summarising this section, Dan said that Pore-C can be used to verify and improve de novo assemblies and observe copy number changes and rearrangements in the contact map. Furthermore, this information can be used to scaffold contigs (poster:
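
For readers unfamiliar with the N50 figures quoted above, the statistic is the length of the shortest contig (or read) in the set of longest contigs that together cover at least half of the total bases. A minimal Python sketch, using made-up contig lengths and a hypothetical function name:

```python
def n50(lengths):
    """Largest length L such that items of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Toy contig lengths (arbitrary units); total is 300, half is 150.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # 70, since 80 + 70 = 150 reaches half of the total
```

Merging contigs into longer scaffolds, as SALSA2 does using the contact data, raises this value because fewer, longer sequences are needed to reach the halfway point.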

Wrapping up his talk, Dan noted that, although he had mainly focused on the benefits of native DNA sequencing, sometimes PCR is the best tool for the job: for example, when DNA is limited, or when there is a lot of background and a specific region of interest is being targeted. However, it is only native nucleic acid sequencing on the Oxford Nanopore platform that can give you the lowest bias, the longest reads, and epigenetic modifications all in the same experiment.

Join us tomorrow for more updates!