Martin Smith: High-throughput targeted nanopore sequencing of single cells

Martin began by explaining that he and his team have been doing nanopore sequencing for a while, with applications spanning cancer genomics and epigenomics, transcriptomics and RNA modification detection, as well as a small amount of work in the production sequencing space.

For this talk, though, Martin focussed on single cell sequencing, which can be likened to population genomics for tissues rather than across a population of organisms. Lots of platforms, Martin explained, can do single cell analysis, but not many (if any) can combine UMIs with simultaneous generation of full-length transcripts. The single cell platforms Martin’s team use are droplet-based, as these have high throughput in comparison to plate-based or flow cytometry techniques. In particular, Martin and team use the 10X Chromium device to isolate individual cells into reaction droplets. The basic workflow is to homogenise the tissue sample, capture cells into droplets and perform reverse transcription on the beads, before amplification with PCR to generate a cDNA library.

Traditionally, the next step would be to chop the cDNA up into fragments ready for short read sequencing in order to generate 3’ expression profiles, which can then be used for clustering.

The focus of this kind of analysis, Martin explained, was to examine how the immune system of a patient keeps cancer in check, or what it is that causes cancers to evade the immune system. In order to understand oncoimmunology, Martin’s team have described the “holy trimmunity”, namely sequencing of tissue from the tumour, lymph node, and blood of a single cancer patient.

Each lymphocyte has a unique antigen receptor sequence, and these undergo massive somatic recombination, making it extremely unlikely that two cells have the exact same combination of exons. The primary limitation of short reads, Martin outlined, is that they only return one end of these molecules, capturing either the 5’ or the 3’ end (beginning or end) of the transcript.

The solution to this limitation is perhaps nanopore sequencing, although the per-read error rate must also be taken into account when pulling out short barcodes of approximately 16 nucleotides in length for accurate demultiplexing. Martin and the team at the Garvan Institute have developed a process called RAGE-seq for this purpose, in which a normal cDNA library is prepared with the droplet method and submitted for gene expression profiling. In parallel, though, capture probes have been designed to pull out sequences for T and B cell receptors, with which the short reads are combined to identify the barcodes accurately. The resulting contigs are assembled de novo, to find out which VDJ sequence is present.

This process was benchmarked with three well-characterised cell lines. Clustering plots show that short reads generate the correct proportions of sequences against the proportions of the cell lines put in, so they can be used effectively to pull out the corresponding sequences in the long read data.

Thus far the process is slightly inefficient, giving 18% recovery of the barcodes, but this can be roughly doubled to 40% with fuzzy matching that allows for a couple of mismatches. Martin explained that, as discussed in a previous talk, raw signal data will be used to demultiplex, but the audience should stay tuned for more progress on that.
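To illustrate what fuzzy matching with a couple of mismatches can look like, here is a minimal Python sketch. It is not the RAGE-seq implementation: the function names, the whitelist of known barcodes (taken here to come from the short read data) and the tie-handling rule are all assumptions made for illustration. It also only counts substitutions (Hamming distance), whereas real nanopore reads contain insertions and deletions too, so an edit-distance comparison would likely be needed in practice.

```python
from typing import Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(
    read_barcode: str,
    whitelist: Iterable[str],
    max_mismatches: int = 2,  # "a couple of mismatches"; the exact cutoff is an assumption
) -> Optional[str]:
    """Return the single closest known barcode within max_mismatches.

    Returns None when nothing is close enough, or when two known
    barcodes are equally close (an ambiguous assignment).
    """
    best, best_dist, tie = None, max_mismatches + 1, False
    for known in whitelist:
        d = hamming(read_barcode, known)
        if d < best_dist:
            best, best_dist, tie = known, d, False
        elif d == best_dist and d <= max_mismatches:
            tie = True
    return None if tie else best
```

With `max_mismatches=0` this reduces to exact matching, so the same function can be used to compare recovery with and without fuzzy matching on a given set of reads.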

The results of the benchmarking on well-characterised cell lines didn’t render a lot of full length RNAs, so some de novo assembly and polishing was required to generate contigs of the expected size of the full mRNA sequence.

These initial results show that 100 reads per molecule is enough to accurately call the clonotype in 50% of cases, and high accuracies of 98% can be generated from just 50 reads. Interestingly, the sequences for IGH and IGL displayed lower accuracy than those of the T cells, but this could be either a sequencing artefact or just biology.

Examining the B cell receptor sequences in more detail revealed that the data looked fairly noisy, with lots of variation from the reference. Initially, this was thought to be error; however, B cells undergo affinity maturation and somatic hypermutation to fine-tune antibody sequences, so this may not be the case. By way of comparison, Martin performed the same analysis on the sequences of the T cells, which showed almost no mutations in the T cell receptor consensus, so it is likely that the B cells were indeed undergoing somatic hypermutation.

Segueing from here onto patient data, Martin used an example of data from a tumour-draining lymph node, containing almost all classes of lymphocyte, in which subpopulations of T cells can be identified. Even without looking at the 3’ profiling information, a T cell can be identified as naïve simply based on the number of mutations present in the sequence, and full-length contigs from nanopore sequencing allowed the isotype of each antibody to be found.

Martin went on to explain that the applications of the RAGE-seq technique in immunotherapy were both impressive and important for antibodies and antibody therapies, as well as chimeric T cell engineering. Zooming in on T-cell clustering plots allows identification of different subtypes, and this information can be used to track clonal expansion, where all cells that have the same chains begin to spread. Individual lymphocyte clones can be effectively tracked, and Martin presented overlaid data from different tissue types, demonstrating how expression patterns in clonal populations, particularly those with tumour-killing properties, can be examined effectively.
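As a rough illustration of tracking clones across tissues, the sketch below groups cells that share the same pair of receptor chains and counts them per tissue. The input layout (one `(tissue, chain_a, chain_b)` tuple per cell), the function names and the `min_cells` cutoff are hypothetical choices for this example, not details given in the talk.

```python
from collections import Counter, defaultdict


def clone_sizes(cells):
    """Group cells by their pair of receptor chains and count them per tissue.

    cells: iterable of (tissue, chain_a, chain_b) tuples, one per cell.
    """
    by_clone = defaultdict(Counter)
    for tissue, chain_a, chain_b in cells:
        by_clone[(chain_a, chain_b)][tissue] += 1
    return by_clone


def expanded_clones(cells, min_cells=2):
    """Keep only clones seen in at least min_cells cells (cutoff is an assumption)."""
    return {
        clone: dict(counts)
        for clone, counts in clone_sizes(cells).items()
        if sum(counts.values()) >= min_cells
    }
```

A clone whose counts span more than one tissue, for example both tumour and blood, is the kind of population that could then be overlaid on the expression clusters.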

This kind of analysis, Martin noted, is also possible on other long read sequencing platforms, but the cost is approximately double that of nanopore sequencing, with similar rates of recovery. Concluding, Martin looked to the future steps of this research, explaining that the team intend to develop RAGE-seq further, ideally to get rid of the use of short read data altogether and generate enough long read data, perhaps with PromethION, to perform expression profiling with long reads. Beyond that, removing the capture step and instead looking at the total sequencing output might be another simplifying advance.