Nanopore sequencing in single-cell and spatial transcriptomics

Rainer Waldmann (IPMC – INSERM, CNRS, Université Côte d’Azur, France Genomique) explained why UMIs (‘Unique Molecular Identifiers’) are so important for single-cell sequencing: because each cDNA molecule is tagged with a unique ID, amplification bias in gene expression can be eliminated; furthermore, using UMIs to generate consensus sequences for single RNA molecules helps with error correction.
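To illustrate the two uses of UMIs described above, here is a minimal Python sketch. It is not the pipeline used by Rainer's team: the function names, the toy reads and the per-position majority vote (which assumes the reads of one molecule are already aligned and of equal length) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def umi_counts(reads):
    """Count distinct UMIs per (cell, gene); PCR duplicates collapse to one molecule.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, one per read.
    """
    molecules = defaultdict(set)
    for cell, umi, gene in reads:
        molecules[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}


def consensus(seqs):
    """Per-position majority vote over reads sharing one UMI.

    Assumption: the reads are pre-aligned and of equal length.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


# Hypothetical toy data: three reads of the same molecule plus two other molecules.
reads = [
    ("CELL1", "AAAC", "Myl6"),
    ("CELL1", "AAAC", "Myl6"),  # amplification duplicate
    ("CELL1", "AAAC", "Myl6"),  # amplification duplicate
    ("CELL1", "GGTT", "Myl6"),
    ("CELL1", "CCAT", "Gria2"),
]
counts = umi_counts(reads)
corrected = consensus(["ACGT", "ACGT", "ACCT"])  # one read carries an error
```

Here five reads collapse to two Myl6 molecules and one Gria2 molecule, and the single erroneous base is outvoted by the other two reads.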

Rainer explained that short-read sequencing requires fragmentation and retrieves only terminal sequence information, so much of the information on splicing and single nucleotide variants (SNVs) is lost: ‘There is actually a pretty easy solution – just skip the fragmentation and do nanopore sequencing’. The challenging part is the downstream identification and assignment of cell barcodes and UMIs. His team use short-read guided assignment, whereby they extract the barcode/UMI combinations from short-read data, and then search for the matching combination in nanopore reads for the same gene or region. Their approach achieves a barcode assignment accuracy of 99.8% and a UMI assignment accuracy of 97.4%.
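The idea of short-read guided assignment can be sketched in Python as follows. This is a simplified illustration rather than the team's actual software: the index layout, the `assign` helper, the edit-distance threshold of 2 and the rule of rejecting ties are all assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def assign(nanopore_tag, gene, short_read_index, max_dist=2):
    """Return the unique closest (barcode, umi) seen for `gene` in short reads.

    `nanopore_tag` is the barcode+UMI stretch extracted from a nanopore read.
    Returns None if nothing is within `max_dist` or the best match is tied.
    """
    candidates = short_read_index.get(gene, ())
    scored = sorted(
        (edit_distance(nanopore_tag, bc + umi), bc, umi) for bc, umi in candidates
    )
    if not scored or scored[0][0] > max_dist:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two candidates are equally close
    return scored[0][1], scored[0][2]


# Hypothetical index built from short-read data: gene -> {(barcode, umi), ...}
index = {"Myl6": {("ACGTACGT", "TTGCA"), ("GGGTACCA", "CATGA")}}

hit = assign("ACGTACGTTTGGA", "Myl6", index)   # one substitution in the UMI
miss = assign("TTTTTTTTTTTTT", "Myl6", index)  # too far from every candidate
other = assign("ACGTACGTTTGCA", "Gria2", index)  # gene absent from the index
```

Restricting the search to combinations already observed for the same gene keeps the candidate set small, which is what makes matching tolerant of nanopore read errors in this sketch.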

Displaying tSNE plots of single-cell sequencing data from 1,141 E18 mouse brain cells, Rainer highlighted how nanopore sequencing could identify differential isoform switching in the Myl6 gene, whilst short-read data only revealed Myl6 expression at gene level. With their nanopore data, they could also identify SNVs throughout the isoforms, such as SNVs in the ionotropic glutamate receptor Gria2, which are associated with RNA editing.

Spatial transcriptomics

Introducing the next part of his talk, Rainer explained that you don’t just want to know which cell a transcript is expressed in, but also which part of a tissue – this is where spatial transcriptomics research comes in. To perform Spatial Isoform Transcriptomics (SiT), his team combined the 10x Genomics Visium system with nanopore sequencing; this involves in situ cDNA synthesis on slides with spatially barcoded reverse transcription primers annealed. The barcoding process is similar to single-cell transcriptomics – meaning that they could use the same barcode/UMI assignment strategy. Rainer displayed a result of such an experiment, where regional isoform switches of the myelin proteolipid protein Plp1 gene could be seen in the mouse olfactory bulb. He also showed the much greater transcript coverage nanopore provided compared to short reads. From this full-length isoform information, they also identified SNVs.

Addressing the question of ‘how many reads do we need?’, Rainer stated that you only need around 30 million reads for isoform discovery in the mouse olfactory bulb, as the tissue is very small. For SNV discovery, a higher read count of 50-70 million reads would be desirable. For coronal brain sections, which are much bigger, he recommended ~100 million reads for isoform discovery, and closer to 200 million reads for SNV calling.

Single-cell and spatial transcriptomics – skipping the short reads

Rainer introduced the last section of his talk by explaining that, thanks to the improvements in nanopore sequencing over the past few years, it is ‘now easily possible to assign barcodes by clustering’.

Displaying a comparison of nanopore sequencing to short-read sequencing of 3,000 single cells from a nasal airway sample, Rainer pointed out the high overlap, with 2,800 barcodes in common. Those not in common between datasets were from cells that had very few reads or UMIs and so were close to the cut-off for inclusion.

Rainer stated that the main challenge is UMI assignment: ensuring that related UMIs are grouped, whilst minimising grouping of non-related UMIs. He outlined a strategy for this, which involves taking the genome-matched, barcode-assigned nanopore reads, and clustering UMIs for the same cell and the same genomic region. He presented a graph showing how this approach achieves a strong positive correlation (R=0.995) in UMIs/cell between nanopore and short-read data. For gene expression, the correlation was lower (median R=0.89), which was expected given the drop-outs that are common in single-cell expression data.
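The clustering strategy outlined above can be illustrated with a small Python sketch. The talk summary does not specify the clustering algorithm, so the greedy approach below (most abundant UMI first, merge anything within an edit distance of 1) is an assumption chosen for brevity, as are the function names and the toy reads.

```python
from collections import Counter, defaultdict


def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cluster_umis(reads, max_dist=1):
    """Greedily cluster UMIs within each (cell, region) group.

    `reads` is an iterable of (cell, region, umi) tuples.
    Returns {(cell, region): [(centre_umi, n_reads), ...]}.
    """
    by_group = defaultdict(Counter)
    for cell, region, umi in reads:
        by_group[(cell, region)][umi] += 1

    result = {}
    for key, counts in by_group.items():
        clusters = []  # each entry is [centre_umi, n_reads]
        # Most abundant UMIs first, so likely error-free UMIs become centres.
        for umi, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            for cluster in clusters:
                if edit_distance(umi, cluster[0]) <= max_dist:
                    cluster[1] += n
                    break
            else:
                clusters.append([umi, n])
        result[key] = [tuple(c) for c in clusters]
    return result


# Hypothetical reads: the UMI "AACCGGTA" is a one-base error of "AACCGGTT".
region = "chr2:100-200"
reads = (
    [("CELL1", region, "AACCGGTT")] * 3
    + [("CELL1", region, "AACCGGTA")]
    + [("CELL1", region, "TTTTGGGG")] * 2
    + [("CELL2", region, "AACCGGTT")]
)
clusters = cluster_umis(reads)
```

Grouping by cell and genomic region first means the same UMI in a different cell stays a separate molecule, while the distance threshold trades merging of related UMIs against accidental merging of unrelated ones.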

Regarding how many reads are needed for nanopore single-cell transcriptome sequencing, Rainer suggested that 1-2 PromethION Flow Cells should be sufficient for transcriptome profiling of 1,000-2,000 cells (at <10,000 UMIs/cell). For high-accuracy SNV detection, 4-6 PromethION Flow Cells would be ideal, although another approach would be to perform shallow transcriptome sequencing, followed by targeted SNV detection using MinION.

Conclusions

Rainer concluded that both accurate single-cell and spatial transcriptomics are feasible with nanopore sequencing, and ‘I think [short-read] sequencing is meanwhile optional’.

Authors: Rainer Waldmann