Mike Clark - Deep transcriptomic sampling with long-read single cell RNA sequencing
London Calling 2019
Single cell RNA-seq (scRNA-seq) is rapidly gaining favour for understanding biology. Long-read methods have great potential for scRNA-seq by facilitating the identification and quantification of gene isoforms, allowing cell-specific variation in isoform expression and splicing to be characterised. Previous methods for transcriptome-wide long-read scRNA-seq have been limited by low numbers of cells and/or few reads per cell. We have developed an improved nanopore long-read scRNA-seq method that allows the profiling of hundreds of single cells at high read depth, and demonstrate its use by profiling five human cancer cell lines. These results demonstrate the power of long-read sequencing to characterise gene and isoform expression in single cells.
Michael Clark, from the University of Melbourne, opened his talk by discussing why people want to perform single cell sequencing. Michael said that when examining bulk samples, results are averaged across a population of cells, whereas single cell methods capture the biology of each cell in the population. Introducing scRNA-seq, the characterisation of gene expression in single cells, Michael said that it can be used for a variety of purposes, such as identifying cell types and their expression profiles, identifying expression changes during development or disease, or investigating how genetic variation regulates gene expression. With the advent of long-read technology, scRNA-seq offers the opportunity to characterise isoforms and alternative splicing events at the single cell level, and you “…really need long read methods to study and identify them”.
Michael then spoke about designing single cell RNA-seq experiments. Previous long-read scRNA-seq studies have used “small numbers of cells”; however, powerful single cell analyses require large numbers of cells. Furthermore, properly characterising isoforms ideally requires deep sequencing. The question was then posed: “How do we scRNA isoform-seq from large numbers of cells?”
Michael described how the 10x Genomics single cell platform separates cells in a microfluidic system. Each cell is encapsulated, along with reagents and a barcoded gel bead, in an individual droplet known as a Gel Bead-in-emulsion (GEM), in which cDNA is synthesised. During this process unique cellular barcodes are added, allowing the cell of origin to be determined after sequencing. Michael said that to get the most out of these GEMs, the resulting material can be split across different sequencing platforms or even stored for later use. Using five different cancer cell lines, Michael described an experimental set-up in which 10% of the GEM material was sequenced on three long-read platforms (the Oxford Nanopore MinION and PromethION, plus one other long-read platform) and on a short-read platform. In addition, samples were sequenced on a high-throughput short-read system using the standard protocols. Subsetting the material in this way allowed the sequencing depth per cell to be controlled, and he estimated that material representing approximately 450 cells was used for sequencing. Michael also noted that this approach gets the most out of each library, which matters because single cell sequencing kits can be expensive. Finally, data from samples split across short-read and nanopore platforms can be combined to aid cellular barcode detection, helping to demultiplex each sequence unambiguously to its cell of origin.
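Using short-read data to aid barcode detection can be pictured as matching each error-prone long-read barcode against a whitelist of barcodes already identified from the short reads. The sketch below is hypothetical: the function names, the edit-distance threshold (`max_dist=2`) and the rule of rejecting ties are illustrative assumptions, not the method described in the talk.

```python
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if bases match)
            ))
        prev = curr
    return prev[-1]


def assign_barcode(observed: str, whitelist: List[str], max_dist: int = 2) -> Optional[str]:
    """Return the unique closest whitelist barcode within max_dist, else None.

    Hypothetical rule: reads whose best match is too distant, or tied between
    two whitelist barcodes, are left unassigned rather than guessed.
    """
    if not whitelist:
        return None
    scored = sorted((edit_distance(observed, bc), bc) for bc in whitelist)
    best_dist, best_bc = scored[0]
    if best_dist > max_dist:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous: cannot demultiplex to a single cell
    return best_bc
```

For example, with the hypothetical whitelist `["ACGTACGTACGTACGT", "TTTTGGGGCCCCAAAA"]`, a long read carrying `ACGTACGACGTACGT` (one base deleted) is still assigned to the first barcode. Edit distance rather than Hamming distance is used because nanopore errors include insertions and deletions, not just substitutions.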
Michael moved on to talk about scPipe, the bioinformatic pipeline used to analyse the resulting long-read data. After Albacore basecalling and adapter trimming, cellular barcodes are detected within a 10 bp window following the adapter. Reads are then mapped and assigned to exons, UMIs are deduplicated, and a count matrix is generated. Michael gave a head-to-head comparison of the demultiplexing results and basic QC statistics after running this pipeline on the data from each sequencing platform. The short-read platform returned the most reads, while the PromethION and MinION yielded the most reads of the long-read platforms assessed, by some margin. Cellular barcode detection was best for the short-read technology, at around 91% of reads, and lowest for the PromethION, at 30%. The MinION detected cellular barcodes in 56% of reads, comparable to the other long-read technology assessed. In terms of average UMI counts per cell, short reads detected the most (42,084), with 70% of sequences mapping to an exon, followed closely by the PromethION (38,663), with 85% of reads mapping to an exon. The MinION (9,606) was next, also with 85% of reads mapping to an exon, while the final long-read platform came last with 1,684 reads from 2 sequencing cells, 83% of which mapped to an exon.
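The barcode search and UMI collapsing steps can be illustrated with a minimal sketch. It is not scPipe's code; it assumes a 10x 3' v2-style read layout (adapter, then a 16 bp cell barcode and a 10 bp UMI), interprets the “10 bp window” as the range of barcode start offsets searched after the adapter, requires exact barcode matches, and collapses only identical UMIs. The adapter sequence and all names are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Assumed read layout (10x 3' v2-style): adapter | 16 bp barcode | 10 bp UMI | polyT | cDNA
ADAPTER = "CTACACGACGCTCTTCCGATCT"  # assumption: partial Illumina TruSeq Read 1; check your kit
BARCODE_LEN = 16
UMI_LEN = 10
WINDOW = 10  # barcode may start 0-9 bp after the adapter end (our reading of the "10 bp window")


def find_barcode(read: str, whitelist: Set[str], adapter: str = ADAPTER) -> Optional[Tuple[str, str]]:
    """Return (barcode, umi) if a whitelisted barcode starts within WINDOW bp after the adapter."""
    start = read.find(adapter)
    if start < 0:
        return None  # no adapter found: the read cannot be assigned to a cell
    end = start + len(adapter)
    for offset in range(WINDOW):
        pos = end + offset
        candidate = read[pos:pos + BARCODE_LEN]
        if candidate in whitelist:  # exact match only; real pipelines tolerate errors
            umi = read[pos + BARCODE_LEN:pos + BARCODE_LEN + UMI_LEN]
            if len(umi) == UMI_LEN:
                return candidate, umi
    return None


def count_matrix(assignments: Iterable[Tuple[str, str, str]]) -> Dict[str, Counter]:
    """Collapse reads sharing (barcode, UMI, gene) into one molecule; count molecules per gene per cell."""
    molecules = set(assignments)  # identical UMI + gene + cell = PCR duplicates of one molecule
    matrix: Dict[str, Counter] = defaultdict(Counter)
    for barcode, _umi, gene in molecules:
        matrix[gene][barcode] += 1
    return matrix
```

Because nanopore reads can come from either strand, a real implementation would also search the reverse complement of each read before giving up on it.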
Examining the number of genes detected on each platform and how they correlated with each other, Michael showed that the nanopore runs detected approximately the same number of genes as the short-read run, or more. Spearman’s correlations between the short-read platform and both the MinION and PromethION were “pretty good”, at around 0.9. Michael said that the other long-read platform performed less well, most likely due to its low read counts.
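Spearman’s correlation is the Pearson correlation computed on ranks, so it is unaffected by monotonic differences in scale between platforms. A minimal pure-Python sketch follows; the talk did not specify exactly which per-gene quantity was correlated, so the inputs (for example, total UMIs per gene on each platform) and the function names are assumptions.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For instance, hypothetical per-gene counts `[120, 5, 60, 0, 33]` (short reads) and `[95, 7, 40, 1, 30]` (MinION) have identical rank orders, so their Spearman correlation is 1 even though the raw values differ.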
Next, Michael moved on to how gene expression, measured as UMI counts per cell, correlated between platforms. Again, the short-read, MinION and PromethION platforms correlated well, but Michael asked “why are these [correlation] values not 1?” He proposed that much of this is due to variance in sensitivity: single cell sequencing covers many cells but comparatively few reads per cell. There also appeared to be higher expression levels in the long-read methods, and it was unclear whether this reflected a problem with the way short-read counting or long-read counting is performed.
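The sampling argument can be made concrete with a toy simulation. Assuming, hypothetically, that two platforms sample molecules from the identical expression profile, any shortfall from a correlation of 1 is pure sampling noise, and it shrinks as depth grows. The 500-gene Zipf-like profile, the depths and the function names are all illustrative assumptions, not data from the talk.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def sample_counts(weights: Sequence[float], depth: int, rng: random.Random) -> List[int]:
    """Draw `depth` molecules from a fixed expression profile (multinomial sampling)."""
    genes = range(len(weights))
    drawn = Counter(rng.choices(genes, weights=weights, k=depth))
    return [drawn[g] for g in genes]


def replicate_correlation(weights: Sequence[float], depth: int, seed: int = 0) -> float:
    """Correlate two independent samplings of the SAME profile on a log1p scale."""
    rng = random.Random(seed)
    a = sample_counts(weights, depth, rng)
    b = sample_counts(weights, depth, rng)
    return pearson([math.log1p(v) for v in a], [math.log1p(v) for v in b])


# Hypothetical profile: 500 genes with expression proportional to 1/rank.
PROFILE = [1.0 / rank for rank in range(1, 501)]

if __name__ == "__main__":
    for depth in (500, 5_000, 50_000):
        print(depth, round(replicate_correlation(PROFILE, depth), 3))
```

Running the loop shows the correlation climbing towards 1 as depth increases, even though nothing differs between the two simulated “platforms” except which molecules happened to be sampled.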
t-SNE plots were then shown, in which each point represents a cell and cells with similar transcriptome profiles are placed close together in two dimensions. In the short-read and nanopore runs, each of the five cancer cell lines clearly separated, with consistent groupings across all of those platforms. The final long-read platform showed poor separation of the cell lines in this ordination plot, which Michael again attributed to low read counts.
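t-SNE is usually run on normalised, log-transformed expression (often after PCA) rather than raw UMI counts, so that differences in sequencing depth between cells do not dominate the layout. The talk did not say which normalisation was used; the sketch below shows one common choice, log counts per million, with hypothetical cells and function names.

```python
import math
from typing import List, Sequence


def log_cpm(counts: Sequence[int]) -> List[float]:
    """Scale one cell's UMI counts to counts per million, then apply log1p."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cell has no counts")
    return [math.log1p(c * 1_000_000 / total) for c in counts]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two cells' normalised expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical cells over three genes: A and B share expression proportions but
# differ 10-fold in depth; C comes from a different (hypothetical) cell line.
cell_a = log_cpm([10, 20, 70])
cell_b = log_cpm([100, 200, 700])
cell_c = log_cpm([70, 20, 10])
```

After normalisation, cells A and B coincide despite the depth difference, while C stays distant. In a real analysis the normalised cell-by-gene matrix would then be passed to a t-SNE implementation such as scikit-learn’s `sklearn.manifold.TSNE`.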
Towards the end of his talk, Michael spoke about differential isoform detection in the TMM17B gene, in which an exon was skipped in three of the cell lines but retained in the other two.
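An exon-skipping event like this is often summarised as “percent spliced in” (PSI): the fraction of informative reads that include the exon. The talk did not say how the event was quantified; this minimal sketch assumes each read has already been classified as including or skipping the exon (a step that needs an alignment-aware tool), and pools reads by cell line because per-cell read counts are low. The data and names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def percent_spliced_in(reads: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """PSI per group = exon-including reads / (including + skipping reads).

    `reads` holds (group, includes_exon) pairs, one per read informative for
    the event; uninformative reads are assumed to be filtered out already.
    """
    included: Dict[str, int] = defaultdict(int)
    informative: Dict[str, int] = defaultdict(int)
    for group, includes_exon in reads:
        informative[group] += 1
        if includes_exon:
            included[group] += 1
    return {group: included[group] / informative[group] for group in informative}


# Hypothetical classified reads, pooled by cell line
reads = [("line_1", True)] * 9 + [("line_1", False)] + [("line_2", False)] * 4 + [("line_2", True)]
psi = percent_spliced_in(reads)
```

Here `line_1` mostly retains the exon (PSI 0.9) while `line_2` mostly skips it (PSI 0.2), the kind of contrast described for the cell lines above.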
What about isoform expression? Michael explained that few tools are available for isoform analysis of long reads, and those that exist are not optimised for the low read counts typical of single cell sequencing experiments.
Summarising, Michael said that data processing with scPipe yields consistent results between Illumina and nanopore sequencing for gene profiling and cell clustering. Isoform characterisation remains an ongoing challenge, but isoforms and differentially expressed isoforms can be identified from single cells with nanopore sequencing.