Kevin Lebrigand - Single cell isoform profiling, 10xGenomics scRNA-seq and nanopore long read sequencing
London Calling 2019
Single cell transcriptome sequencing has become a powerful tool for high-resolution analysis of gene expression in individual cells. However, current high throughput approaches only allow sequencing of one extremity of the transcript (transcriptome profiling). Information crucial for an in-depth understanding of cell-to-cell heterogeneity on splicing, chimeric transcripts and sequence diversity (SNPs, RNA editing, imprinting) is lost. Here we present an approach that uses Oxford Nanopore sequencing with unique molecular identifiers to obtain error corrected full length single cell sequence information with the 10xGenomics single cell isolation system and apply it to examine differential RNA alternative splicing and RNA editing events in the embryonic mouse brain.
Kevin opened his talk by outlining the 10X Genomics method of single-cell RNA sequencing using a droplet-based approach, whereby single cells are barcoded within microfluidically isolated droplets. Within the droplets, 10X barcoded gel beads are mixed with cells, enzyme and oil to create single cell GEMs (gel bead in emulsion).
After reverse transcription inside the GEMs, the barcoded cDNA is recovered and amplified, followed by QC and quantification. Each amplified barcoded cDNA molecule contains a cell barcode, a unique molecular identifier (UMI), and a poly(dT)VN sequence.
The library can then be prepared, normalised, and sequenced using short-read technologies. Gene counts are then displayed in tSNE plots, which separate the individual cell populations in two dimensions.
Short-read single-cell RNA sequencing yields reads which are close to the 3’ end of the cDNA molecule. This means that information regarding splicing, fusions, SNPs, editing, and imprinting is lost. In comparison, long-read single-cell RNA sequencing preserves this information.
Kevin compared options for long-read, full-length transcriptomics, stating that the two main challenges are getting enough reads to profile molecules (“we need ~50k reads per cell”) and high accuracy sequencing for cell barcode and UMI demultiplexing.
Kevin next addressed the first challenge: getting enough reads to profile a vast number of molecules. He described how sequencing on the PromethION resulted in a loss of reads after barcode demultiplexing, and that the Q score of the reads correlated positively with the number of reads retained.
This led on to the second challenge: cell barcode and UMI identification. He explained that cell barcodes (16 nt long) are selected at random from a pool of 750,000 barcodes, whereas UMIs (10 nt) are entirely random sequences. Demultiplexing is challenging and error-prone because the correct UMI and cell barcode sequences are not known in advance. He described the computational challenge posed by the very high number of possible combinations: for a 50 million read PromethION run, more than 10^14 barcode variants need to be generated. Currently, when 20 million polyA cDNA reads are generated on a PromethION run, approximately one third can be assigned to a specific cell.
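The assignment problem described above can be illustrated with a minimal sketch. It assumes a tiny hypothetical whitelist and uses Hamming distance (substitutions only), which is a simplification: nanopore reads also carry insertions and deletions, so a real pipeline would need an edit-distance or alignment-based search. The function names, the toy barcodes and the `max_dist` threshold are illustrative assumptions, not the method presented in the talk.

```python
from typing import Iterable, Optional

CELL_BARCODE_LEN = 16  # nt, as stated in the talk
UMI_LEN = 10           # nt, as stated in the talk

# Size of the raw sequence space for each element (4 possible bases per position).
POSSIBLE_CELL_BARCODES = 4 ** CELL_BARCODE_LEN  # 4294967296
POSSIBLE_UMIS = 4 ** UMI_LEN                    # 1048576


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed: str,
                   whitelist: Iterable[str],
                   max_dist: int = 2) -> Optional[str]:
    """Return the single closest whitelist barcode within max_dist.

    Returns None when no barcode is close enough, or when two or more
    barcodes are equally close (an ambiguous assignment).
    """
    best, best_dist, tie = None, max_dist + 1, False
    for candidate in whitelist:
        d = hamming(observed, candidate)
        if d < best_dist:
            best, best_dist, tie = candidate, d, False
        elif d == best_dist and best is not None:
            tie = True
    return None if tie else best


# Hypothetical three-entry whitelist, for illustration only.
toy_whitelist = ["AAACCCAAGAAACACT", "AAACCCAAGAAACCAT", "TTTGTTGTCTTACTGT"]
print(assign_barcode("AAACCCAAGAAACACA", toy_whitelist))  # one mismatch to the first entry
print(assign_barcode("AAACCCAAGAAACAAT", toy_whitelist))  # equally close to two entries
```

Because only a small fraction of the possible 16-mers is in use, most sequencing errors produce a barcode that is not in the whitelist, which is what makes correction possible. UMIs are fully random, so there is no whitelist to correct them against.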
Kevin next described an example of E18 mouse brain single-cell RNA sequencing. Here, 1,200 cells were split before reverse transcription; both sets were subjected to short-read and long-read nanopore sequencing, but one set underwent size selection and targeted sequencing. He displayed a tSNE plot of the data, which showed the segregation of different neural cell populations based on differential gene expression of single cells. The median number of UMIs per cell was 7,605. There was a high correlation between the short- and long-read data (r = 0.99), and 70% of short reads were identified in the nanopore long reads.
Discussing single-cell sequencing as a way to resolve isoforms and changes within specific isoforms, Kevin gave the example of the Clathrin light chain A gene (Clta) in different cell types. Clta has two isoforms, one with an extra exon and one without, and single-cell analysis revealed that one of these isoforms is expressed only in precursor cells and the other only in mature cells.
Kevin moved on to talk about using UMIs to identify reads derived from the same parent molecule before PCR. Here, at 10-fold coverage from UMI-clustered sequences, the consensus sequence identity of transcripts reached 98.9%, and little extra gain could be made with more than 10-fold coverage. Kevin said that a consensus identity approaching 99% is enough to call SNPs. Both an A -> SNP and a “flip – flop” exon (two mutually exclusive exon variants) were detected in the AMPA receptor Gria2.
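The idea of building a consensus from reads that share a UMI can be sketched as follows. This is a deliberately simplified illustration, not the pipeline from the talk: it assumes every read has already been assigned a cell barcode and UMI, and that reads from the same molecule are pre-aligned to equal length so that a column-wise majority vote is meaningful. Real nanopore reads contain insertions and deletions and would need a multiple alignment or polishing step first. All names and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str, str]  # (cell barcode, UMI, sequence)


def group_by_molecule(reads: Iterable[Read]) -> Dict[Tuple[str, str], List[str]]:
    """Group sequences by (cell barcode, UMI), i.e. by original molecule."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for cell_barcode, umi, sequence in reads:
        groups[(cell_barcode, umi)].append(sequence)
    return dict(groups)


def majority_consensus(sequences: List[str]) -> str:
    """Column-wise majority vote over equal-length, pre-aligned sequences.

    Ties are resolved in favour of the base seen first in that column.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be pre-aligned to equal length")
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*sequences))


# Toy reads: three copies of one molecule (one with an error), one singleton.
toy_reads = [
    ("BC1", "UMI1", "ACGTAC"),
    ("BC1", "UMI1", "ACGTAC"),
    ("BC1", "UMI1", "ACCTAC"),  # single substitution error
    ("BC1", "UMI2", "TTGACA"),
]
molecules = group_by_molecule(toy_reads)
consensus = {key: majority_consensus(seqs) for key, seqs in molecules.items()}
print(consensus[("BC1", "UMI1")])
```

With more reads per UMI, random errors are outvoted at each position, which is consistent with the reported rise in consensus identity as coverage increases.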
Towards the end of his talk, while running over some of the numbers, Kevin gave a rough ballpark figure suggesting that approximately 1,000 single-cell transcriptomes can be sequenced on a single PromethION flowcell.
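As a rough consistency check on the figures quoted in the talk, the sketch below multiplies the ~50k reads per cell target by a number of cells. The function name and the `assignable_fraction` parameter are illustrative assumptions, introduced only to show how the read budget grows when not every read can be assigned to a cell.

```python
import math


def reads_required(n_cells: int,
                   reads_per_cell: int = 50_000,
                   assignable_fraction: float = 1.0) -> int:
    """Raw reads needed so that each cell receives reads_per_cell assigned reads.

    assignable_fraction is the share of raw reads that can be assigned to a
    cell; 1.0 means every read is usable (an optimistic assumption).
    """
    if not 0 < assignable_fraction <= 1:
        raise ValueError("assignable_fraction must be in (0, 1]")
    return math.ceil(n_cells * reads_per_cell / assignable_fraction)


print(reads_required(1_000))                           # 50000000
print(reads_required(1_000, assignable_fraction=0.5))  # 100000000
```

Under the optimistic assumption that every read is usable, 1,000 cells at ~50k reads each corresponds to the 50 million read run mentioned earlier; if only about a third of reads can be assigned, roughly three times as many raw reads would be needed for the same depth.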