Pioneering single-cell sequencing approaches to understand the 'dark matter' of our genome

'During my postdoc research, I had already developed the first single-cell transcriptome sequencing technology in the world.'

From pioneering postdoc to principal investigator in single-cell sequencing research

What are your current research focuses?

My lab has two major interests: firstly, developing new single-cell omics sequencing technologies for genomics studies; secondly, using these cutting-edge technologies to study the epigenetic regulation of human germline cell development as well as the tumorigenesis process.

(1) Developing new single-cell omics sequencing technologies for genomics studies:

For this, we are trying to develop single-molecule (long-read), single-cell multi-omics sequencing technologies. There is still a huge amount of ‘dark matter’ in different layers of our ‘omics’. For example, it is well known that there are about 20,000 protein-coding genes in the human genome. Nearly all single-cell transcriptome studies treat one gene in a cell as a functional unit. However, we know that these 20,000 protein-coding genes can generate 170,000 different RNA isoforms, which in turn generate 70,000 different proteins. That means, on average, one protein-coding gene can generate 8-9 different RNA isoforms and 3-4 different proteins. Different proteins from the same gene can have different or even opposite biological functions. For example, the BCL-X (BCL2L1) gene can generate two different proteins: BCL-X(L) and BCL-X(S). BCL-X(L) represses apoptosis whereas BCL-X(S) promotes apoptosis. So, it is not enough to know whether the BCL-X gene is expressed in an individual cell; you need to know whether the RNA isoform for BCL-X(L), the isoform for BCL-X(S), or both are expressed in that cell. More importantly, it is estimated that there are many more currently unknown (novel) RNA isoforms from these 20,000 protein-coding genes in our genome. Only single-molecule (long-read), single-cell transcriptome sequencing technologies can systematically resolve these complex issues of a cell. So, my lab is trying to systematically develop single-molecule, single-cell sequencing technologies for transcriptome, genome, epigenome (chromatin accessibility, 3D genome structure, etc.) and multi-omics analyses.
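As a rough illustration of the arithmetic and of the BCL-X example, the short Python sketch below recomputes the per-gene averages from the approximate figures quoted above and contrasts gene-level with isoform-level counting. The read list and the function names are hypothetical, made up for this example; it is not the lab's analysis pipeline.

```python
from collections import Counter

# Approximate figures quoted in the interview.
PROTEIN_CODING_GENES = 20_000
RNA_ISOFORMS = 170_000
PROTEINS = 70_000


def per_gene_average(total, genes=PROTEIN_CODING_GENES):
    """Average number of products (isoforms or proteins) per gene."""
    return total / genes


# Hypothetical long reads from ONE cell, each already assigned to a
# (gene, isoform) pair. Real assignment requires full-length alignment.
reads = [
    ("BCL2L1", "BCL-X(L)"),
    ("BCL2L1", "BCL-X(L)"),
    ("BCL2L1", "BCL-X(S)"),
]


def gene_counts(reads):
    """Gene-level view: isoform identity is discarded."""
    return Counter(gene for gene, _isoform in reads)


def isoform_counts(reads):
    """Isoform-level view: keeps the anti- vs pro-apoptotic distinction."""
    return Counter(reads)


if __name__ == "__main__":
    print(per_gene_average(RNA_ISOFORMS))  # 8.5 isoforms per gene
    print(per_gene_average(PROTEINS))      # 3.5 proteins per gene
    print(gene_counts(reads))
    print(isoform_counts(reads))
```

The gene-level count only says that BCL2L1 is expressed (3 reads); the isoform-level count shows that, in this made-up cell, the anti-apoptotic BCL-X(L) isoform outnumbers the pro-apoptotic BCL-X(S) isoform.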

(2) Using these cutting-edge technologies to study epigenetic regulation:

Germline cells are crucial for transmitting genetic information from generation to generation and keeping a species stable for millions of years. However, at many stages during human embryonic development, germline cells are rare and difficult to access. More importantly, the germ cells are nearly always mixed with other types of cells in human embryos. Even in mouse models, it is extremely difficult to get millions of pure germ cells (the same cell type at the same developmental stage) for bulk sequencing studies. Only single-cell omics methods can analyse their gene regulation features thoroughly. During the past ten years, with the help of single-cell sequencing technologies, we have made tremendous progress in understanding the epigenetic regulation of human germ cell development.

How did your scientific research start and what led you to these current research focuses?

When I did my postdoc research in Azim Surani's lab, I worked on germline cell development using mouse models. After I set up my own lab, I thought it was natural to move one step further: to directly study the epigenetic regulation of human germline cell development. During my postdoc research, I had already developed the first single-cell transcriptome sequencing technology in the world (2009). It felt natural to go on, in my own lab, to develop other single-cell omics sequencing technologies to facilitate developmental biology studies.

Published research

Can you briefly explain what the techniques scNanoHi-C and scNanoATAC-seq are used to investigate?

scNanoATAC-seq can be used to simultaneously analyse chromatin accessibility and genetic changes (especially structural variations) in an individual cell, particularly at complex genomic regions. scNanoHi-C can be used to analyse haplotype-resolved 3D genome structures in an individual diploid cell (the majority of cells in the human body are diploid). Notably, it can routinely identify higher-order (multi-way) chromatin interactions in an individual cell: for example, multiple enhancers simultaneously binding to the same promoter to promote its transcriptional activity, or one enhancer simultaneously binding to multiple promoters to promote their transcriptional activities.
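To make the idea of a higher-order (multi-way) interaction concrete, here is a minimal Python sketch. It assumes that one long read has already been split into the list of genomic elements it joins together; the element names and helper functions are hypothetical and are not taken from the scNanoHi-C software.

```python
from itertools import combinations


def pairwise_contacts(fragments):
    """All pairwise contacts implied by one multi-fragment read."""
    return list(combinations(fragments, 2))


def is_multiway(fragments):
    """True when a single read links three or more distinct elements."""
    return len(set(fragments)) >= 3


# Hypothetical read joining two enhancers and one promoter.
read = ["enhancer_A", "enhancer_B", "promoter_P"]

if __name__ == "__main__":
    print(is_multiway(read))        # True
    print(pairwise_contacts(read))  # 3 pairs from a 3-fragment read
```

A read that links only two elements carries a single pairwise contact. The point of a long read in this sketch is that all three elements are observed on the same molecule, which a collection of independent pairwise contacts cannot show.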

Why did you decide to use nanopore technology in this research?

Long reads are essential to analyse higher-order chromatin interactions within an individual cell. Since these methods are based on amplification of genomic DNA fragments within an individual cell, the methylation information is lost in our scNanoATAC-seq and scNanoHi-C methods.

We have since been trying to develop a single-molecule (long-read) based DNA methylome sequencing technology.

Read the team's latest publication (Cell Research, September 2023), which demonstrates a nanopore sequencing-based method (scNanoCOOL-seq) for combined analysis of the genome (copy number variation), methylome, chromatin accessibility, and transcriptome in the same individual cell.

Can you explain why phasing is important in this research?

It is estimated that about 10-30% of chromatin interactions are in trans, that is, between DNA fragments from different chromosomes. However, the two homologous chromosomes in a cell are usually located at different positions (different chromosome territories) in the nucleus, so they have different neighbouring chromosomes and different trans-chromatin interactions. In other words, trans-chromatin interactions are usually allele specific. So, you have to phase the genome before you can identify the trans-chromatin interactions (allele-specific chromatin interactions).
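The role of phasing can be sketched in a few lines of Python. The example assumes a set of already-phased heterozygous SNPs (position mapped to the base on haplotype 1 and the base on haplotype 2) and assigns each fragment of a contact to a haplotype by majority vote. All positions, bases, and function names are hypothetical; a real pipeline would also need to handle sequencing errors and ambiguous fragments more carefully.

```python
def assign_haplotype(observed, phased_snps):
    """Return 1 or 2 for the haplotype a fragment supports, or None.

    observed:    {position: base seen on the fragment}
    phased_snps: {position: (base_on_hap1, base_on_hap2)}
    """
    votes = {1: 0, 2: 0}
    for pos, base in observed.items():
        if pos not in phased_snps:
            continue  # not a known heterozygous site
        hap1_base, hap2_base = phased_snps[pos]
        if base == hap1_base:
            votes[1] += 1
        elif base == hap2_base:
            votes[2] += 1
    if votes[1] == votes[2]:
        return None  # no informative SNP, or a tie
    return 1 if votes[1] > votes[2] else 2


def label_trans_contact(frag_a, frag_b, snps_a, snps_b):
    """Label a contact between two chromosomes by haplotype, if possible."""
    return assign_haplotype(frag_a, snps_a), assign_haplotype(frag_b, snps_b)


# Hypothetical phased SNPs on two different chromosomes.
chr1_snps = {100: ("A", "C"), 200: ("G", "T")}
chr2_snps = {500: ("T", "G")}

if __name__ == "__main__":
    contact = label_trans_contact({100: "A", 200: "G"}, {500: "G"},
                                  chr1_snps, chr2_snps)
    print(contact)  # (1, 2): chr1 haplotype 1 contacts chr2 haplotype 2
```

Without the phased SNPs, both fragments would simply be labelled "chr1" and "chr2", and the contact could not be attributed to a particular homologue.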

How might looking at the 3D genome in single cells impact scientific research in areas such as cancer?

It is well known that 3D genome structure changes drastically during tumorigenesis and is very likely to contribute to the tumorigenesis process. However, tumour tissues are always a mixture of many different types of cell. Even just the cancer cells in tumour tissues can have different genetic clones and subpopulations. So, it is essential to use single-cell 3D genome structure analysis methods to analyse the chromatin interactions in the tumorigenesis process.

In your recent paper on scNanoHi-C, it was suggested that extrachromosomal DNA (ecDNA) affects the 3D genome and maybe oncogene expression. Were you surprised by these results?

We are not surprised but excited by these findings. Now we have a concrete way to study the looping and interaction of the enhancers and promoters within an ecDNA molecule. We can also confidently study the interaction of the enhancers in an ecDNA molecule with the promoters within a linear chromosome, especially the higher-order complex interactions. We will definitely investigate them further.

Research highlights and challenges

'We believe that the era of third generation, single-cell omics sequencing technologies is coming.'

Do you have any highlights or successes that stand out in your research?

We believe that the era of third generation, single-cell omics sequencing technologies is coming. It will help us to resolve many mysteries of the ‘dark matter’ in different layers of our omics. For example, there are about 14,000 pseudogenes in the human genome and many of them are transcribed in the cells, and the transcripts from many of these pseudogenes are functionally important for the cells. However, short-read platform-based single-cell transcriptome sequencing technologies in general cannot accurately identify the transcripts from the pseudogenes. Third generation single-cell transcriptome sequencing technologies can reliably resolve this issue.

What have been the main challenges in this research and how have you approached them?

There are two major hurdles when developing single-cell, long-read omics sequencing approaches: (1) how to amplify long DNA fragments evenly and efficiently, since most previous amplification methods are suitable for short DNA fragments but not for long ones; and (2) how to computationally handle long reads with higher sequencing error rates (1-10% errors instead of 0.1% errors).
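A back-of-the-envelope calculation shows why the second hurdle matters. The Python sketch below uses the error rates quoted above; the read lengths (150 bases for a short read, 10,000 bases for a long read) are illustrative assumptions, and errors are treated as independent per base, which is a simplification.

```python
def expected_errors(read_length, error_rate):
    """Expected number of erroneous bases in one read."""
    return read_length * error_rate


def prob_error_free(read_length, error_rate):
    """Chance that a read has no errors, assuming independent errors."""
    return (1 - error_rate) ** read_length


# Assumed read lengths; error rates as quoted in the interview.
SHORT_READ = (150, 0.001)    # 150 bases at 0.1% error
LONG_READ = (10_000, 0.01)   # 10,000 bases at 1% error

if __name__ == "__main__":
    for name, (length, rate) in [("short", SHORT_READ), ("long", LONG_READ)]:
        print(name, expected_errors(length, rate), prob_error_free(length, rate))
```

Under these assumptions a short read carries about 0.15 expected errors and is usually error-free, whereas a long read at 1% error carries about 100 expected errors and is essentially never error-free, so downstream tools have to tolerate errors rather than expect exact matches.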

Future research

'We believe that single-molecule sequencing technologies will revolutionise the single-cell omics field within the next 3 to 5 years and change the genomics field profoundly.'

What are you most excited about in your future research?

We are most excited by the possibility that within the next several years we will be able to see the ‘dark matter’ in our transcriptome, genome, and epigenome, in every individual cell in our body, and understand how each contributes to the gene regulation network in human biology.

Tang F, Barbacioru C, Wang Y, ... Lao K*, Azim Surani M*. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 6:377-382 (2009). (*Co-corresponding authors).

Fan X, Tang D, Liao Y, ... Wang Y*, Tang F*. Single-cell RNA-seq analysis of mouse preimplantation embryos by third-generation sequencing. PLoS Biology. 18:e3001017 (2020). (*Co-corresponding authors).

Fan X, Yang C, Li W, ... Tang F*. SMOOTH-seq: single-cell genome sequencing of human cells on a third-generation sequencing platform. Genome Biology. 22:195 (2021). (*Corresponding author).

Hu Y, Jiang Z, Chen K, ... Tang F*. scNanoATAC-seq: a long-read single-cell ATAC sequencing method to detect chromatin accessibility and genetic variants simultaneously within an individual cell. Cell Research. 33:83-86 (2022). (*Corresponding author).

Xie H, Li W, Hu Y, ... Tang F*. De novo assembly of human genome at single-cell levels. Nucleic Acids Research. 50:7479-7492 (2022). (*Corresponding author).

Xie H, Li W, Guo Y, ... Tang F*. Long-read-based single sperm genome sequencing for chromosome-wide haplotype phasing of both SNPs and SVs. Nucleic Acids Research. 51:8020-8034 (2023). (*Corresponding author).

Liao Y, Liu Z, Zhang Y, ... Tang F*. High-throughput and high-sensitivity full-length single-cell RNA-seq analysis on third-generation sequencing platform. Cell Discovery. 9:5 (2023). (*Corresponding author).

Li W, Lu J, Lu P, ... Tang F*. scNanoHi-C: a single-cell long-read concatemer sequencing method to reveal high-order chromatin structures within individual cells. Nat Methods (2023). (*Corresponding author).

Lin J, Xue X, Wang Y, ... Tang F*. scNanoCOOL-seq: a long-read single-cell sequencing method for multi-omics profiling within individual cells. Cell Research (2023). (*Corresponding author).