ClairS-TO: a deep-learning method for long-read tumour-only somatic small variant calling


ClairS-TO is a deep-learning tool for detecting somatic small variants in tumour-only samples from long-read sequencing data. It combines two neural networks with filtering strategies to distinguish somatic mutations from germline variants and sequencing noise, and it achieves state-of-the-art performance without requiring a matched normal sample.

Key points:

  • Tumour-only somatic variant detection is challenging: without a matched normal sample, it is difficult to distinguish somatic mutations from germline variants and sequencing noise

  • Chen and Zheng et al. designed ClairS-TO specifically for long-read tumour-only variant calling, addressing the limitations of short-read-based tools

  • It uses an ensemble of two neural networks trained on opposing tasks (one estimates how likely a candidate is a somatic variant, the other how likely it is not) to boost classification accuracy

  • ClairS-TO was trained on synthetic datasets derived from Genome in a Bottle samples HG002 and HG001 (sourced from EPI2ME Labs) and on real tumour data from six cancer cell lines

  • It outperformed DeepSomatic on Oxford Nanopore and PacBio long-read data and the short-read callers Mutect2, Octopus, and Pisces on Illumina data

  • It demonstrated robust performance across sequencing coverages, variant allele fractions, tumour purities, and complex genomic regions
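
The two-network idea in the key points can be illustrated with a minimal sketch. This is not ClairS-TO's actual code: the function name and the combination rule (a normalised product of the two votes) are assumptions made purely for illustration. One network is assumed to output the probability that a candidate is somatic, the other the probability that it is not.

```python
def combine_opposing_scores(p_somatic: float, p_not_somatic: float) -> float:
    """Hypothetical rule for merging two networks trained on opposing tasks.

    p_somatic:     score from the network asking "is this a somatic variant?"
    p_not_somatic: score from the network asking "is this NOT a somatic variant?"
    Returns a combined probability that the candidate is somatic.
    """
    for p in (p_somatic, p_not_somatic):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")

    # Evidence for "somatic": first network says yes, second does not object.
    support = p_somatic * (1.0 - p_not_somatic)
    # Evidence for "not somatic": first network says no, second agrees.
    against = (1.0 - p_somatic) * p_not_somatic

    if support + against == 0.0:
        # The two networks flatly contradict each other; treat as uninformative.
        return 0.5
    return support / (support + against)


if __name__ == "__main__":
    # Networks agree the candidate is somatic: combined score rises above either vote.
    print(round(combine_opposing_scores(0.9, 0.1), 3))
    # Networks disagree: combined score falls back to 0.5.
    print(round(combine_opposing_scores(0.9, 0.9), 3))
```

Under this assumed rule, agreement between the two networks sharpens the call, while disagreement pulls the score towards 0.5, which is one way an ensemble of opposing classifiers could suppress false positives.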

Sample type: cancer cell lines and synthetic datasets

Authors: Lei Chen, Zhenxian Zheng, Junhao Su, Xian Yu, Angel On Ki Wong, Jingcheng Zhang, Yan-Lam Lee, Ruibang Luo