Benchmarking small variant detection with ONT reveals high performance in challenging regions

The development of long read sequencing (LRS) has led to greater access to the human genome. LRS produces long read lengths at the cost of high error rates and has shown to be more useful in calling structural variants than short read sequencing (SRS) data.

In this paper we evaluate how to use LRS data from Oxford Nanopore Technologies (ONT) to call small variants in regions in- and outside the reach of SRS.

Calling single nucleotide polymorphisms (SNPs) with ONT data has comparable accuracy to Illumina when evaluating against the Genome in a Bottle truth set v4.2. In the major histocompatibility complex (MHC) and regions where mapping short reads is difficult, the F-measure of ONT calls exceeds those of short reads by 2-4% when sequence coverage is 20X or greater.

We develop recommendations for how to perform small variant calling with LRS data and improve current approaches to the difficult regions by re-genotyping variants to increase the F-measure from 97.24% to 98.78%.

Furthermore, we show how LRS can call variants in genomic regions inaccessible to SRS, including medically relevant genes such as STRC and CFC1B.

Although small variant calling in LRS data is still immature, current methods are clearly useful in difficult and inaccessible regions of the genome, enabling variant calling in medically relevant genes not accessible to SRS.

Authors: Peter L. Møller, Guillaume Holley, Doruk Beyter, Mette Nyegaard, Bjarni V. Halldórsson