Ratatosk - Hybrid error correction of long reads enables accurate variant calling and assembly

Motivation
Long Read Sequencing (LRS) technologies are becoming essential to complement Short Read Sequencing (SRS) technologies for routine whole genome sequencing. LRS platforms produce DNA fragment reads, thousands to millions bases long, allowing the resolution of numerous uncertainties left by SRS reads for genome reconstruction and analysis. In particular, LRS characterizes long and complex structural variants undetected by SRS due to short read length. Furthermore, assemblies produced with LRS reads are considerably more contiguous than with SRS while spanning previously inaccessible telomeric and centromeric regions. However, a major challenge to LRS reads adoption is their much higher error rate than SRS of up to 15%, introducing obstacles in downstream analysis pipelines.

Results
We present Ratatosk, a new error correction method for erroneous long reads based on a compacted and colored de Bruijn graph built from accurate short reads. Short and long reads color paths in the graph while vertices are annotated with candidate Single Nucleotide Polymorphisms. Long reads are subsequently anchored to the graph using exact and inexact k-mer matches to find paths corresponding to corrected sequences.

We demonstrate that Ratatosk can reduce the raw error rate of Oxford Nanopore reads 6-fold on average with a median error rate as low as 0.28%. Ratatosk corrected data maintain nearly 99% accurate SNP calls and increase indel call accuracy by up to about 40% compared to the raw data. An assembly of the Ashkenazi individual HG002 created from Ratatosk corrected Oxford Nanopore reads yields a contig N50 of 43.22 Mbp and outperforms high quality LRS assemblies using PacBio HiFi reads.

In particular, the assembly of Ratatosk corrected reads contains about 2.5 times less errors than an assembly created from PacBio HiFi reads.

Availability: https://github.com/DecodeGenetics/Ratatosk.

Authors: Guillaume Holley, Doruk Beyter, Helga Ingimundardottir, Snædis Kristmundsdottir, Hannes Pétur Eggertsson, Bjarni Halldorsson