Fast and accurate assembly of Nanopore reads via progressive error correction and adaptive read selection


Although long Nanopore reads are advantageous in de novo genome assembly, their use in genomic studies is still hindered by complex errors. Here, we present NECAT, an error correction and de novo assembly tool designed to overcome complex errors in Nanopore reads. We propose adaptive read selection and a two-step progressive method to quickly correct Nanopore reads to high accuracy, and we introduce a two-stage assembler that utilizes the full length of Nanopore reads. NECAT achieves superior performance in both error correction and de novo assembly of Nanopore reads. NECAT requires only 7,225 CPU hours to assemble a 35X-coverage human genome and achieves a 2.28-fold improvement in NG50. Furthermore, our assembly of the human WERI cell line showed an NG50 of 29 Mbp. The high-quality assembly of Nanopore reads can significantly reduce false positives in structural variation detection.
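
For readers unfamiliar with the NG50 metric quoted above, the following is a minimal sketch of its standard definition: the length of the contig at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the estimated genome size. The function name `ng50` and its parameters are illustrative assumptions and are not part of NECAT.

```python
from typing import Optional, Sequence


def ng50(contig_lengths: Sequence[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of an assembly (hypothetical helper, not NECAT code).

    contig_lengths: lengths of the assembled contigs, in bases.
    genome_size:    estimated size of the genome, in bases.
    Returns None when the contigs cover less than half of the genome.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")

    running_total = 0
    # Walk the contigs from longest to shortest.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        # Stop at the first contig that brings coverage to >= 50% of the genome.
        if 2 * running_total >= genome_size:
            return length
    return None
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the estimated genome size, so values from different assemblies of the same genome can be compared directly.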

All source code for NECAT is available at https://github.com/xiaochuanle/NECAT.

Authors: Ying Chen, Fan Nie, Shang-Qian Xie, Ying-Feng Zheng, Thomas Bray, Qi Dai, Yao-Xin Wang, Zhi-Jian Huang, De-Peng Wang, Li-Juan He, Feng Luo, Jian-Xin Wang, Yi-Zhi Liu, Chuan-Le Xiao