TandemMapper and TandemQUAST: mapping long reads and assessing/improving assembly quality in extra-long tandem repeats


Extra-long tandem repeats (ETRs) are widespread in eukaryotic genomes and play an important role in fundamental cellular processes, such as chromosome segregation. Although emerging long-read technologies have enabled ETR assemblies, the accuracy of such assemblies is difficult to evaluate since there is no standard tool for their quality assessment. Moreover, since the mapping of long error-prone reads to ETR remains an open problem, it is not clear how to polish draft ETR assemblies.

To address these problems, we developed the tandemMapper tool for mapping reads to ETRs and the tandemQUAST tool for polishing ETR assemblies and their quality assessment.

We demonstrate that tandemQUAST not only reveals errors in and evaluates ETR assemblies, but also improves them. To illustrate how tandemMapper and tandemQUAST work, we apply them to recently generated assemblies of human centromeres.

Authors: Alla Mikheenko, Andrey V. Bzikadze, Alexey Gurevich, Karen H. Miga, Pavel A. Pevzner