Uncovering tandem repeats in long-read sequencing data reveals high satellite repeat turnover in great apes


We introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads, and use it to investigate satellite repeat turnover in great apes. We find a handful of abundant motifs that form two groups: 1) the (AATGG)n repeat (critical for the heat shock response) and its derivatives, and 2) subtelomeric 32-mers involved in telomeric metabolism. Many satellite repeats were completely embedded within long Oxford Nanopore reads. Such repeats were up to 59 kb long and consisted of perfect repeats interspersed with similar sequences. Our results, generated with three different sequencing technologies, provide a detailed characterization of great ape satellite repeats.
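This abstract does not describe NCRF's alignment procedure or scoring. The Python below is only an illustrative sketch of the general idea of scoring a noisy read against a specified motif repeated without end. It uses wraparound local dynamic programming with arbitrary match, mismatch and gap scores. The function name `wraparound_local_score` and all score values are hypothetical. They are not NCRF's interface or parameters.

```python
def wraparound_local_score(read, motif, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of `read` against `motif` repeated indefinitely.

    Illustrative sketch only (hypothetical name and scores, not NCRF's method).
    cur[j] is the best score of an alignment ending at the current read
    position with motif phase j, i.e. j motif characters consumed modulo len(motif).
    """
    m = len(motif)
    prev = [0] * m  # empty read prefix: every phase starts with score 0
    best = 0
    for ch in read:
        cur = [0] * m  # local alignment: any cell may restart at score 0
        for j in range(m):
            last = (j - 1) % m  # index of the motif character consumed to reach phase j
            sub = match if ch == motif[last] else mismatch
            cur[j] = max(cur[j],
                         prev[last] + sub,  # read char aligned to motif[last]
                         prev[j] + gap)     # read char inserted, motif phase unchanged
        # Deletions (skipped motif characters) depend on the same row cyclically.
        # Because gap < 0, two passes around the cycle are enough to propagate them.
        for _ in range(2):
            for j in range(m):
                cur[j] = max(cur[j], cur[(j - 1) % m] + gap)
        best = max(best, max(cur))
        prev = cur
    return best


if __name__ == "__main__":
    print(wraparound_local_score("ACGACGACG", "ACG"))  # perfect copies
    print(wraparound_local_score("ACGACTACG", "ACG"))  # one substituted base
    print(wraparound_local_score("TTTT", "ACG"))       # unrelated sequence
```

Under these toy scores, a read made of perfect motif copies scores its full length (9 for the first call). A single substitution costs two points (7). Unrelated sequence scores 0. A practical tool would also need to report alignment boundaries and filter candidates by a score or identity threshold.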

Authors: Monika Cechova