Long-read assembly and comparative evidence-based reanalysis of Cryptosporidium genome sequences reveal new biological insights

Cryptosporidiosis is a leading cause of waterborne diarrheal disease globally and an important contributor to mortality in infants and the immunosuppressed. Despite its importance, the Cryptosporidium community still relies on a fragmented reference genome sequence from 2004. Incomplete reference sequences hamper experimental design and interpretation. We have generated a new C. parvum IOWA genome assembly supported by PacBio and Oxford Nanopore long-read technologies, and a new comparative and consistent genome annotation for three closely related species: C. parvum, C. hominis and C. tyzzeri.

The new C. parvum IOWA reference genome assembly is larger and gap-free, and lacks ambiguous bases. This chromosomal assembly recovers 13 of 16 possible telomeres and raises a new hypothesis for the remaining telomeres and associated subtelomeric regions. Comparative annotation revealed that most “missing” orthologs are in fact present, suggesting that species differences among C. parvum, C. hominis and C. tyzzeri result primarily from structural rearrangements, gene copy number variation and single-nucleotide variants (SNVs). We made >1,500 C. parvum annotation updates based on experimental evidence. These included new transporters, ncRNAs, introns and altered gene structures.

The new assembly and annotation revealed a complete DNA methylase Dnmt2 ortholog. Using the new assembly and annotation as reference, we identified 190 genes under positive selection, including many new candidates. Finally, we detected possible subtelomeric amplification and variation events in C. parvum, revealing a new level of genome plasticity that will both inform and impact future research.

Authors: Rodrigo P. Baptista, Yiran Li, Adam Sateriale, Mandy J. Sanders, Karen L. Brooks, Alan Tracey, Brendan R. E. Ansell, Aaron R. Jex, Garrett W. Cooper, Ethan D. Smith, Rui Xiao, Jennifer E. Dumaine, Matthew Berriman, Boris Striepen, James A. Cotton, Jessica C. Kissinger