Nanopore sequencing resolves the complex, highly repetitive genome of Trypanosoma cruzi, the causative agent of Chagas disease

Chagas disease is a serious illness that can lead to life-threatening complications and death. It is caused by the parasite Trypanosoma cruzi, which is transmitted via the faeces and urine of the arthropod vector, Triatoma spp., also known as kissing bugs1. Chagas disease is endemic to 21 continental Latin American countries, and T. cruzi is thought to infect between six and seven million people worldwide1. Despite affecting so many people, Chagas disease is still poorly understood.


Symptoms vary between individuals, which may be partly due to a high level of genomic diversity in T. cruzi2. Unfortunately, there are few good-quality whole-genome sequences for T. cruzi because the parasite’s genome is difficult to resolve using short-read sequencing technologies. There are many reasons for this: the genome is highly repetitive and contains a high number of transposable elements (TEs), and strains vary in genome size and number of chromosomes. Furthermore, spontaneous aneuploidies and polyploidies, a lack of synteny, and hybridisation between parasites result in highly heterozygous genomes2. This is further complicated by the fact that the diversity between the strains is mostly found in six highly diverse multi-copy gene families (MGFs) throughout the genome. These regions are especially important to resolve as they may play a role in the pathogenicity of the parasites and their ability to evade the immune system.

'This is the first T. cruzi genome that has been assembled with ONT long reads alone, without supplementation of other technologies, and the quality of this genome is comparable to those of other high-quality genomes'

Using long nanopore reads, Hakim et al.2 demonstrated that it is possible to overcome these challenges by developing a scalable nanopore sequencing pipeline for T. cruzi. They developed their pipeline using the Tulahuen strain, for which there is currently no publicly available whole-genome sequence, even though it is a clinically important strain of the parasite and one of the most common strains found in areas where the burden of disease is high. They used the Ligation Sequencing Kit to prepare DNA libraries, which were sequenced across two flow cells, R9.4.1 and R10.4.1. Basecalling was performed in super accuracy mode with Guppy, and duplex reads were called on the R10.4.1 Flow Cell; data from both runs were combined. Duplex reads are those in which the template and complement strands of a single molecule of DNA are sequenced in succession, which enables nanopore devices to achieve very high accuracy sequencing results.

The authors discovered that the Tulahuen strain of T. cruzi possesses a highly heterozygous genome. A genome assembly was generated using the program NextDenovo3. There were 75 contigs, of which 12 were found to possess telomeric repeats, and one had telomeric repeats at both ends, indicating assembly of a full chromosome.
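As an illustration of how contigs can be screened for telomeric repeats, the following is a minimal Python sketch and not the authors' method. It assumes the trypanosomatid telomeric hexamer TTAGGG, and the window size, the minimum number of tandem repeats, and the function names are hypothetical choices made for this example.

```python
# Minimal sketch: flag contigs whose ends carry tandem telomeric repeats.
# Assumptions (not from the paper): motif TTAGGG, 200 bp end windows,
# and at least 3 perfect tandem copies to call a telomere.

TELOMERE_MOTIF = "TTAGGG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_telomere(window, min_repeats=3):
    """True if the window contains min_repeats tandem copies of the motif
    on either strand."""
    fwd = TELOMERE_MOTIF * min_repeats
    rev = revcomp(TELOMERE_MOTIF) * min_repeats
    return fwd in window or rev in window


def classify_contig(seq, window=200, min_repeats=3):
    """Classify a contig as 'none', 'one_end' or 'both_ends' depending on
    how many of its two ends contain telomeric repeats."""
    seq = seq.upper()
    left = has_telomere(seq[:window], min_repeats)
    right = has_telomere(seq[-window:], min_repeats)
    return {0: "none", 1: "one_end", 2: "both_ends"}[left + right]


# Toy contigs for demonstration only (not real T. cruzi sequence).
contigs = {
    "full_chromosome": "CCCTAA" * 5 + "ACGT" * 100 + "TTAGGG" * 5,
    "one_telomere": "ACGT" * 100 + "TTAGGG" * 5,
    "internal": "ACGT" * 100,
}
calls = {name: classify_contig(seq) for name, seq in contigs.items()}
```

A contig classified as `both_ends` corresponds to the telomere-to-telomere case described above. Real telomeres contain imperfect repeats, so a practical screen would likely need to tolerate mismatches.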

In other eukaryotic parasites, such as T. brucei or Plasmodium falciparum, the highly variable gene families are primarily localised to the sub-telomere. Diversification in the genomes is driven by mitotic recombination at the sub-telomere, which is a hotspot for such activity. However, this is not the case for T. cruzi, where the variable gene families are more dispersed throughout the genome. Therefore, T. cruzi must have another method for diversification. The authors discovered that a very large proportion of the T. cruzi genome is composed of repetitive regions, 27% of which are TEs; both RNA and DNA TEs were found in the genome. These repetitive regions are often difficult to resolve during assembly, but long nanopore reads circumvent this challenge. The authors hypothesised that the TEs may be involved in genome diversification in T. cruzi, a hypothesis that could be tested by examining the distances between TEs and coding sequences, which ought to be close together if TEs are involved in homologous recombination. They discovered that TEs are closer to multi-gene family coding sequences than to coding sequences from other genes (figure 1), supporting their hypothesis that TE-mediated genome diversification is occurring.
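The distance comparison described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' analysis: features are represented as hypothetical `(contig, start, end)` tuples with half-open coordinates, distance is the gap in base pairs to the nearest TE on the same contig (0 if they overlap), and the median is used as the summary statistic.

```python
# Minimal sketch: distance from each coding sequence to its nearest TE.
# Assumptions (not from the paper): half-open (contig, start, end) tuples,
# gap of 0 for overlapping features, median as the summary statistic.
from statistics import median


def interval_gap(a_start, a_end, b_start, b_end):
    """Gap in bp between two half-open intervals; 0 if they overlap or touch."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def nearest_te_distance(gene, tes):
    """Distance from one gene to the nearest TE on the same contig,
    or None if that contig carries no TE."""
    contig, start, end = gene
    gaps = [interval_gap(start, end, s, e) for c, s, e in tes if c == contig]
    return min(gaps) if gaps else None


def median_te_distance(genes, tes):
    """Median nearest-TE distance over genes that share a contig with a TE."""
    distances = [nearest_te_distance(g, tes) for g in genes]
    distances = [d for d in distances if d is not None]
    return median(distances) if distances else None


# Toy annotations for demonstration only (not real T. cruzi coordinates).
tes = [("contig1", 100, 200), ("contig1", 1000, 1100)]
mgf_genes = [("contig1", 250, 400), ("contig1", 150, 300)]
other_genes = [("contig1", 500, 700), ("contig2", 0, 10)]

mgf_median = median_te_distance(mgf_genes, tes)
other_median = median_te_distance(other_genes, tes)
```

In this toy data the multi-gene family sequences sit closer to TEs than the other genes do, which is the pattern the hypothesis predicts. A real comparison would also need a statistical test on the two distance distributions.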


The authors concluded that ‘This work demonstrates the feasibility of using nanopore sequencing alone, with relatively little sequencing data to study important genomic features such as multi-gene family members and transposable elements in a hybrid strain’ and that ‘nanopore sequencing is an excellent tool for generating whole genomes from a large number of strains, especially in low-resource settings’.

  1. World Health Organization (06 April 2023). Chagas disease (also known as American trypanosomiasis). Available at: https://www.who.int/news-room/fact-sheets/detail/chagas-disease-(american-trypanosomiasis). Date accessed: 31 July 2023.
  2. Hakim, J.M.C., Guarnizo, S.A.G., Machca, E.M., Gilman, R.H. and Mugnier, M.R. (2023) Whole genome assembly of a hybrid Trypanosoma cruzi strain assembled with nanopore sequencing alone. BioRxiv. DOI: 10.1101/2023.07.27.550875.
  3. GitHub. NextDenovo. Available at: https://github.com/Nextomics/NextDenovo. Date accessed: 16 August 2023.