Cas9-mediated enrichment coupled to nanopore sequencing provides a valuable tool for de novo assembly of large genomic target regions
The availability of well-assembled genomes is critical for accurate variant calling and for identifying the candidate genes and variants responsible for a given phenotype. However, a single reference genome may not be sufficient, especially in plants, where cultivars can differ considerably not only in single-nucleotide variants (SNVs) but also in structural variants (SVs), as recently demonstrated with the tomato pan-genome.
In turn, de novo assembly of cultivar-specific reference genomes can be very costly. Capturing long DNA fragments in combination with long-read sequencing may provide a cost-efficient way to accurately reconstruct genomic regions of interest for in-depth analysis. As a proof of concept, we applied Cas9-mediated enrichment coupled to nanopore sequencing to reconstruct a 250 kb region on chromosome 5 of the P. vulgaris genome.
The region contained a large number of SNVs and SVs in the cultivar Midas compared with the reference genome. Five tiled sub-regions of 50 kb each were cut with high efficiency (>70%) from Midas genomic DNA, using target-specific guide RNAs designed on conserved coding regions. Sequencing on a MinION device yielded a substantial amount of data (~150X coverage and ~130-fold enrichment, on average), which was assembled de novo into a single contig spanning the whole 250 kb target region. The target region captured with Cas9 was properly reconstructed and shared 99.5% identity with the one assembled using a traditional approach based on whole-genome sequencing (nanopore data, 50X average coverage).
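As an illustration of how the coverage and enrichment figures above can be derived, the following minimal Python sketch computes mean on-target coverage and fold enrichment from base counts. It assumes one common definition of enrichment (the observed on-target fraction of sequenced bases divided by the fraction expected from the target's share of the genome), which may differ from the definition used in this work. The function names and all input numbers, including the genome size, are hypothetical placeholders and not data from this study.

```python
def mean_coverage(on_target_bases: int, target_length: int) -> float:
    """Average sequencing depth over the target region."""
    return on_target_bases / target_length


def fold_enrichment(on_target_bases: int, total_bases: int,
                    target_length: int, genome_length: int) -> float:
    """Observed on-target fraction relative to the fraction expected
    if reads were drawn uniformly from the whole genome (assumed definition)."""
    observed_fraction = on_target_bases / total_bases
    expected_fraction = target_length / genome_length
    return observed_fraction / expected_fraction


# Hypothetical inputs, chosen only to illustrate the arithmetic.
TARGET_LENGTH = 250_000          # 250 kb target region
GENOME_LENGTH = 500_000_000      # placeholder genome size, not a measured value
ON_TARGET_BASES = 37_500_000     # placeholder count of bases aligned to the target
TOTAL_BASES = 600_000_000        # placeholder count of all sequenced bases

coverage = mean_coverage(ON_TARGET_BASES, TARGET_LENGTH)
enrichment = fold_enrichment(ON_TARGET_BASES, TOTAL_BASES,
                             TARGET_LENGTH, GENOME_LENGTH)
print(f"mean coverage: {coverage:.1f}X")
print(f"fold enrichment: {enrichment:.1f}")
```

In practice the base counts would come from aligning the reads to a reference and summing aligned bases inside and outside the target interval.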
Finally, short-read data derived from Midas-inbred lines showed consistently improved mapping quality on the de novo assembled locus compared with the P. vulgaris reference genome. In conclusion, the Cas9-mediated target enrichment “tiling” approach represents a valuable alternative to whole-genome sequencing for assembling ultra-long target regions, with considerable cost savings. In the future, this approach could enable the fine characterization of cultivar-specific regions of interest, especially in plants with very large genomes, for which whole-genome de novo assembly is hardly affordable.