The importance of structural variation in crop breeding
Brassica napus (oilseed rape) is a major oil crop worldwide, with widespread application in cooking, biofuel, and animal feed. The 1.2 Gb B. napus genome is allotetraploid, comprising one set of chromosomes from B. oleracea (e.g. cabbage; subgenome C) and another from B. rapa (e.g. turnip; subgenome A), a structure that reflects the organism’s evolutionary history.
The genome of B. napus displays extensive gene- and chromosome-level structural variation (SV), which underlies important phenotypic traits such as flowering time, disease resistance, and seed quality. Precise resolution of these SVs could support improvement of this economically important crop.
Short-read sequencing technology has been used to describe many SVs; however, because the genome is tetraploid and short reads frequently map to more than one location, resolving these variants to the subgenome level is extremely challenging. To resolve SVs more accurately at the subgenome level in B. napus, researchers at the Justus Liebig University used nanopore sequencing reads, which, owing to their length, are far less likely to multimap¹.
The team sequenced four diverse B. napus lines from around the world: N99 from North America (spring flowering type), PAK85912 from China (semi-winter flowering type), and Express 617 and R53 from Europe (winter flowering types). To ensure accurate delineation of large SVs, the researchers implemented a size selection step using the Circulomics Short Read Eliminator XL Kit, which is designed to deplete DNA fragments shorter than 40 kb. The most recent runs, which also used a nuclease wash to maximise pore availability and enhance sequencing yield, delivered over 30 Gb of data on a single MinION Flow Cell, with a read N50 of approximately 40 kb. The resulting sequencing data were aligned to a reference genome using NGMLR, and SVs were then called using the Sniffles algorithm.
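The read N50 quoted above is the read length at which reads of that length or longer account for at least half of all sequenced bases. The following is a minimal Python sketch of that calculation; the function name and the toy read lengths are illustrative assumptions and are not taken from the study.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read lengths (in bases).

    Reads are sorted from longest to shortest; the N50 is the length of
    the read at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("no read lengths given")
    total_bases = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total_bases:
            return length


# Toy read lengths in bases (invented for illustration, not real data).
toy_lengths = [5_000, 10_000, 20_000, 40_000, 60_000]
toy_n50 = read_n50(toy_lengths)
print(f"read N50: {toy_n50} bases")
```

Because the metric is weighted by bases rather than by read count, a few very long reads can raise the N50 considerably, which is why it is commonly reported alongside total yield.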
‘[We] identified insertions, which were almost impossible to detect with small read length technologies’¹
The team observed that the majority of SVs across all plant lines were 100–1,000 bp in length, with lead researcher Harmeet Singh Chawla commenting that such SVs would be ‘almost impossible’ to detect using short-read sequencing technology (Figure 1a)¹. Larger SVs were also detected in the N99 (spring flowering) and PAK85912 (semi-winter flowering) genotypes than in the winter flowering genotypes R53 and Express 617 (Figure 1b).
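A size distribution of this kind can be produced by binning the absolute length of each called SV. The sketch below assumes the lengths have already been extracted from the SV calls (for example, from a length field in a VCF file); the bin edges, bin labels, and toy values are hypothetical choices made for illustration, not the ones used in the study.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges in bp; each lower edge is inclusive.
BIN_EDGES = [100, 1_000, 10_000]
BIN_LABELS = ["<100 bp", "100 bp to <1 kb", "1 kb to <10 kb", ">=10 kb"]


def bin_sv_lengths(sv_lengths):
    """Count SVs per size bin, using the absolute length of each SV.

    Deletions are often reported with negative lengths, so the sign is
    dropped before binning.
    """
    counts = Counter({label: 0 for label in BIN_LABELS})
    for sv_length in sv_lengths:
        size = abs(sv_length)
        counts[BIN_LABELS[bisect_right(BIN_EDGES, size)]] += 1
    return dict(counts)


# Toy SV lengths in bp (invented for illustration, not real data).
toy_sv_lengths = [90, -725, 350, 83_000, 5_000, -120]
toy_counts = bin_sv_lengths(toy_sv_lengths)
for label, count in toy_counts.items():
    print(f"{label}: {count}")
```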
Interestingly, between 5% and 8% of genes were found to contain SVs, with lower SV diversity observed in the C-subgenome than in the A-subgenome. According to Chawla, this is likely to reflect the breeding history of the crop, as many traits have been artificially bred in the cabbage (C-subgenome), whereas the turnip (A-subgenome) remains relatively unaltered.
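The proportion of genes containing SVs can be estimated by testing each annotated gene interval for overlap with any SV interval. The sketch below is a simple illustration rather than the team's method: it assumes genes and SVs are given as (chromosome, start, end) tuples with half-open coordinates, and the chromosome names and positions are invented.

```python
def fraction_of_genes_with_sv(genes, svs):
    """Return the fraction of gene intervals overlapped by at least one SV.

    Both arguments are lists of (chromosome, start, end) tuples using
    half-open coordinates, so two intervals overlap when each one starts
    before the other ends.
    """
    if not genes:
        return 0.0
    genes_hit = 0
    for gene_chrom, gene_start, gene_end in genes:
        if any(
            sv_chrom == gene_chrom and sv_start < gene_end and gene_start < sv_end
            for sv_chrom, sv_start, sv_end in svs
        ):
            genes_hit += 1
    return genes_hit / len(genes)


# Toy intervals (invented for illustration, not real annotations).
toy_genes = [("A01", 100, 500), ("A01", 1_000, 2_000),
             ("C01", 100, 500), ("C01", 3_000, 4_000)]
toy_svs = [("A01", 450, 600), ("C01", 2_000, 2_500)]
toy_fraction = fraction_of_genes_with_sv(toy_genes, toy_svs)
print(f"genes containing an SV: {toy_fraction:.0%}")
```

This pairwise comparison is adequate for a small example; for a whole-genome annotation, an interval tree or a sorted sweep would scale better.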
Figure 1: The majority of SVs detected across all B. napus lines were between 100 bp and 1,000 bp in length (a). Overall, the N99 (spring flowering) and PAK85912 (semi-winter flowering) lines contained larger SVs (b). Figure courtesy of Harmeet Singh Chawla, Justus Liebig University, Germany¹.
‘There is much more than SNPs to explain the observable phenotype in oilseed rape’¹
Examining genes known to be involved in geographical adaptation, the team observed a number of SVs, including a 90 bp insertion in BnVIN3, a gene associated with flowering time. Interestingly, this insertion was found in only one of the two winter flowering lines, Express 617. SVs associated with disease resistance were also identified, including a 725 bp deletion in the 4-Coumarate:CoA ligase gene of the R53 line (Figure 2). According to Chawla, this variant explains 20% of the resistance phenotype for Verticillium, a major fungal pathogen of B. napus¹. This case study is taken from the Plant research white paper.
Figure 2: Long nanopore sequencing reads enabled the identification of a 725 bp deletion in the 4-Coumarate:CoA ligase gene, which was observed in the R53 but not the Express 617 winter flowering line. Figure courtesy of Harmeet Singh Chawla, Justus Liebig University, Germany¹.
Researchers at Johns Hopkins University are also using nanopore sequencing to support comprehensive SV analysis of important crops². Using the high-throughput PromethION platform, 100 diverse tomato genomes were sequenced in just 100 days, identifying between 25,000 and 45,000 SVs per genome, including an 83 kb tandem duplication that increases fruit yield. For more information on this project and the use of nanopore sequencing to accurately and rapidly characterise the SV landscape, download the Structural variation white paper.
1. Chawla, H.S. Long reads reveal small-scale genome structural variations in allotetraploid canola. Presentation. Available at: https://nanoporetech.com/resource-centre [Accessed: 15 December 2019]
2. Schatz, M. 100 genomes in 100 days: The structural variant landscape in tomato genomes. Presentation. Available at: https://nanoporetech.com/resource-centre/michael-schatz-100-genomes-100… [Accessed: 15 December 2019]