From finding flavonoids to the ‘dark matter’ of T-DNA insertions — investigating plant genomes with nanopore sequencing
Wed 5th October 2022
Boas Pucker's research focuses on the evolution of specialised metabolites in plants and how sequencing plant genomes helps further our knowledge of plant genome diversity and variation within a species. For his bachelor thesis, Boas investigated flavonoid biosynthesis in sugar beet, and during his PhD, he compared different Arabidopsis thaliana genomes. Since then, Boas has been involved in many plant genome assembly projects using long sequencing reads. Boas is now a professor at the Technical University of Braunschweig, Germany, where he leads the Plant Biotechnology and Bioinformatics group.
During his time at the CeBiTec at Bielefeld University, Boas Pucker sequenced the genomes of several plants, including yam, sugar beet, and grapevine. He now works as a professor at TU Braunschweig on specialised metabolism in plants, with plant genomics and applied bioinformatics forming the basis of this research. Boas also teaches sequencing and plant genomics to enable the next generation of plant scientists.
The fascination of plants
Boas admits that he got into plant genomics ‘more or less by chance’ when he joined the Genome Research group at Bielefeld University in 2010 to learn more about genomics and sequencing. The group just happened to be working on plants, and Boas explains that he quickly became fascinated by them. Now, in 2022, Boas is still investigating plants, and recently co-wrote a review on early land plants and sequencing, particularly using long reads, discussing what has been achieved in this area and future trends [1]. Boas is particularly interested in sequencing within a species, to understand intraspecific variation. Understanding this variation will help reveal its biological function and may improve our understanding of crop traits that could make crops more resilient to climate change.
'There are some big projects that are now ready to sequence...the genomes of all eukaryotic species, plants and animals...to look at variation within a species.'
Capturing variation, between and within species
Boas describes his work investigating flavonoids: a diverse group of secondary metabolites that are synthesised from phenylalanine in almost all land plants. Their core pathway is well understood and conserved across plant species, but there are specific modifications that might be particular to just one species or a larger phylogenetic lineage. Boas wants to better understand how specific enzyme modifications affect flavonoid levels because they have potential biomedical value, with ‘numerous studies’ showing their potential cancer-preventative effects. However, with 12,000 flavonoids reported to exist, it would be useful to study the potential effects of each. Boas believes the vast number of flavonoid metabolites ‘is probably mostly due to enzyme promiscuity...it’s very likely that there will be many plant species which have specific enzymes’, so, ‘by looking at more genome sequences, we can identify and later characterise such enzymes’. Because of such diversity, Boas feels ‘it’s definitely helpful to sequence more plant genomes...sequencing within a species will help to understand intraspecific variation’, and expects long reads will play a crucial role in the transition towards plant pangenomics.
Of course, plants synthesise metabolites for their own use; flavonoids can be involved in pigmentation, which may provide colour to attract pollinating insects, and in protection against various environmental challenges, such as herbivore attacks or strong UV light. Optimising flavonoid content within a plant could help it withstand new environmental stresses and can be achieved by careful selection of cultivars and growth conditions. Other strategies include gene editing of globally important crops to add in-built protection against an environmental challenge, such as pests, which could reduce the need to spray harmful chemicals on crops. Generating highly contiguous, chromosome-scale genome assemblies is fundamental to identifying and understanding the genes involved. When it comes to genomics, Boas feels ‘it’s definitely an advantage that it’s now possible to sequence the genome of the species of interest...it’s helpful to generate a genome sequence and find all the genes involved.’
First encounters of a sequencing kind
Boas first encountered nanopore sequencing in 2017, whilst he was at Bielefeld University, supervising a team competing for iGEM, ‘the biggest competition for synthetic biology’. The team were working on expanding the genetic code — adding an additional nucleotide into the DNA. Boas describes how ‘the big challenge was...how to detect an unnatural base pair’ because most of the molecular biology techniques focus on using A, C, G, and T. The team found that nanopore sequencing could achieve this, since native, unamplified DNA (or RNA) molecules — which retain modification information — are sequenced directly. As the molecule traverses the nanopore, a disruption, characteristic for the specific base and modification, if present, can be detected and interpreted. Boas recalls that ‘this is how the nanopore sequencing started there’.
One of the first plant genomes Boas sequenced using nanopore technology was an Arabidopsis, which had been cultured in a flask, in the dark, in a sugar solution, for approximately 25 years. The idea was to compare plants incubated under artificial and natural conditions, in the hope of finding a minimum plant genome; however, they found the opposite. Most genes were present in multiple copies, so although the genome had accumulated ‘some quite severe mutations’ that rendered single genes functionless, the cells were still able to replicate and grow; Boas describes the genome not only as ‘complete chaos’ but also ‘a big surprise’.
T-DNA dark matter
Also surprising were the discoveries Boas and his colleagues made while characterising T-DNA insertion lines [2]. Before CRISPR/Cas9 was used in genome editing, the function of a plant gene was determined using plant lines transformed with Agrobacterium tumefaciens, a soil bacterium that transfers a segment of DNA (T-DNA) into plant cells, where it integrates at random positions in the genome. Over 700,000 Arabidopsis T-DNA insertion lines were constructed, and several collection centres were set up: Boas was involved with GABI-Kat, the second largest collection in the world. PCR was routinely used to characterise the lines; however, sometimes the confirmation PCR would fail, and Boas wanted to understand why. Based on information from other researchers using these lines, Boas feels it is a common problem and, because such lines cannot be characterised, researchers stop using them. Boas refers to it as the ‘dark matter of T-DNA lines’.
‘People can revisit their old T-DNA lines and find things that they didn’t even know were there.’
Boas and his team decided to fully characterise 14 T-DNA lines using whole-genome nanopore sequencing, as they felt it was ‘way more effective’ using long sequencing reads to cover the genome ‘than just doing PCR on a small piece of it’. Their results not only confirmed previous results but also identified novel events, such as chromosomal rearrangements, which did not involve T-DNA — ‘this was something new that [was] discovered, which happened quite frequently’. Genotyping by sequencing T-DNA lines could reveal interesting structural variations or multiple T-DNA insertions; Boas estimates that sequencing a T-DNA insertion line ‘costs around $200, and it’s probably getting more [cost] effective’.
Figure 1: Ideograms of the chromosomes of Arabidopsis T-DNA insertion line GK-038B07, displaying a reciprocal fusion of chromosomes 3 and 5, as well as a 2 Mbp inversion between two T-DNA arrays at the fusion sites. Adapted from Pucker et al. (2021) [2].
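To illustrate why long reads can reveal events that a short PCR amplicon misses, the following is a minimal Python sketch and not the method used in the study. It assumes that each long read has already been aligned to a reference consisting of the chromosomes plus the T-DNA sequence, and that the alignments have been reduced to simple tuples. The read names, coordinates, the reference name `T-DNA`, and the function `find_junction_reads` are all hypothetical and chosen only for illustration. A read whose pieces align to the T-DNA and to one chromosome suggests an insertion junction; a read that touches two chromosomes hints at a rearrangement such as a chromosome fusion.

```python
from collections import defaultdict

# Hypothetical, simplified alignment records:
# (read_id, reference_name, ref_start, ref_end). All values are invented.
alignments = [
    ("read1", "Chr3", 1_000, 9_000),
    ("read1", "T-DNA", 0, 4_000),
    ("read1", "Chr5", 50_000, 58_000),
    ("read2", "Chr3", 2_000, 15_000),
    ("read3", "Chr5", 70_000, 76_000),
    ("read3", "T-DNA", 100, 3_500),
]


def find_junction_reads(records, tdna_name="T-DNA"):
    """Group alignments by read and label reads that touch the T-DNA
    and/or more than one chromosome."""
    refs_by_read = defaultdict(set)
    for read_id, ref, _start, _end in records:
        refs_by_read[read_id].add(ref)

    report = {}
    for read_id, refs in refs_by_read.items():
        chromosomes = sorted(refs - {tdna_name})
        has_tdna = tdna_name in refs
        if has_tdna and len(chromosomes) > 1:
            report[read_id] = ("possible T-DNA-associated fusion", chromosomes)
        elif has_tdna:
            report[read_id] = ("T-DNA insertion junction", chromosomes)
        elif len(chromosomes) > 1:
            report[read_id] = ("possible rearrangement without T-DNA", chromosomes)
        # Reads that align to a single chromosome only are not reported.
    return report


junctions = find_junction_reads(alignments)
for read_id, (label, chroms) in sorted(junctions.items()):
    print(read_id, label, chroms)
```

In a real analysis, candidate events would need support from several independent reads and a check of the alignment coordinates, since a single chimeric read can be a library artefact.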
Boas regularly submits his datasets to the European Nucleotide Archive, ensuring the wider scientific community can access these valuable resources. There is also the option to re-basecall older nanopore sequencing data, to make use of the latest algorithms offering higher basecalling accuracy.
The democratisation of sequencing
Boas believes that one of the most important technological developments for molecular plant sciences is ‘the drop in sequencing costs so [that] it’s possible to do large transcriptomic experiments — doing various RNA-Seq experiments [is] really important to characterise the function of genes’, particularly if you are studying transcription factors. Discussing de novo transcriptome assemblies, Boas highlights how the problem with previous genome sequencing projects was that the number of genes was inflated due to fragmentation. Boas explains that ‘it’s helpful to have long reads to do the annotation, so then you can solve the problem of having multiple parts of the gene annotated as separate genes...having long reads that are connecting all of the exons that are involved in one gene could be quite helpful to solve it’.
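The idea of long reads ‘connecting all of the exons’ can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not an annotation pipeline: the gene models, the read coordinates, and the function `merge_fragmented_models` are hypothetical, all features are assumed to lie on one chromosome and strand, and coordinates are treated as half-open intervals. Gene models overlapped by the same long transcript read are grouped together, so a gene that was annotated as two fragments is counted once.

```python
# Hypothetical annotated gene models on one chromosome: (gene_id, start, end).
gene_models = [
    ("geneA_part1", 100, 400),
    ("geneA_part2", 600, 900),
    ("geneB", 2_000, 2_500),
]

# Hypothetical long transcript reads aligned to the same chromosome:
# (read_id, start, end).
long_reads = [
    ("cdna_read1", 90, 920),       # spans both geneA fragments
    ("cdna_read2", 1_990, 2_510),  # covers geneB only
]


def merge_fragmented_models(models, reads):
    """Join gene models that are overlapped by the same long read
    (a small union-find over gene IDs)."""
    parent = {gene_id: gene_id for gene_id, _, _ in models}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _read_id, r_start, r_end in reads:
        # Models overlapping this read (half-open interval overlap test).
        hits = [g for g, s, e in models if s < r_end and e > r_start]
        for other in hits[1:]:
            parent[find(other)] = find(hits[0])

    groups = {}
    for gene_id, _, _ in models:
        groups.setdefault(find(gene_id), []).append(gene_id)
    return sorted(groups.values())


merged = merge_fragmented_models(gene_models, long_reads)
print(len(gene_models), "annotated models ->", len(merged), "genes after merging")
```

A single spanning read is weak evidence on its own; real annotation workflows would also consider strand, splice junctions, and the number of supporting reads before merging models.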
Image: high molecular-weight plant gDNA is extracted by Boas for nanopore sequencing.
Although affordable sequencing is critical, Boas feels that the democratisation of sequencing is more about ‘the availability of sequencers than the actual sequencing costs...[and that] with the MinION, it’s now possible that every lab can do sequencing’. Because of this accessibility, Boas is now able to offer a practical sequencing course for students at the Technical University of Braunschweig, where they can sequence a plant genome of their choice. Boas recalls how, back in 2014, a PhD student told him about a sequencer that could be plugged in via USB — ‘it was really exciting to see that it became possible’. And now? ‘I can do over a weekend what other people did in their entire career’.
Want to learn more?
Watch Boas’ talk, Effective characterization of T-DNA insertion lines through nanopore sequencing, at London Calling 2021
View our white paper, Closing the gaps in plant genomes
Find out about Flavonoid Friday with Boas Pucker
1. Pucker, B. et al. Plant genome sequence assembly in the era of long reads: progress, challenges and future directions. Quant Plant Biol 3(E5) (2022). DOI: https://doi.org/10.1017/qpb.2021.18
2. Pucker, B., Kleinbölting, N. & Weisshaar, B. Large scale genomic rearrangements in selected Arabidopsis thaliana T-DNA lines are caused by T-DNA insertion mutagenesis. BMC Genomics 22(599) (2021). DOI: https://doi.org/10.1186/s12864-021-07877-8