Advances in arachnid genomics using Oxford Nanopore's MinION - Sarah Stellwagen

To finish up the Mini Theatre after two days of scintillating talks throughout NCM, Sarah Stellwagen took to the stage. Sarah is currently at the University of Maryland, researching harvestman genomics, and presented work she and her team had done on two projects: spider silk genetics and the genomics of harvestmen, a group of arachnids with a 'weird life history'.

Sarah first talked about the spider silk project. Most people think of spider silk as quite generic, but Sarah explained that there are actually at least seven different types that can make up a web, each with unique properties. The genes that encode the spider silk proteins, or spidroins, are typically very large and highly repetitive, which has previously made them a challenge to sequence. Sarah has collated all of the complete, full-length spidroins that have been sequenced and found that, despite spider silk being researched for decades as a potential biomaterial, there are only 19 full-length sequences. These sequences range in length from 5 kb to 19 kb. Sarah aimed to sequence the genes that encode aggregate spider glue proteins, which have different uses depending on the spider. She gave the example of the bolas spider, which produces a single line of silk with a ball of glue at the end that it swings around to capture its prey. This all seems far more menacing than covering a web in glue droplets! Sarah and her team set out to sequence the complete transcript of AgSp1, one of the genes encoding an aggregate spider glue protein. The first attempt used the Direct RNA Sequencing Kit, replacing the recommended reverse transcriptase with Maxima H Minus to try to get the longest reads possible. The largest transcript obtained was an impressive 16.5 kb read, but its 5' end was still in the repetitive region. This surprised Sarah, as the gene had been estimated to be around 10 kb. To get around this, Sarah turned to the Ligation Sequencing Kit, sequencing the whole genome in order to maximise read length and increase the chance of covering the repetitive region. Sarah described this approach as 'pulling a needle from a haystack'.
The DNA was put through the BluePippin to select for fragments above 20 kb, and sequenced on three flow cells to get 4-5x coverage of the entire genome. The results came as quite a shock, with the AgSp1 gene actually being 42,270 bp, not including a 6,000 bp intron! This was more than double the previous longest complete spidroin sequence. Sarah also discovered a second gene, AgSp2, which was 20,526 bp, not including a 31 kb intron. Nothing like two discoveries for the price of one!
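To put those sizes in context, the short Python sketch below redoes the arithmetic using only the figures quoted above. It assumes the 19 kb upper bound of the previously sequenced spidroins is exactly 19,000 bp and that the 31 kb intron is exactly 31,000 bp; both are rounded values from the talk, so the outputs are approximate. The `summarise` helper and the dictionary layout are purely illustrative.

```python
# Figures quoted in the talk summary (base pairs); rounded values are assumptions.
PREVIOUS_LONGEST_SPIDROIN = 19_000  # upper end of the reported 5-19 kb range

genes = {
    "AgSp1": {"coding": 42_270, "intron": 6_000},
    "AgSp2": {"coding": 20_526, "intron": 31_000},  # intron reported as "31 kb"
}


def summarise(name, coding, intron):
    """Return the locus span (coding + intron) and the fold change in
    coding length relative to the previous longest complete spidroin."""
    return {
        "gene": name,
        "locus_span": coding + intron,
        "fold_vs_previous": round(coding / PREVIOUS_LONGEST_SPIDROIN, 2),
    }


summaries = [summarise(name, **sizes) for name, sizes in genes.items()]
for s in summaries:
    print(s)
```

On these figures, AgSp1 alone is a little over twice the length of the previous longest spidroin, and each gene plus its intron spans roughly 48-52 kb, which helps explain why reads longer than the 16.5 kb transcript were needed to cover them.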

Sarah then moved on to harvestman genomics. Harvestmen are not spiders, but they are arachnids, and are of interest because they are facultative parthenogens. This means they are able to reproduce sexually, but can also reproduce asexually if they cannot find a mate. They are also an example of mixed ploidy within a species. Sarah had a few questions she wanted to answer with this project. Firstly, what is the relationship between sexuality and cytotypes, and does one cytotype use one reproductive mode more than another? Secondly, does tandem cytonuclear evolution reinforce reproductive mode in cytotypes? Mitochondrial and nuclear DNA have to interact, which is a strong driver of evolution, so Sarah wanted to investigate how that influences the reproductive mode used. Lastly, she wanted to look at what is necessary for genomes to evolve this way, and why it happens in plants but is uncommon in animals. Sarah and her colleagues collected samples of Leiobunum manubriatum from across Japan, resulting in half a gram of DNA. From sequencing this DNA, she obtained the first ever mitochondrial genome for Sclerosomatidae. This appears to show that cytonuclear interactions may reinforce reproductive mode, and that mitochondrial and nuclear DNA must work cooperatively. So far, Sarah has done ten harvestman sequencing runs. The first four runs were below 4.1 Gb each, but by the tenth run, Sarah was getting 15.54 Gb per flow cell. She credited this jump in throughput to the nuclease flush protocol and advised multiple flushes and sample loads. This has generated 70x coverage of the genome, and the data is ready to go through their pipeline in an effort to find out more about the mysterious harvestman.

Sarah wanted to say a huge thank you to Dr Rebecca Renberg, who collaborated on the spider glue project, Dr Mercedes Burns, who collaborated on the harvestman project, and Dr Nobuo Tsurusaki, who was her collaborator in Japan whilst they were collecting specimens.