21st April 2017 - Plant Cell
Recent updates in sequencing technology have made it possible to obtain Gigabases of sequence data from one single flowcell. Prior to this update, the nanopore sequencing technology was mainly used to analyze and assemble microbial samples. Here, we describe the generation of a comprehensive nanopore sequencing dataset with a median fragment size of 11,979 bp for the wild tomato species Solanum pennellii featuring an estimated genome size of ca 1.0 to 1.1 Gbases. We describe its genome assembly to a contig N50 of 2.5 MB using a pipeline comprising a Canu pre-processing and a subsequent assembly using SMARTdenovo. We show that the obtained nanopore based de novo genome reconstruction is structurally highly similar to that of the reference S. pennellii LA7165 genome but has a high error rate caused mostly by deletions in homopolymers. After polishing the assembly with Illumina short read data we obtained an error rate of <0.02% when assessed versus the same Illumina data. More importantly however we obtained a gene completeness of 96.53% which even slightly surpasses that of the reference S. pennellii genome. Taken together our data indicate such long read sequencing data can be used to affordably sequence and assemble Gbase sized diploid plant genomes. Raw data is available at http://www.plabipd.de/portal/solanum-pennellii and has been deposited as PRJEB19787.