Efficient data structures for mobile de novo genome assembly by third-generation sequencing

Mobile/portable (third-generation) sequencing technologies, including Oxford Nanopore’s MinION and SmidgION, are revolutionizing biomedical sciences once again, after the advent of high-throughput sequencing. They combine an increase in sequence length (up to hundreds of thousands of bases) with extreme portability. While a sequencer now fits in the palm of a hand and needs only a USB port or a mobile phone/tablet to work, the data analysis phases still depend on an available Internet connection and cloud computing. This undermines the portability paradigm, especially when the technology is used in resource-limited settings or remote areas with limited connectivity. In this work, we introduce efficient data structures that enable portable data analytics for third-generation sequencing. Specifically, we show how sequence overlap graphs (built on fixed-length k-mers, with an extension to variable lengths) can be built and stored on a mobile phone, thereby allowing the execution of de novo genome assembly algorithms (along with ad hoc strategies for error correction) without the need to transfer data over the Internet or to run the analysis on a desktop computer.

Authors: Franco Milicchio (a), Mattia Prosperi (b)
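
To make the idea of a k-mer overlap graph concrete, the following minimal Python sketch packs each fixed-length k-mer into an integer (two bits per base) and links every k-mer to the k-mers that follow it in a read, that is, those overlapping it by k-1 bases. It is an illustration under stated assumptions, not the data structure described in this work: the 2-bit encoding and the names `encode`, `decode`, and `build_graph` are hypothetical choices, reads are assumed to contain only A, C, G, and T, and no error correction or variable-length extension is attempted.

```python
# Hypothetical 2-bit encoding of the four bases, chosen for this sketch only.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(kmer):
    """Pack a k-mer into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def decode(value, k):
    """Unpack an integer produced by encode() back into a k-mer string."""
    bases = []
    for _ in range(k):
        bases.append(BASES[value & 3])
        value >>= 2
    return "".join(reversed(bases))


def build_graph(reads, k):
    """Map each packed k-mer to the set of packed k-mers that directly
    follow it in some read (consecutive k-mers overlap by k-1 bases)."""
    graph = {}
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2*k bits
    for read in reads:
        if len(read) < k:
            continue  # read too short to contain a single k-mer
        current = encode(read[:k])
        graph.setdefault(current, set())
        for base in read[k:]:
            # Rolling update: drop the oldest base, append the new one.
            successor = ((current << 2) | CODE[base]) & mask
            graph[current].add(successor)
            graph.setdefault(successor, set())
            current = successor
    return graph


if __name__ == "__main__":
    k = 3
    graph = build_graph(["ACGTAC", "CGTACG"], k)
    for node in sorted(graph):
        successors = sorted(decode(s, k) for s in graph[node])
        print(decode(node, k), "->", successors)
```

Storing k-mers as integers rather than strings is one common way to reduce memory use, which matters on a phone; for k up to 32 each k-mer fits in a 64-bit word.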