
De novo clustering of long-read transcriptome data using a greedy, quality value-based algorithm


Long-read sequencing of transcripts with Pacific Biosciences (PacBio) Iso-Seq and Oxford Nanopore Technologies has proven central to the study of complex isoform landscapes in many organisms. However, current algorithms for de novo transcript reconstruction from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin.

To address this challenge, we develop isONclust, a clustering algorithm that is greedy (for scalability) and uses base quality values (to handle variable error rates).
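The abstract names only the two ingredients: a greedy pass and the use of quality values. The sketch below shows how they can fit together. It is a hypothetical illustration and not the isONclust implementation. It makes several assumptions. Reads are (name, sequence, Phred+33 quality) tuples. Quality values are used only to order the reads, by the expected number of error-free k-mers. Each read then joins the most similar existing cluster representative, or opens a new cluster. Similarity here is a plain shared-k-mer fraction, which stands in for whatever comparison the real tool performs. The names `greedy_cluster`, `min_shared` and `k` are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical read record: (name, sequence, Phred+33 quality string).
Read = Tuple[str, str, str]


def phred_to_error_prob(qual: str, offset: int = 33) -> List[float]:
    """Convert an ASCII-encoded Phred quality string to per-base error probabilities."""
    return [10 ** (-(ord(c) - offset) / 10) for c in qual]


def expected_error_free_kmers(qual: str, k: int) -> float:
    """Expected number of error-free k-mers, assuming independent base errors."""
    correct = [1.0 - p for p in phred_to_error_prob(qual)]
    total = 0.0
    for i in range(len(correct) - k + 1):
        prob = 1.0
        for c in correct[i:i + k]:
            prob *= c
        total += prob
    return total


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(reads: List[Read], k: int = 5, min_shared: float = 0.5) -> Dict[str, List[str]]:
    """Single greedy pass: higher-quality reads are visited first and seed clusters."""
    # Quality values only influence the processing order in this sketch.
    order = sorted(reads, key=lambda r: expected_error_free_kmers(r[2], k), reverse=True)
    representatives: List[Tuple[str, Set[str]]] = []
    clusters: Dict[str, List[str]] = {}
    for name, seq, _qual in order:
        kmers = kmer_set(seq, k)
        best_rep: Optional[str] = None
        best_frac = 0.0
        for rep_name, rep_kmers in representatives:
            # Fraction of this read's k-mers shared with the representative
            # (a simple stand-in for a quality-aware similarity test).
            frac = len(kmers & rep_kmers) / max(1, len(kmers))
            if frac > best_frac:
                best_rep, best_frac = rep_name, frac
        if best_rep is not None and best_frac >= min_shared:
            clusters[best_rep].append(name)
        else:
            # No sufficiently similar representative: this read opens a new cluster.
            representatives.append((name, kmers))
            clusters[name] = [name]
    return clusters


reads = [
    ("r1", "ACGTACGTTGCA", "I" * 12),  # high quality (Q40)
    ("r2", "ACGTACGTTGCC", "5" * 12),  # lower quality (Q20), one substitution vs r1
    ("r3", "TTTTGGGGCCCC", "I" * 12),  # unrelated sequence
]
print(greedy_cluster(reads))  # {'r1': ['r1', 'r2'], 'r3': ['r3']}
```

Each read is compared once against the current set of representatives, and is never revisited. This single-pass behaviour is what makes greedy schemes scale. The trade-off is that results depend on processing order, which is why ranking by quality, so that reliable reads seed the clusters, is a natural choice.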

We test isONclust on three simulated and five biological data sets, spanning a range of organisms, sequencing technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in terms of overall accuracy, scalability to large data sets, or both.

Authors: Kristoffer Sahlin, Paul Medvedev
