*De novo* clustering of long-read transcriptome data using a greedy, quality value-based algorithm

Long-read sequencing of transcripts with Pacific Biosciences (PacBio) Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current *de novo* transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin.

To address this challenge, we develop isONclust, a clustering algorithm that is greedy (to scale) and makes use of quality values (to handle variable error rates).
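The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a greedy, quality value-aware clustering pass could look like, not the isONclust implementation. Everything specific here is an assumption made for illustration: the function names, the use of shared k-mers as the similarity measure, the Phred+33 quality encoding, and the parameters `k`, `max_err`, and `min_shared`. Reads are processed once, most reliable first; each read joins the existing cluster whose representative shares the most trusted k-mers with it, or founds a new cluster.

```python
from collections import defaultdict


def phred_to_error(q_char, offset=33):
    """Convert one quality character (Phred+33 assumed) to an error probability."""
    return 10 ** (-(ord(q_char) - offset) / 10)


def mean_error(qual):
    """Average per-base error probability of a read; empty reads rank last."""
    if not qual:
        return 1.0
    return sum(phred_to_error(c) for c in qual) / len(qual)


def reliable_kmers(seq, qual, k=5, max_err=0.1):
    """Set of k-mers in which every base has error probability <= max_err."""
    errs = [phred_to_error(c) for c in qual]
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if max(errs[i:i + k]) <= max_err
    }


def greedy_cluster(reads, k=5, max_err=0.1, min_shared=0.5):
    """Cluster (name, seq, qual) reads in a single greedy pass.

    Hypothetical thresholds: a read joins the best-matching cluster if at
    least `min_shared` of its reliable k-mers occur in that cluster's
    representative (the read that founded the cluster).
    """
    # Most reliable reads first, so that they become the representatives.
    ordered = sorted(reads, key=lambda r: mean_error(r[2]))
    index = defaultdict(set)   # k-mer -> ids of clusters whose representative has it
    clusters = []              # cluster id -> list of read names

    for name, seq, qual in ordered:
        kmers = reliable_kmers(seq, qual, k, max_err)
        hits = defaultdict(int)
        for kmer in kmers:
            for cid in index.get(kmer, ()):
                hits[cid] += 1

        best = max(hits, key=hits.get, default=None)
        if best is not None and hits[best] / len(kmers) >= min_shared:
            clusters[best].append(name)
        else:
            # No sufficiently similar representative: start a new cluster.
            cid = len(clusters)
            clusters.append([name])
            for kmer in kmers:
                index[kmer].add(cid)
    return clusters
```

In this sketch the quality values do two jobs: they decide the processing order, and they mask k-mers that overlap low-quality bases so that a likely sequencing error does not count against a match. Each read is compared only against representatives found through the k-mer index, which is what keeps a single pass cheap.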

We test isONclust on three simulated and five biological data sets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in overall accuracy, in scalability to large data sets, or in both.

Authors: Kristoffer Sahlin, Paul Medvedev
