
Theory of local k-mer selection with applications to long-read alignment

  • Published on: May 23 2021
  • Source: bioRxiv

Motivation: Selecting a subset of k-mers in a string in a local manner is a common task in bioinformatics tools for speeding up computation. Arguably the most well-known and common method is the minimizer technique, which selects the ‘lowest-ordered’ k-mer in a sliding window. Recently, it has been shown that minimizers are a sub-optimal method for selecting subsets of k-mers when mutations are present. However, the theory explaining why certain methods perform well is poorly understood.
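To make the minimizer technique concrete, here is a minimal Python sketch of sliding-window selection. It is an illustration only, not the implementation used by any particular tool: the function name `minimizer_positions`, the use of plain lexicographic order as the k-mer ordering, and the leftmost tie-breaking rule are all assumptions made for this example (real tools typically order k-mers by a hash).

```python
def minimizer_positions(seq, k, w):
    """Return sorted start positions of k-mers selected as window minimizers.

    Assumptions (not from the source): k-mers are ordered lexicographically
    and ties are broken by taking the leftmost k-mer in the window.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    # Slide a window covering w consecutive k-mers across the sequence.
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # The lowest-ordered k-mer in the window is selected.
        best = min(window, key=lambda i: (kmers[i], i))
        selected.add(best)
    return sorted(selected)


if __name__ == "__main__":
    print(minimizer_positions("ACGTACGT", k=3, w=3))
```

Because adjacent windows often share the same lowest k-mer, the selected set is usually much smaller than the full set of k-mers, which is where the speed-up comes from.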

Results: We first theoretically investigate the conservation metric for k-mer selection methods. We derive an exact expression for calculating the conservation of a k-mer selection method. This turns out to be tractable enough for us to prove closed-form expressions for a variety of methods, including (open and closed) syncmers, (α, b, n)-words, and an upper bound for minimizers. As a demonstration of our results, we modified the minimap2 read aligner to use a more optimal k-mer selection method and show that there is up to an 8.2% relative increase in the number of mapped reads.
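The abstract names open and closed syncmers without defining them. As a rough sketch under the commonly used definitions (an assumption here, not taken from this page): a k-mer is an open syncmer if its smallest s-mer starts at a fixed offset t within the k-mer, and a closed syncmer if its smallest s-mer is the first or the last s-mer. The function names, the lexicographic s-mer ordering, and the leftmost tie-breaking below are illustrative choices, and this is not the code used in the modified minimap2.

```python
def smallest_smer_offset(kmer, s):
    """Offset of the smallest s-mer inside kmer (lexicographic, leftmost on ties)."""
    smers = [kmer[j:j + s] for j in range(len(kmer) - s + 1)]
    return min(range(len(smers)), key=lambda j: (smers[j], j))


def open_syncmer_positions(seq, k, s, t=0):
    """Start positions of k-mers whose smallest s-mer sits at offset t."""
    return [i for i in range(len(seq) - k + 1)
            if smallest_smer_offset(seq[i:i + k], s) == t]


def closed_syncmer_positions(seq, k, s):
    """Start positions of k-mers whose smallest s-mer is the first or last s-mer."""
    last = k - s  # offset of the final s-mer within a k-mer
    return [i for i in range(len(seq) - k + 1)
            if smallest_smer_offset(seq[i:i + k], s) in (0, last)]


if __name__ == "__main__":
    print(open_syncmer_positions("ACGTACGT", k=4, s=2))
    print(closed_syncmer_positions("ACGTACGT", k=4, s=2))
```

Unlike the minimizer technique, whether a k-mer is selected here depends only on the k-mer itself and not on its neighbours in a window.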

Authors: Jim Shaw, Yun William Yu
