
Benchmarking data: what it is, why it matters, and how to use it


If you’re a researcher, you’ll know that benchmarking data is an important tool, helping you identify what approach might be best suited to your research question or where unwanted variation could be impacting your results.

To learn more about how benchmarking data is produced and why Oxford Nanopore Technologies is always striving to create more useful datasets, we spoke to Sean McKenzie, Director of Applications Bioinformatics at Oxford Nanopore. In this Q&A, he explains what benchmarking means in a sequencing context, why it matters, and how researchers can make the most of benchmarking data.

Let’s start with the basics — what is benchmarking, and what is benchmarking data?

People tend to think about benchmarking in two very different ways. In more computational contexts, benchmarking often means measuring how fast a program runs or how much computing resource it needs. That’s not what my team does.

For us, benchmarking is about understanding how well different experimental and analysis approaches answer the biological questions you care about. That might mean comparing sample types, wet lab methods, or data analysis tools. Ultimately, the goal is to compare different ways of running an experiment and determine which will give the most accurate, reliable, and fit‑for‑purpose results.

This explainer animation illustrates how nanopore sequencing works.

What is benchmarking used for?

It serves two key purposes. First, it helps identify best practices: the methodologies that are most appropriate for answering a given question. Second, it helps define the limits of those methods. Even when you’ve followed all the best practices, benchmarking helps you understand how accurate your results are likely to be, what level of detail you can expect, and what kinds of information you can realistically extract from your data.

How does Oxford Nanopore generate benchmarking data?

One of the hardest parts of our work is choosing the right samples: ideally, samples for which we know the ‘truth’. That’s not always straightforward. Many nanopore applications are tackling biological questions that haven’t been fully answered before, so there may be no established truth set to compare against.

In those cases, we try to approach the question from multiple angles, using different technologies and methods, to build a high‑confidence reference dataset that we can benchmark against.

A good example is metagenomic assembly. Here, you sequence a community of microorganisms and try to reconstruct the genomes of all the species present. Traditionally, the gold standard for benchmarking used simple mock communities made from around 20 known species. With long‑read sequencing, however, this quickly becomes too easy; we can uncover thousands of species in a sample.

In this interview from London Calling 2024, Sarah Buddle discusses her experiences in infectious disease research using EPI2ME for metagenomic data.

To address this, we use a standardised real‑world faecal reference sample from Zymo Research. The sample is homogenised and aliquoted so it can be analysed repeatedly. We combine results from different sequencing platforms and analysis approaches to build a picture of what microbes are present and what their genomes should look like, then assess how well different tools recover that information.
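The recovery assessment described here can be illustrated with a minimal sketch: comparing the set of taxa a workflow reports against a high-confidence reference composition. All taxon names below are invented for illustration and are not drawn from the Zymo reference sample.

```python
# Hypothetical sketch: scoring a metagenomic workflow's detected taxa
# against a high-confidence reference composition (illustrative names only).
reference_taxa = {"E. coli", "B. fragilis", "L. fermentum", "A. muciniphila"}
detected_taxa = {"E. coli", "B. fragilis", "L. fermentum", "S. aureus"}

true_positives = detected_taxa & reference_taxa   # correctly recovered taxa
false_positives = detected_taxa - reference_taxa  # spurious calls
false_negatives = reference_taxa - detected_taxa  # taxa the workflow missed

recall = len(true_positives) / len(reference_taxa)    # fraction of known taxa recovered
precision = len(true_positives) / len(detected_taxa)  # fraction of calls that are real

print(f"recall={recall:.2f}, precision={precision:.2f}")
```

The same set-overlap logic extends to genome-level comparisons, where each recovered assembly would additionally be scored for completeness and contiguity against its reference.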

What are some of the most challenging benchmarking scenarios?

The really hard cases tend to involve rare events. For example, very complex genomic rearrangements associated with human disease, such as chromothripsis, are extremely uncommon. There are no publicly available standard samples with these characteristics, even though we know they occur in real patient data.

In situations like this, we may never have a true physical reference sample. Instead, we have to simulate data in silico, introducing realistic rearrangements into existing read sets and carefully designing them to reflect genuine genomic contexts. Even then, making those simulations representative and biologically meaningful is a major challenge.

Are benchmarking needs different in regulated and clinical use cases, compared with research?

The biggest shift is the emphasis on reproducibility and continuous testing. In research settings, we often benchmark a system once and assume performance will remain stable if nothing major changes.

In regulated environments, even small changes, such as different component batches, minor hardware updates, or manufacturing variations, require rigorous regression testing to confirm they don’t affect performance. While my team doesn’t run all of that routine testing, we help other teams within Oxford Nanopore by defining appropriate analyses and benchmarking frameworks to support those workflows.

How can customers access Oxford Nanopore benchmarking data?

We typically make benchmarking datasets available through our EPI2ME website and open data repositories. Alongside the raw data, we publish blog posts or release notes explaining what the dataset is, how it was generated, and how we used it for benchmarking.

A common example is variant calling accuracy. We use well‑established reference samples, such as Genome in a Bottle datasets, along with community‑accepted truth sets and comparison methods. We link directly to the data we use, provide example commands, and explain how customers can reproduce our results themselves.

Introducing EPI2ME, the software that enables data analysis for all levels of expertise.

What types of benchmarking metrics are available?

Variant calling covers several areas, including small variants like SNPs and indels, as well as structural variants, each with their own truth sets and comparison tools.

Beyond human genomics, we’ve published benchmarking data for microbial applications, such as our microbial amplicon barcoding workflows. For the release of the Microbial Amplicon Barcoding Kit, for example, we made open datasets available so users can assess both the sequencing data and associated analysis workflows using well‑defined microbial standards.

We also host benchmarking data related to read‑level accuracy, including canonical and modified bases. In this case, the research department created highly engineered synthetic DNA molecules with modified bases at known positions. These allow users to accurately assess modified base calling performance, something that’s very difficult to do with enzymatically treated DNA alone.

The full list of datasets we have created as part of our Open Data project can be found on the EPI2ME website.

Finally, what’s the one thing early‑career researchers should take away about benchmarking?

Benchmarking is hard, but you don’t have to do it alone. Poorly designed benchmarks can be misleading, especially if not enough thought has gone into sample choice and truth sets.

If you’re working with nanopore data, we have strong support networks, including field application scientists and bioinformatics specialists, who can help. If needed, complex questions can be escalated to my team. So rather than reinventing the wheel, don’t hesitate to reach out, whether that’s for guidance on experimental design or to access existing datasets.

Still confusing your Youden’s J and your F1-score? Dive deeper into the world of benchmarking with this blog from the EPI2ME team where they define key terms and walk you through some examples.
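As a quick primer on those two metrics, here is a minimal sketch computing both from the four cells of a confusion matrix; the counts are illustrative only, not real benchmarking results.

```python
def youden_j(tp: int, fp: int, fn: int, tn: int) -> float:
    """Youden's J = sensitivity + specificity - 1 (uses true negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall (ignores true negatives)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only:
print(youden_j(tp=90, fp=10, fn=10, tn=90))  # 0.9 + 0.9 - 1 = 0.8
print(f1_score(tp=90, fp=10, fn=10))         # precision 0.9, recall 0.9 -> 0.9
```

One practical difference worth noting: F1 never looks at true negatives, while Youden’s J does. That matters in genome-wide benchmarking, where correctly uncalled positions vastly outnumber everything else.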

Oxford Nanopore Technologies products are not intended for use for health assessment or to diagnose, treat, mitigate, cure, or prevent any disease or condition.
