For large-cohort dementia research, nanopore sequencing offers scalability and accessibility

Globally, more than 55 million people are living with dementia, and about ten million new cases are added each year¹. Alzheimer’s disease is the most common type, accounting for more than half of all dementia cases.

There is a pressing need to characterise the underlying biology of Alzheimer’s disease and other forms of dementia. This could support efforts to discover and develop new therapies that treat, cure, or even prevent these diseases. For such complex conditions, large datasets are needed to establish robust associations between phenotypes and the relevant genomic mechanisms.

Large-scale studies have previously relied on either genotyping microarrays, which do not provide base-by-base sequence, or short-read sequencing, an approach that has been limited by ambiguous read alignment in the challenging genomic regions associated with cognitive function. These regions can include segmental duplications, repeat expansions, and other large genomic elements that confound analyses based solely on short reads, which are hard to map in such regions. To overcome this challenge, researchers need a highly scalable sequencing technique that produces reads long enough to span large genomic elements and characterise them accurately, while remaining cost-effective enough for large-cohort studies.

Now, scientists have successfully demonstrated the use of long nanopore reads as an accessible, scalable approach for sequencing hundreds or thousands of samples from diverse populations to support large-scale studies of dementia. At the National Institutes of Health, USA, researchers have launched a sequencing initiative using long nanopore reads to help unravel the biology behind Alzheimer’s disease, Lewy body dementia, and frontotemporal dementia. ‘We know that these diseases have a big genetic component’, said Kimberley Billingsley, a scientist at the NIH Intramural Center for Alzheimer’s and Related Dementias (CARD), in a presentation at the 2023 London Calling event.

The initiative aims to generate deep datasets of structural variants (SVs) and other variants associated with these forms of dementia through nanopore sequencing of 4,000 human brain clinical research samples. With long nanopore reads, the scientists believe they will be able to identify SVs linked to dementia and resolve highly complex regions such as the HLA locus and the APOE gene, which has been linked to an increased risk of Alzheimer’s disease.

To pave the way for sequencing thousands of samples, the CARD team first optimised protocols for preparing and sequencing high-molecular-weight DNA, and for data analysis, storage, and access, so that the results could be made available to the scientific community²,³. With these protocols in place, they are now sequencing about 200 clinical research samples per month on an Oxford Nanopore PromethION 48 device, generating about 30x genome coverage per sample with read N50s of about 30 kb.
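The two headline metrics above, coverage and read N50, have standard definitions. As a minimal sketch (the function names and toy read lengths are illustrative assumptions, not part of the CARD pipeline), both can be computed from a list of read lengths:

```python
from typing import Sequence


def n50(read_lengths: Sequence[int]) -> int:
    """Return the read N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not read_lengths:
        raise ValueError("need at least one read length")
    total = sum(read_lengths)
    running = 0
    # Walk from the longest read to the shortest, accumulating bases
    # until at least half of the total yield has been reached.
    for length in sorted(read_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def mean_coverage(read_lengths: Sequence[int], genome_size: int) -> float:
    """Mean depth = total sequenced bases / genome size (ignores mapping)."""
    return sum(read_lengths) / genome_size


# Toy read lengths in bases (hypothetical, not CARD data).
reads = [10_000, 20_000, 30_000, 40_000]
print(n50(reads))                    # 30000
print(mean_coverage(reads, 50_000))  # 2.0
```

Because N50 is weighted by bases rather than by read count, a handful of short reads barely moves it. Taking the human genome as roughly 3.1 Gb (an approximate figure), 30x coverage corresponds to about 93 Gb of sequence per sample.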

The CARD computational pipeline, honed for long nanopore reads, generates a harmonised VCF file containing both small variants and SVs. In a comparison with data from a short-read sequencing technology, Kimberley described how the nanopore data ‘has a reduced SNP error rate … especially in those low-mappability regions’. The F1 score for SNP detection, the harmonic mean of precision and recall for variant calls, was higher for nanopore data than for data from a short-read technology. For SV discovery, the nanopore F1 score was comparable to that of another long-read sequencing platform, but ‘with a lower cost and higher throughput’.
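To make the F1 comparison concrete, here is a minimal sketch of how precision, recall, and F1 are computed once variant calls have been matched against a benchmark truth set. The counts are hypothetical, and the hard part in practice (deciding which calls match which truth-set variants) is not shown:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 for variant calls scored against a truth set.

    tp: calls that match a truth-set variant
    fp: calls with no matching truth-set variant
    fn: truth-set variants the caller missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean 2PR / (P + R), which is algebraically equal
    # to 2*tp / (2*tp + fp + fn); this form avoids dividing by zero.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only.
print(precision_recall_f1(tp=3, fp=1, fn=1))  # (0.75, 0.75, 0.75)
```

Because it is a harmonic mean, F1 stays low if either precision or recall is poor, so a platform cannot score well simply by calling many variants (high recall, low precision) or very few (high precision, low recall).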


The CARD team sequenced 222 control samples from frontal cortex research specimens. While deeper investigations are ongoing, a preliminary analysis detected more than 80,000 SVs. Most were insertions or deletions, and most represented rare events.

In addition to overcoming alignment ambiguity with long reads, nanopore sequencing offers unique features that CARD scientists are using to comprehensively characterise their samples. Because nanopore sequencing does not require PCR, epigenetic modifications can be detected directly in native DNA, so both genomic variants and methylation can be characterised from the same dataset without any additional library preparation steps. Kimberley said that ‘with this data, we can start to differentiate and visualise differences in methylation’. Making use of this, the CARD team has generated haplotype-specific and cell-type-specific methylation profiling data from the brain clinical research samples and from cell lines representing neurons and microglia. The team’s computational pipeline can automatically produce de novo assemblies, variant files, and methylation calls for all samples analysed.

Drawing on the accessibility and scalability of nanopore sequencing, the CARD researchers generated very large amounts of data for one brain sample as part of their protocol optimisation work. Having achieved 400-fold coverage of the genome from a frontal cortex research sample, the team opted to produce high coverage for other regions of the brain as well. They now have a high-definition dataset with 70-fold nanopore sequencing coverage of parietal cortex and primary visual cortex research samples, along with 800-fold coverage of the cerebellum. The resource could help researchers understand somatic as well as germline variants and query methylation patterns across regions of the human brain. In an analysis of one region, Kimberley highlighted that ‘we’ve been able to successfully detect low-frequency variants’.
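Why such deep coverage matters for low-frequency variants can be illustrated with a simple binomial sampling model. This is an idealised sketch, not the CARD team's method: it assumes each read independently carries the variant with probability equal to its variant allele fraction, ignores sequencing errors and mapping bias, and uses a hypothetical calling threshold of five supporting reads:

```python
from math import comb


def prob_detect(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, vaf).

    Idealised model: each read independently carries the variant with
    probability `vaf`; sequencing errors and mapping bias are ignored.
    """
    # Probability of seeing fewer supporting reads than the threshold.
    below = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - below


# A hypothetical variant at 2% allele fraction, called only when at least
# 5 reads support it (an illustrative threshold, not CARD's).
for depth in (30, 70, 400, 800):
    print(depth, round(prob_detect(depth, 0.02, 5), 3))
```

Under these assumptions, a variant at 2% allele fraction yields on average fewer than one supporting read at 30x (0.02 × 30 = 0.6) but about eight at 400x, which is why very deep coverage makes such variants detectable at all.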

With an optimised pipeline in place, Kimberley described how CARD scientists are now looking to sequence hundreds of samples per month and are targeting biobanks with diverse clinical research samples so they can begin ‘sequencing more diverse populations’. The first group of about 200 clinical research samples came from individuals of European descent; the next 150 samples will come from people of African descent. The team will continue to seek out samples representing diverse ancestries as the project progresses.

In conclusion, Kimberley described how she and her colleagues have ‘developed an efficient and scalable wet lab and computational protocol for nanopore long-read sequencing that [serves as a] genuine alternative to short reads for large-scale genomic projects’.

1. WHO (15 March 2023). Dementia. Available at: https://www.who.int/news-room/fact-sheets/detail/dementia [Accessed: 23 June 2023].

2. Billingsley, K.J. et al. Processing human frontal cortex brain tissue for population-scale Oxford Nanopore long-read DNA sequencing SOP V.2. protocols.io. DOI: 10.17504/protocols.io.kxygxzmmov8j/v2 (2022).

3. Kolmogorov, M. et al. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. bioRxiv. DOI: 10.1101/2023.01.12.523790 (2023).