The era of gapless genome assembly has arrived

Published on: April 28 2022

Plant

Fruit production remains a huge source of income for many countries¹,². However, as with many plants, the mass cultivation of fruit crops is threatened by climate change and the accompanying introduction of pests and diseases. Breeding new cultivars that are resistant to such stress factors is key to securing sufficient production, and success in doing so depends largely on a comprehensive understanding of the genome¹.

With over 4 million tons of cherries produced across 6.7 million hectares per year, cherries are an economically important fruit. As with many fruits, cherries are bred to be resistant to climate change and disease, as well as to be delicious. From a practical viewpoint, breeding new cultivars is labour-intensive and time-consuming owing to their perennial nature. To make selective breeding a more viable practice in cherries and to maximise yield, genomic information is required to guide breeding practices. However, plant genomes assembled using short-read sequencing data tend to be fragmented and to contain many gaps, primarily because of their repetitive nature³. The genome of Prunus fruticosa, commonly known as the dwarf cherry, has proven to be no exception. Incomplete genome assemblies can lead to ambiguity in our understanding, as key genetic information may be missed.

'... (nanopore) technology alone can sufficiently produce a high-quality complete genome draft'

Wöhner, T.W. et al. Genomics (2021)

Considering these difficulties, Wöhner et al. generated a draft assembly of P. fruticosa using long reads generated on the Oxford Nanopore PromethION device¹. Wöhner highlighted that nanopore sequencing can be a standalone technology for generating high-quality de novo assemblies of complex genomes, with no need to 'rely anymore on short-read data for polishing'. Using long nanopore reads alone, the team obtained a final assembly with a scaffold N50 of around 44 Mb and a BUSCO score (a measure of genome completeness) of 98.7%, representing a highly contiguous assembly. To add the cherry on the cake, the team were able to largely resolve the parental haplotypes of the tetraploid (4n) genome with just 30x depth of coverage. This assembly will prove to be an invaluable resource for determining future breeding strategies, and a foundation for further molecular and evolutionary Prunus research.
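The N50 values quoted in this article summarise assembly contiguity. Below is a minimal Python sketch of the standard N50 definition, assuming only a list of sequence lengths as input. It sorts the lengths in descending order and returns the length at which the running total first reaches half of all assembled bases. Applied to scaffold lengths it gives a scaffold N50, and applied to contig lengths it gives a contig N50. This is an illustrative implementation, not the code used by Wöhner et al. The lengths in the example are made-up toy values.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length L or
    longer together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy scaffold lengths (arbitrary units), not real assembly data.
print(n50([100, 80, 60, 40, 20]))  # 80
```

For these toy lengths (300 bases in total), the two longest sequences together hold 180 bases and pass the 150-base halfway point, so the N50 is 80.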

The humble banana is one of the most widely consumed fruits globally². Its cultivation is essential not only for providing populations with sustenance, but also for securing many economies². Strategic breeding programs are necessary to improve crop quality and yield, though achieving this depends on a comprehensive knowledge of the banana genome². Like many crop genomes, the banana genome is challenging to assemble owing to the abundance of repeat elements, structural variants, and low-complexity regions¹,². Banana genome assemblies based on short reads have low contiguity, because short reads map poorly within repetitive sequences².

Figure 1: Chromosome size comparison between two banana genome assemblies, demonstrating the benefits of long and ultra-long nanopore reads. The increased lengths of the yellow chromosomes from the latest assembly (Belser et al. 2021²) can be attributed to the inclusion of repeat elements accurately resolved with long nanopore reads. The yellow chromosomes show far greater representation of centromeric sites (red) and capture the telomeres at the chromosome ends, which were missed by the older assembly (white chromosomes; Martin et al. 2016⁴). Image taken from: Belser et al. (2021)².

In light of this, Belser et al. used long nanopore reads to assemble the genome of the domesticated banana species Musa acuminata. The team obtained 177x genomic depth of coverage from whole-genome sequencing on a single PromethION R9.4.1 Flow Cell; of this, 17x depth was obtained from reads >75 kb. Contig N50 increased from an average of 42 kb in previous short-read assemblies to 32 Mb. Crucially, the size of the genome assembly also closely matched the estimated genome size, something previous assemblies had fallen short of. Furthermore, previous iterations based on short-read data reported just 130 rDNA gene units, compared with the 7,696 gene units reported in this assembly (Figure 1). These data indicate that complex regions of the genome were better resolved in this latest assembly, which culminated in the completion of five chromosomes, telomere-to-telomere.
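Depth of coverage is the total number of sequenced bases divided by the genome size. Restricting the sum to reads above a length cutoff therefore gives the depth contributed by those reads alone. The sketch below assumes that read lengths and a genome-size estimate are already available. The function name `depth_of_coverage` and the `longer_than` parameter are hypothetical. The example values are toy numbers, not data from Belser et al.

```python
def depth_of_coverage(read_lengths, genome_size, longer_than=0):
    """Average depth of coverage from reads longer than `longer_than` bases.

    Depth = (sum of lengths of reads passing the filter) / genome_size.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    # Sum only reads strictly longer than the cutoff (e.g. ">75 kb").
    bases = sum(length for length in read_lengths if length > longer_than)
    return bases / genome_size


# Toy example: four reads and a 100-base "genome".
reads = [400, 300, 200, 100]
print(depth_of_coverage(reads, 100))                   # 10.0 (all reads)
print(depth_of_coverage(reads, 100, longer_than=250))  # 7.0 (reads >250 bases)
```

Both depths share the same denominator. So 17x out of 177x implies that roughly 10% of the sequenced bases (17/177 ≈ 0.096) came from reads longer than 75 kb.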

When an alternative long-read sequencing technology was used, the team found that 'centromeric regions, detected with centromeric repeats, are very fractionated ... underlying the importance of ultra-long [nanopore] reads to resolve these highly repetitive regions'. They therefore highlighted the importance of ultra-long reads for producing a high-quality assembly of the banana genome, which will help decipher this fruit's evolutionary history and support further genetic studies².

'Gapless and telomere-to-telomere assembly of chromosome sequences is now possible'

Belser, C. and Baurens, FC. et al. Commun Biol. (2021)

1. Wöhner, T.W. et al. The draft chromosome-level genome assembly of tetraploid ground cherry (Prunus fruticosa Pall.) from long reads. Genomics 113:4173–4183 (2021). DOI: https://doi.org/10.1016/j.ygeno.2021.11.002
2. Belser, C., Baurens, F.C. et al. Telomere-to-telomere gapless chromosomes of banana using nanopore sequencing. Commun. Biol. 4(1):1047 (2021). DOI: https://doi.org/10.1038/s42003-021-02559-3
3. Rousseau-Gueutin, M. et al. Long-read assembly of the Brassica napus reference genome Darmor-bzh. GigaScience 9(12) (2020). DOI: https://doi.org/10.1093/gigascience/giaa137
4. Martin, G. et al. Improvement of the banana Musa acuminata reference sequence using NGS data and semi-automated bioinformatics methods. BMC Genomics 17:243 (2016). DOI: https://doi.org/10.1186/s12864-016-2579-4