Kit 12 device and informatics

1. Running Kit 12 chemistry on your device

IMPORTANT

The Kit 12 chemistry runs at 30°C on nanopore sequencing devices. This is several degrees cooler than other chemistries. While the protocol was initially developed on GridION and PromethION, we also support its use on MinION Mk1C, as the MinION Mk1C device's temperature control allows the flow cell to be maintained at 30°C for the duration of the run. However, we cannot guarantee the same level of temperature control on the MinION Mk1B. Therefore, if you are running Kit 12 chemistry on the MinION Mk1B, ensure that the ambient temperature does not exceed 23°C.
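
The ambient temperature limit for the MinION Mk1B can be written as a simple pre-run check. The following is only a sketch: the function name kit12_ambient_ok and the device labels are hypothetical and do not correspond to anything in MinKNOW, and the room temperature is assumed to be measured and supplied by the user.

```shell
# kit12_ambient_ok DEVICE AMBIENT_C
# Exit status 0 if the room temperature is acceptable for Kit 12, 1 otherwise.
# DEVICE is an illustrative label ("Mk1B", "Mk1C", ...), not a MinKNOW identifier.
kit12_ambient_ok() {
    device="$1"
    ambient_c="$2"
    max_mk1b_c=23   # ambient limit stated for the MinION Mk1B

    case "$device" in
        Mk1B)
            # awk is used so that decimal readings such as 22.5 compare correctly
            awk -v t="$ambient_c" -v max="$max_mk1b_c" 'BEGIN { exit !(t <= max) }'
            ;;
        *)
            # No ambient limit is stated for the other devices
            return 0
            ;;
    esac
}
```

For example, `kit12_ambient_ok Mk1B 24.5 || echo "Room is too warm for Kit 12 on a Mk1B"` prints the warning, because 24.5 exceeds the 23°C limit.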

IMPORTANT

If you are using a PromethION, your local Field Application Specialist will be able to provide information on the suitability of your device for running Kit 12 chemistry and best practice on locations and numbers of flow cells that can be run simultaneously.

The sequencing scripts for running Kit 12 chemistry are included in the latest release of the MinKNOW software.

Please make sure you are using the latest version of MinKNOW. Software downloads for each device type are available on the Community Software Downloads page. Set the sequencing parameters as described in the MinKNOW protocol under "Starting a sequencing run", and choose Kit 12 kits (e.g. SQK-LSK112) under Kit selection.

2. Basecalling Kit 12 simplex data in MinKNOW

Basecalling options

We offer two options for basecalling Kit 12 data with sequencing accuracies of 99% and above (Q20+):

  • Simplex basecalling, where the template DNA strand passes through the nanopore and is basecalled.
  • Duplex basecalling, where the complement strand is read immediately after the template strand and the consensus basecall for both strands leads to a further increase in accuracy.

Each of these options is described in more detail in the "Basecalling Kit 12 data" subsections.

We recommend basecalling Kit 12 data in MinKNOW.

For optimal performance, we recommend using a GridION, PromethION, or a computer with:

  • 64-bit Linux (Windows will be supported in the future)
  • Intel i7, i9, Xeon, or better processor
  • At least 16 GB of RAM
  • An NVIDIA GPU, RTX 2070 or better, with at least 16 GB of GPU memory
  • At least 1 TB SSD
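
Part of this specification can be checked from a terminal. The sketch below assumes a Linux host with /proc/meminfo; the function names are hypothetical, and the RAM threshold is compared in decimal gigabytes because the kernel reports slightly less than the installed memory. It does not check the processor model or the SSD size.

```shell
# has_min_ram_gb TOTAL_KB MIN_GB
# Status 0 if TOTAL_KB (as reported by MemTotal) is at least MIN_GB decimal gigabytes.
has_min_ram_gb() {
    [ "$1" -ge $(( $2 * 1000 * 1000 )) ]
}

# is_64bit_linux KERNEL MACHINE
# Status 0 for "Linux" on "x86_64" (the values printed by uname -s and uname -m).
is_64bit_linux() {
    [ "$1" = "Linux" ] && [ "$2" = "x86_64" ]
}

# report_host: print a short summary for the current machine.
report_host() {
    total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    if is_64bit_linux "$(uname -s)" "$(uname -m)"; then
        echo "OS: ok"
    else
        echo "OS: not 64-bit Linux"
    fi
    if has_min_ram_gb "$total_kb" 16; then
        echo "RAM: ok"
    else
        echo "RAM: below 16 GB"
    fi
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Lists each GPU with its total memory for manual comparison
        nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
    else
        echo "GPU: nvidia-smi not found"
    fi
}
```

Running `report_host` prints one line per check. The GPU line is left for manual comparison against the list above rather than being parsed automatically.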

Make sure you are using the most recent version of MinKNOW.

Set the sequencing parameters as described in the MinKNOW protocol under "Starting a sequencing run", and choose Kit 12 kits (e.g. SQK-LSK112) under Kit selection.

Introduction to read splitting

With increased follow-on rates of the Kit 12 chemistry (the rate of the complement strand entering the pore directly after the template strand has passed through), we have observed a higher rate of concatemerisation compared to the Ligation Sequencing Kit (SQK-LSK110). We are classifying these reads as 'informatic chimeras' as they are not physically joined during the library preparation process.

With SQK-LSK110, we typically observe <2% concatemerisation and at this rate, it typically does not affect downstream applications. With, for example SQK-LSK112, we have observed a rate as high as 10%. Both MinKNOW (v21.11 and higher) and stand-alone Guppy (v5.1 and higher) now offer the option of splitting these reads.

As you can see from the following example, the majority of the informatic chimeras (yellow) are removed after splitting for a human (native) and E. coli (PCR) sample.

Note that the read splitting function is not designed to split reads that were incorrectly ligated together during sample preparation. Although these make up a small percentage of reads, follow the ligation steps of the protocol carefully to reduce the chance of creating them.

Figure: Read splitting example (human native and E. coli PCR samples)

Enable read splitting in MinKNOW

When configuring your experiment, under Output - Filtering options, toggle "Enable read splitting" to ON.

3. Basecalling Kit 12 simplex data in Guppy

Alternatively, you can basecall in Guppy. For this, you will need to install Guppy version 5.1 or above.
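
One way to confirm the installed version before starting is sketched below. The helper names version_ge and extract_version are hypothetical, the comparison relies on `sort -V` from GNU coreutils, and the exact text printed by `guppy_basecaller --version` is an assumption (the helper simply takes the first dotted number it finds).

```shell
# version_ge HAVE WANT
# Status 0 if dotted version HAVE is greater than or equal to WANT.
version_ge() {
    # The smaller of the two versions sorts first; it must be WANT
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# extract_version TEXT
# Print the first X.Y or X.Y.Z number found in TEXT.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}
```

A possible use is `version_ge "$(extract_version "$(guppy_basecaller --version)")" 5.1 || echo "Guppy is older than 5.1"`. The same helper can be reused with 6.0.0 for duplex basecalling.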

To run Guppy basecalling on R10.4 data, specify one of the dna_r10.4_e8.1 config files, such as dna_r10.4_e8.1_fast.cfg, dna_r10.4_e8.1_hac.cfg, or dna_r10.4_e8.1_sup.cfg. For example:

guppy_basecaller -c dna_r10.4_e8.1_fast.cfg [other Guppy arguments]

For 9.4.1 data, specify one of the dna_r9.4.1_450bps config files, e.g.

guppy_basecaller -c dna_r9.4.1_450bps_fast.cfg [other Guppy arguments]
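
The config file naming above can be wrapped in a small helper so that the pore version and model are chosen in one place. This is a sketch only: kit12_guppy_config is a hypothetical function, fast5/ and basecalls/ are placeholder directories, and the command is printed rather than run.

```shell
# kit12_guppy_config PORE MODEL
# Print the Guppy config file name for PORE ("r10.4" or "r9.4.1")
# and MODEL ("fast", "hac" or "sup").
kit12_guppy_config() {
    case "$2" in
        fast|hac|sup) ;;
        *) echo "unknown model: $2" >&2; return 1 ;;
    esac
    case "$1" in
        r10.4)  echo "dna_r10.4_e8.1_$2.cfg" ;;
        r9.4.1) echo "dna_r9.4.1_450bps_$2.cfg" ;;
        *) echo "unknown pore: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the command instead of executing it.
cfg=$(kit12_guppy_config r10.4 hac)
echo guppy_basecaller -c "$cfg" -i fast5/ -s basecalls/
# prints: guppy_basecaller -c dna_r10.4_e8.1_hac.cfg -i fast5/ -s basecalls/
```

Remove the `echo` to run the command once the paths point at real data.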

Basecall the data using Guppy version 5.1 or higher.

You can find instructions for setting up and running the Guppy software in the Guppy protocol.

Enable read splitting in Guppy

To enable read splitting in Guppy, use the --do_read_splitting parameter when setting up your basecalling run. You can also limit the number of times a read will be passed into the read splitter (--max_read_split_depth) and set a minimum read splitting score (--min_score_read_splitting). For more information, refer to the "Setting up a run: configurations and parameters" section of the Guppy protocol.
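
As an illustration, the three parameters can be assembled into a single command line. The helper guppy_split_cmd is hypothetical, and the depth and score values in the usage line are arbitrary placeholders, not recommended settings. The command is printed rather than run, and paths containing spaces are not handled.

```shell
# guppy_split_cmd CONFIG INPUT_DIR SAVE_DIR [MAX_DEPTH] [MIN_SCORE]
# Print a guppy_basecaller command line with read splitting enabled.
# The two optional arguments are added only when they are supplied.
guppy_split_cmd() {
    cmd="guppy_basecaller -c $1 -i $2 -s $3 --do_read_splitting"
    if [ -n "$4" ]; then
        cmd="$cmd --max_read_split_depth $4"
    fi
    if [ -n "$5" ]; then
        cmd="$cmd --min_score_read_splitting $5"
    fi
    echo "$cmd"
}
```

For example, `guppy_split_cmd dna_r10.4_e8.1_sup.cfg fast5/ basecalls/ 2 50` prints a command with all three options set, which can then be reviewed before it is run.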

4. Basecalling Kit 12 duplex data

We are now offering a duplex read basecaller in Guppy (duplex_tools), where the template and complement strands of a read can have their basecall data combined to provide a more accurate sequence.

Run simplex basecalling and read splitting in MinKNOW or Guppy.

More information on this can be found in the previous sections, "Basecalling Kit 12 simplex data in MinKNOW" and "Basecalling Kit 12 simplex data in Guppy".

Ensure you have installed Guppy version 6.0.0 or higher.

Run duplex basecalling in Guppy.

The "duplex basecalling" section of the Guppy protocol contains instructions for how to perform duplex basecalling. To perform duplex basecalling, the template and complement read pairs must first be identified. After read pairing and filtering, the duplex_tools executable can then be launched.
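
The order of the steps can be sketched as a dry run. The subcommands and options below (pairs_from_summary, filter_pairs, guppy_basecaller_duplex, --duplex_pairing_mode, --duplex_pairing_file) and the pair_ids.txt and pair_ids_filtered.txt file names are recalled from the duplex-tools README and are assumptions here; check them against the Guppy protocol and your installed versions before use. The function duplex_plan and all paths are hypothetical, and nothing is executed.

```shell
# duplex_plan SUMMARY FASTQ_DIR FAST5_DIR OUT_DIR
# Print the three steps in order: pair reads, filter pairs, duplex basecall.
# Assumed tool names and flags; verify before running.
duplex_plan() {
    # 1. Identify candidate template/complement pairs from the sequencing summary
    echo "duplex_tools pairs_from_summary $1 $4/pairs"
    # 2. Filter the candidate pairs using the simplex basecalls
    echo "duplex_tools filter_pairs $4/pairs/pair_ids.txt $2"
    # 3. Duplex basecall the filtered pairs
    echo "guppy_basecaller_duplex -i $3 -r -s $4/duplex_calls -x cuda:0 -c dna_r10.4_e8.1_sup.cfg --duplex_pairing_mode from_pair_list --duplex_pairing_file $4/pairs/pair_ids_filtered.txt"
}
```

Calling `duplex_plan sequencing_summary.txt pass/ fast5/ duplex_out` prints the three commands so they can be reviewed and then run one at a time.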

5. Assembly

Recommendations for assembly

Flye is recommended as an assembly tool for Kit 12 genome assembly (https://github.com/fenderglass/Flye).

We have observed that assembling haplotypes separately significantly improves genome contiguity, although each assembly only uses half the data.

There are three Flye parameters that we recommend tuning for the best performance with Kit 12 sequence data:

  1. Configuring the command line parameter --min-overlap 10000 should deliver a modest improvement in assembly contiguity when using libraries optimised for read length.
  2. It is recommended that the --nano-corr parameter is set (to specify that the sequences are "corrected"). This provides a significant improvement to assembly NG50 compared to when the --nano-raw (uncorrected sequence) setting is used. We have observed NG50 increases from 58 Mb to 67 Mb for collapsed assemblies, when assembling both haplotypes at once.
  3. We typically adjust the asm_corrected_reads.cfg file in the flye/config/bin_cfg/ folder to increase haplotype-specific assembly NG50s and to remove any major misjoins:

     a. Enable homopolymer compressed scoring (hpc_scoring_on = 1).
     b. Increase the minimizer_window to 10.
     c. Decrease the repeat_graph_ovlp_divergence to 0.005.

     With these changes, we have observed haplotype-specific assembly NG50s increase to 84 Mb/84 Mb and all major misjoins removed.
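
The three recommendations can be combined as follows. This is a sketch under stated assumptions: the config file is assumed to contain plain "key = value" lines, set_cfg_key and tune_flye_cfg are hypothetical helpers, and reads.fastq, asm_out and the thread count are placeholders. The Flye command is printed rather than run. Keep a copy of the original config file before editing it.

```shell
# set_cfg_key FILE KEY VALUE
# Rewrite an existing "KEY = VALUE" line in FILE.
# Returns 1 if KEY is absent, so a mistyped key does not pass silently.
set_cfg_key() {
    grep -qE "^$2[[:space:]]*=" "$1" || { echo "key not found: $2" >&2; return 1; }
    sed -E "s|^($2[[:space:]]*=[[:space:]]*).*|\1$3|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# tune_flye_cfg FILE
# Apply the three config adjustments listed above.
tune_flye_cfg() {
    set_cfg_key "$1" hpc_scoring_on 1 &&
    set_cfg_key "$1" minimizer_window 10 &&
    set_cfg_key "$1" repeat_graph_ovlp_divergence 0.005
}

# Dry run of the assembly command with the two command line recommendations.
echo flye --nano-corr reads.fastq --out-dir asm_out --threads 16 --min-overlap 10000
```

A possible use is `tune_flye_cfg flye/config/bin_cfg/asm_corrected_reads.cfg`, followed by the Flye command with the `echo` removed.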

Last updated: 3/10/2023
