InfoSheet
Kit 12 device and informatics V K12_S1018_v1_revB_01Dec2021
FOR RESEARCH USE ONLY
Contents
1. Running Kit 12 chemistry on your device
Basecalling Kit 12 data
- 2. Basecalling Kit 12 simplex data in MinKNOW
- 3. Basecalling Kit 12 simplex data in Guppy
- 4. Basecalling Kit 12 duplex data
5. Assembly
1. Running Kit 12 chemistry on your device
IMPORTANT
The Kit 12 chemistry runs at 30°C on nanopore sequencing devices. This is several degrees cooler than other chemistries. While the protocol was initially developed on GridION and PromethION, we also support its use on MinION Mk1C, as the MinION Mk1C device's temperature control allows the flow cell to be maintained at 30°C for the duration of the run. However, we cannot guarantee the same level of temperature control on the MinION Mk1B. Therefore, if you are running Kit 12 chemistry on the MinION Mk1B, ensure that the ambient temperature does not exceed 23°C.
IMPORTANT
If you are using a PromethION, your local Field Application Specialist will be able to provide information on the suitability of your device for running Kit 12 chemistry and best practice on locations and numbers of flow cells that can be run simultaneously.
The sequencing scripts for running Kit 12 chemistry are included in the latest release of the MinKNOW software.
Please make sure you are using the latest version of MinKNOW. Software downloads for each device type are available on the Community Software Downloads page. Set the sequencing parameters as described in the MinKNOW protocol under "Starting a sequencing run", and choose Kit 12 kits (e.g. SQK-LSK112) under Kit selection.
2. Basecalling Kit 12 simplex data in MinKNOW
Basecalling options
We offer two options for basecalling Kit 12 data with sequencing accuracies of 99% and above (Q20+):
- Simplex basecalling, where the template DNA strand passes through the nanopore and is basecalled.
- Duplex basecalling, where the complement strand is read immediately after the template strand and the consensus basecall for both strands leads to a further increase in accuracy.
Each of these options is described in more detail in the "Basecalling Kit 12 data" subsections.
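The "99%" and "Q20" figures above are two ways of writing the same value, linked by the standard Phred relationship Q = -10 x log10(1 - accuracy). The sketch below is only an illustration of that conversion; the function names `q_to_accuracy` and `accuracy_to_q` are hypothetical helpers and are not part of MinKNOW or Guppy.

```shell
#!/bin/sh
# Convert between a Phred quality score (Q) and per-base accuracy.
# Relationship used: Q = -10 * log10(1 - accuracy)

q_to_accuracy() {
    # $1: Phred score, e.g. 20
    LC_ALL=C awk -v q="$1" 'BEGIN { printf "%.4f\n", 1 - 10 ^ (-q / 10) }'
}

accuracy_to_q() {
    # $1: accuracy as a fraction, e.g. 0.99
    LC_ALL=C awk -v a="$1" 'BEGIN { printf "%.1f\n", -10 * log(1 - a) / log(10) }'
}

q_to_accuracy 20     # prints 0.9900
accuracy_to_q 0.99   # prints 20.0
```

On this scale each additional 10 Q points corresponds to a tenfold drop in error rate, so Q30 would be 99.9% accuracy.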
We recommend basecalling Kit 12 data in MinKNOW.
For optimal performance, we recommend using a GridION, PromethION, or a computer with:
- 64-bit Linux (Windows will be supported in the future)
- Intel i7, i9, Xeon, or better processor
- At least 16 GB of RAM
- An NVIDIA GPU (RTX 2070 or better) with at least 16 GB of GPU memory
- At least 1 TB SSD
Make sure you are using the most recent version of MinKNOW.
Set the sequencing parameters as described in the MinKNOW protocol under "Starting a sequencing run", and choose Kit 12 kits (e.g. SQK-LSK112) under Kit selection.
Introduction to read splitting
With increased follow-on rates of the Kit 12 chemistry (the rate of the complement strand entering the pore directly after the template strand has passed through), we have observed a higher rate of concatemerisation compared to the Ligation Sequencing Kit (SQK-LSK110). We are classifying these reads as 'informatic chimeras' as they are not physically joined during the library preparation process.
With SQK-LSK110, we typically observe <2% concatemerisation, a rate that does not usually affect downstream applications. With SQK-LSK112, for example, we have observed rates as high as 10%. Both MinKNOW (v21.11 and higher) and stand-alone Guppy (v5.1 and higher) now offer the option of splitting these reads.
As you can see from the following example, the majority of the informatic chimeras (yellow) are removed after splitting for a human (native) and E. coli (PCR) sample.
It is important to note that the read splitting function is not designed to split reads that are incorrectly ligated together during sample preparation. While these make up a small percentage of reads, users should follow the ligation steps of the protocol carefully to reduce the chance of creating them.
Enable read splitting in MinKNOW
When configuring your experiment, under Output - Filtering options, toggle "Enable read splitting" to ON.
3. Basecalling Kit 12 simplex data in Guppy
Alternatively, you can basecall in Guppy. For this, you will need to install Guppy version 5.1 or above.
To run Guppy basecalling on R10.4 data, specify one of the dna_r10.4_e8.1 config files, such as dna_r10.4_e8.1_fast.cfg, dna_r10.4_e8.1_hac.cfg, or dna_r10.4_e8.1_sup.cfg. For example:
guppy_basecaller -c dna_r10.4_e8.1_fast.cfg [other Guppy arguments]
For R9.4.1 data, specify one of the dna_r9.4.1_450bps config files, e.g.
guppy_basecaller -c dna_r9.4.1_450bps_fast.cfg [other Guppy arguments]
Basecall the data using Guppy version 5.1 or higher.
You can find instructions for setting up and running the Guppy software in the Guppy protocol.
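As a minimal sketch of how the config choice above might be scripted, the helper below maps a flow cell chemistry and model tier to a config file name and then prints (rather than runs) a full command. The function name `guppy_config` and the directory paths are hypothetical. The hac and sup names for R9.4.1 are assumed to follow the same pattern as the fast config named above. `-i` and `-s` are Guppy's short options for the input and save paths.

```shell
#!/bin/sh
# Dry run: prints the guppy_basecaller command instead of executing it.

guppy_config() {
    # $1: flow cell chemistry (r10.4 or r9.4.1)
    # $2: model tier (fast, hac or sup)
    case "$1" in
        r10.4)  printf 'dna_r10.4_e8.1_%s.cfg\n' "$2" ;;
        r9.4.1) printf 'dna_r9.4.1_450bps_%s.cfg\n' "$2" ;;
        *)      echo "unknown flow cell chemistry: $1" >&2; return 1 ;;
    esac
}

# Hypothetical input and output directories; replace with your own.
INPUT_DIR="${INPUT_DIR:-/data/run1/fast5}"
SAVE_DIR="${SAVE_DIR:-/data/run1/basecalled}"

CONFIG="$(guppy_config r10.4 hac)"
echo guppy_basecaller -c "$CONFIG" -i "$INPUT_DIR" -s "$SAVE_DIR"
```

Removing the `echo` would launch the basecaller for real, provided Guppy 5.1 or higher is installed and on the PATH.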
Enable read splitting in Guppy
To enable read splitting in Guppy, use the --do_read_splitting parameter when setting up your basecalling run. You can also limit the number of times a read will be passed into the read splitter (--max_read_split_depth) and set a minimum read splitting score (--min_score_read_splitting). For more information, refer to the "Setting up a run: configurations and parameters" section of the Guppy protocol.
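The three read splitting parameters can be combined on one command line. The sketch below assembles them and prints the resulting command as a dry run. The helper name `split_flags` and the paths are hypothetical, and the depth and score values (2 and 50) are illustrative placeholders rather than recommended settings; consult the Guppy protocol for the defaults in your version.

```shell
#!/bin/sh
# Dry run: prints the command rather than running Guppy.

split_flags() {
    # $1: optional maximum split depth
    # $2: optional minimum read splitting score
    flags="--do_read_splitting"
    [ -n "${1:-}" ] && flags="$flags --max_read_split_depth $1"
    [ -n "${2:-}" ] && flags="$flags --min_score_read_splitting $2"
    echo "$flags"
}

# Placeholder values (2, 50) and hypothetical paths, for illustration only.
# $(split_flags ...) is left unquoted on purpose so each flag is its own word.
echo guppy_basecaller -c dna_r10.4_e8.1_sup.cfg \
    -i /data/run1/fast5 -s /data/run1/split \
    $(split_flags 2 50)
```

Calling `split_flags` with no arguments yields only `--do_read_splitting`, which leaves the depth and score at Guppy's own defaults.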
4. Basecalling Kit 12 duplex data
We are now offering a duplex read basecaller in Guppy (duplex_tools), where the template and complement strands of a read can have their basecall data combined to provide a more accurate sequence.
Run simplex basecalling and read splitting in MinKNOW or Guppy.
More information on this can be found in the previous section, Basecalling Kit 12 simplex data.
Ensure you have installed Guppy version 6.0.0 or higher.
Run duplex basecalling in Guppy.
The "duplex basecalling" section of the Guppy protocol contains instructions for how to perform duplex basecalling. To perform duplex basecalling, the template and complement read pairs must first be identified. After read pairing and filtering, the duplex_tools executable can then be launched.
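The pairing, filtering and duplex basecalling sequence can be sketched as three commands. This InfoSheet does not spell the commands out, so every subcommand and flag below is an assumption based on the public duplex_tools README (`pairs_from_summary`, `filter_pairs`, and Guppy's `guppy_basecaller_duplex` with a pair list). Confirm them against the Guppy protocol and `duplex_tools --help` for your installed versions. All paths are hypothetical, and the script only prints the steps.

```shell
#!/bin/sh
# Dry run: prints the three steps instead of executing them.
# ASSUMPTION: subcommand and flag names come from the public duplex_tools
# README, not from this InfoSheet; verify them before use.

FAST5_DIR="${FAST5_DIR:-/data/run1/fast5}"      # hypothetical raw data
SIMPLEX_DIR="${SIMPLEX_DIR:-/data/run1/split}"  # hypothetical simplex output
PAIRS_DIR="${PAIRS_DIR:-/data/run1/pairs}"      # hypothetical pairing output
DUPLEX_DIR="${DUPLEX_DIR:-/data/run1/duplex}"   # hypothetical duplex output

duplex_steps() {
    # 1. Identify candidate template/complement pairs from the simplex run
    echo "duplex_tools pairs_from_summary $SIMPLEX_DIR/sequencing_summary.txt $PAIRS_DIR"
    # 2. Filter the candidate pairs using the simplex basecalls
    echo "duplex_tools filter_pairs $PAIRS_DIR/pair_ids.txt $SIMPLEX_DIR"
    # 3. Duplex basecall the filtered pairs
    echo "guppy_basecaller_duplex -i $FAST5_DIR -s $DUPLEX_DIR -c dna_r10.4_e8.1_sup.cfg --duplex_pairing_mode from_pair_list --duplex_pairing_file $PAIRS_DIR/pair_ids_filtered.txt"
}

duplex_steps
```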
5. Assembly
Recommendations for assembly
Flye is recommended as an assembly tool for Kit 12 genome assembly (https://github.com/fenderglass/Flye).
We have observed that assembling haplotypes separately significantly improves genome contiguity, although each assembly only uses half the data.
We recommend tuning three Flye parameters for the best performance with Kit 12 sequence data:
- Setting the command line parameter --min-overlap 10000 should deliver a modest improvement in assembly contiguity when using libraries optimised for read length.
- We recommend setting the --nano-corr parameter (to specify that the sequences are "corrected"). This provides a significant improvement to assembly NG50 compared to the --nano-raw (uncorrected sequence) setting. We have observed NG50 increases from 58 Mb to 67 Mb for collapsed assemblies, when assembling both haplotypes at once.
- We typically adjust the asm_corrected_reads.cfg file in the flye/config/bin_cfg/ folder to increase haplotype-specific assembly NG50s and to remove any major misjoins:
   a. enable homopolymer compressed scoring (hpc_scoring_on = 1)
   b. increase the minimizer_window to 10
   c. decrease the repeat_graph_ovlp_divergence to 0.005, which increases haplotype-specific assembly NG50s to 84 Mb/84 Mb and removes all major misjoins
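The three recommendations above can be sketched as a small script. It edits a scratch stand-in for asm_corrected_reads.cfg and then prints a Flye command as a dry run. The helper name `set_cfg`, the file names reads.fastq and flye_out, and the starting values in the stand-in file are hypothetical; the starting values are placeholders, not Flye's shipped defaults. For Flye to pick up the changes, the same edits would need to be made to the real file in flye/config/bin_cfg/.

```shell
#!/bin/sh

set_cfg() {
    # $1: config file; $2: key; $3: new value
    # Rewrites "key = old" as "key = new"; all other lines pass through.
    sed "s|^$2[[:space:]]*=.*|$2 = $3|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Work on a scratch file so an installed Flye config is left untouched.
WORK_DIR="$(mktemp -d)"
CFG="$WORK_DIR/asm_corrected_reads.cfg"

# Stand-in content with placeholder starting values (assumed, not Flye defaults).
cat > "$CFG" <<'EOF'
hpc_scoring_on = 0
minimizer_window = 5
repeat_graph_ovlp_divergence = 0.01
EOF

set_cfg "$CFG" hpc_scoring_on 1                     # a. homopolymer compressed scoring
set_cfg "$CFG" minimizer_window 10                  # b. larger minimizer window
set_cfg "$CFG" repeat_graph_ovlp_divergence 0.005   # c. lower overlap divergence
cat "$CFG"

# Dry run of the assembly command with the two recommended command line settings.
echo flye --nano-corr reads.fastq --out-dir flye_out --min-overlap 10000
```

Keeping a backup of the original config before editing it in place makes it easy to return to the stock settings for other chemistries.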