Nanopype: processing and quantification of short tandem repeats

Pay Giesselmann, from the Max Planck Institute, opened his talk by describing the software tools Nanopype and STRique, which have been designed to process sequence data and analyse targeted repeat expansions. Nanopype is a collection of software tools that can be installed and run from a single container, and its tools can be combined flexibly to take raw fast5 files through to polishing or structural variant calling. Furthermore, the analysis pipelines scale to cluster computers in order “to keep up with the increasing throughput of nanopore sequencing”. The pipeline is split into three stages: storage, which includes methods to optimise how data is stored; core processing, which includes basecalling and alignment; and analysis, which contains tools such as Nanopolish, Sniffles, Pinfish, and Pychopper.

As an example of how this toolset works, Pay showed how basecalling, alignment and methylation calling could be performed in a single step on the command line. The whole workflow is launched simply by specifying an output path that contains specific keywords, and these keywords determine which tools are run. When the resulting methylation calls for a specific region were examined, they showed high concordance with traditional bisulphite sequencing results.
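To illustrate the idea of driving a workflow from the output path alone, here is a minimal Python sketch. It is not Nanopype's code: the path layout, the function name `plan_from_output_path`, and the tool keywords `guppy` and `minimap2` are assumptions chosen for illustration, and the real keyword scheme may differ.

```python
from pathlib import PurePosixPath


def plan_from_output_path(path):
    """Derive a processing plan from a keyword-bearing output path.

    Assumed (hypothetical) layout: <product>/<tool>/.../<tool>/<sample>.<ext>,
    where the tools are listed from the last step back to the first.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 2:
        raise ValueError("expected at least one keyword directory and a file name")
    *keywords, filename = parts
    return {
        # The leftmost keyword names the final product being requested.
        "product": keywords[0],
        # Reverse the remaining keywords so the tools appear in execution order.
        "tools_in_run_order": list(reversed(keywords[1:])),
        # The file stem identifies the sample or run.
        "sample": PurePosixPath(filename).stem,
    }


plan = plan_from_output_path("methylation/nanopolish/minimap2/guppy/sample.tsv")
print(plan)
```

Under these assumptions, asking for a single file is enough to imply basecalling, alignment and methylation calling, in that order, for the named sample.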

In the second half of his talk, Pay spoke about identifying and counting tandem repeat sequences using the toolset STRique. He started by displaying a raw nanopore squiggle showing a low-complexity region that was clearly a repeat region, remarking that “London Calling is about the only conference where you can display raw nanopore data and people know what you’re talking about!” He explained that these repeat expansions are very difficult to sequence with other technologies; this one was particularly interesting because it was found in a promoter region and could be quantified using nanopore sequencing. He showed how the toolkit he developed essentially cuts out the repeat region in “squiggle space”, then uses a hidden Markov model to predict the number of tandem repeats represented by the squiggle. Showing this in practice, Pay demonstrated that by using Cas9 to target the regions containing the genes C9orf72 and FMR1, repeat expansions could be seen. In the case of C9orf72, the wild type allele, with 8 repeats, could be identified, but two clusters of expanded repeats were also observed, one of 780 repeats and one of 400 repeats. As a second example, the FMR1 gene again showed two repeat expansions, of 700 and 400 repeats, in addition to the wild type.
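The principle of counting repeats with a hidden Markov model can be sketched in a few lines of Python. This is a toy under stated assumptions, not STRique's method: it decodes base letters rather than raw signal, and the state layout, the probabilities `p_match` and `p_stay`, and the function name `count_repeats` are all invented for illustration.

```python
import math


def count_repeats(seq, unit, p_match=0.9, p_stay=0.9):
    """Viterbi-decode seq with a toy HMM and count full passes through unit.

    States: 0 = prefix flank, 1..k = positions in the repeat unit,
    k + 1 = suffix flank. Flank states emit every base with probability 0.25.
    """
    if not seq or not unit:
        return 0
    k = len(unit)
    n_states = k + 2
    neg_inf = float("-inf")

    def emit(state, base):
        if state in (0, k + 1):
            return math.log(0.25)
        expected = unit[state - 1]
        return math.log(p_match if base == expected else (1 - p_match) / 3)

    # Allowed transitions with their log-probabilities.
    trans = {0: [(0, math.log(p_stay)), (1, math.log(1 - p_stay))],
             k + 1: [(k + 1, 0.0)]}
    for i in range(1, k):
        trans[i] = [(i + 1, 0.0)]  # walk through the unit one base at a time
    trans[k] = [(1, math.log(p_stay)), (k + 1, math.log(1 - p_stay))]

    # The path may start in the prefix flank or directly at the repeat start.
    score = [neg_inf] * n_states
    score[0] = emit(0, seq[0])
    score[1] = emit(1, seq[0])
    back = []
    for base in seq[1:]:
        new = [neg_inf] * n_states
        ptr = [0] * n_states
        for s in range(n_states):
            if score[s] == neg_inf:
                continue
            for t, log_p in trans[s]:
                cand = score[s] + log_p + emit(t, base)
                if cand > new[t]:
                    new[t], ptr[t] = cand, s
        back.append(ptr)
        score = new

    # Trace back the best path and count visits to the last unit position.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return sum(1 for s in path if s == k)


read = "ACGTAC" + "GGGGCC" * 5 + "TTAGCA"
print(count_repeats(read, "GGGGCC"))
```

Because the repeat states tolerate occasional mismatches, a read with a single substitution inside the repeat is still decoded as one continuous run of units, which is the property that makes a model-based count more robust than exact string matching.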

Moving on to the last part of his talk, Pay said that he was going to say a little bit about methylation as “you essentially get that for free”. When looking at the consensus methylation profiles of these repeat expansion sequences, methylation in the promoter region correlated with repeat length.

Authors: Pay Giesselmann