Martin opened his talk by stating that some types of mutation occur at a high rate and are associated with disease, yet have often been neglected; these are probably the most interesting mutations. They are neglected because they are hard to identify: they include tandem-repeat expansions and contractions, homologous recombination, and transposable element insertions.
Martin outlined his laboratory's workflow for identifying these complex disease-causing mutations: the Oxford Nanopore PromethION platform is used for long-read sequencing of patient DNA; the reads are aligned to the human reference genome and compared with it; reads that show structural differences from the reference are grouped; differences that are also present in humans without disease are then de-prioritised. According to Martin, this final step is essential because thousands of differences are found in each patient, and the benign ones need to be filtered out.
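The de-prioritising step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ReadGroup` class, the breakpoint representation, the `tolerance` parameter and the toy coordinates are hypothetical and are not taken from Martin's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadGroup:
    """A group of reads sharing one structural difference (hypothetical representation)."""
    name: str
    breakpoints: frozenset  # set of (chromosome, position) pairs


def near(a, b, tolerance):
    """Two breakpoints match if they lie on the same chromosome within `tolerance` bases."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= tolerance


def seen_in_controls(group, control_groups, tolerance=100):
    """True if some control group has a matching breakpoint for every breakpoint of `group`."""
    for control in control_groups:
        if len(control.breakpoints) != len(group.breakpoints):
            continue
        if all(any(near(bp, cbp, tolerance) for cbp in control.breakpoints)
               for bp in group.breakpoints):
            return True
    return False


def prioritise(patient_groups, control_groups, tolerance=100):
    """Keep only the patient groups that have no counterpart in people without disease."""
    return [g for g in patient_groups
            if not seen_in_controls(g, control_groups, tolerance)]


# Toy data: g2 resembles a difference seen in a control, so only g1 is kept.
patient = [
    ReadGroup("g1", frozenset({("chrA", 1000), ("chrB", 5000)})),
    ReadGroup("g2", frozenset({("chrA", 9000)})),
]
controls = [ReadGroup("c1", frozenset({("chrA", 9040)}))]
print([g.name for g in prioritise(patient, controls)])
```

A real comparison would need a more careful notion of "the same rearrangement" than nearby breakpoints; the sketch only shows why filtering against controls shrinks thousands of candidates to a manageable list.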
Martin further detailed the alignment stage of his workflow, which involves finding the most probable alignments between sequenced reads and the reference genome. Firstly, the probabilities (rates) of substitutions, deletions, and insertions are calculated. Determining these rates is not easy, and Martin described the development of a tool called last-train to aid this process. Secondly, sequences are aligned based on these probabilities, with higher-probability alignments preferred.
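As a rough illustration of how event probabilities can rank candidate alignments, the sketch below sums the log-probabilities of the events in each alignment column. It is a deliberately simplified model and not the one last-train or the aligner actually uses; the values in `RATES` and all function names are assumptions made for this example.

```python
import math

# Toy per-column event probabilities (assumed values, for illustration only).
RATES = {"match": 0.90, "substitution": 0.04, "deletion": 0.03, "insertion": 0.03}


def classify(ref_base, read_base):
    """Name the event in one alignment column; '-' marks a gap."""
    if ref_base == "-":
        return "insertion"   # read has a base the reference lacks
    if read_base == "-":
        return "deletion"    # reference has a base the read lacks
    return "match" if ref_base == read_base else "substitution"


def log_probability(ref_row, read_row, rates=RATES):
    """Sum of log event probabilities over all columns of a gapped alignment."""
    if len(ref_row) != len(read_row):
        raise ValueError("alignment rows must have equal length")
    return sum(math.log(rates[classify(r, q)]) for r, q in zip(ref_row, read_row))


def most_probable(alignments, rates=RATES):
    """Return the (ref_row, read_row) pair with the highest log-probability."""
    return max(alignments, key=lambda a: log_probability(a[0], a[1], rates))


# The same read aligned to two candidate reference locations.
candidates = [("ACCGT", "ACGGT"), ("ACGGT", "ACGGT")]
print(most_probable(candidates))
```

Working in log space avoids numerical underflow for long reads, which is why the probabilities are summed as logarithms rather than multiplied directly.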
This workflow allows for arbitrary rearrangements. One of Martin's key observations has been that the "spontaneous generation" of new sequence is rare; most sequence is in fact derived from an ancestral sequence by translocation or duplication. The workflow makes one "big assumption": that the reference genome is ancestral. This gives a very convenient relationship between all DNA reads and the reference genome, because every part of a read must come from a unique part of the ancestor. Without this assumption, the problem of dealing with arbitrary rearrangements would become intractable.
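One way to picture the constraint that every part of a read comes from a unique part of the ancestor is to check that the aligned pieces of a read never claim the same read base twice. The `Segment` type and the half-open coordinate convention below are hypothetical choices for this sketch, not the pipeline's data model.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One aligned piece of a read (hypothetical representation)."""
    read_start: int   # first read base covered (0-based)
    read_end: int     # one past the last read base covered (half-open)
    ref_chrom: str    # where in the reference this piece comes from
    ref_start: int


def each_base_has_one_origin(segments):
    """True if no read base is covered by more than one aligned segment."""
    ordered = sorted(segments, key=lambda s: s.read_start)
    return all(prev.read_end <= cur.read_start
               for prev, cur in zip(ordered, ordered[1:]))


# A read whose first 500 bases come from one chromosome and the rest from another.
rearranged = [Segment(0, 500, "chrA", 1000), Segment(500, 900, "chrB", 20000)]
print(each_base_has_one_origin(rearranged))
```

Note that the pieces may come from anywhere in the reference, in any order; the constraint applies only to the read side, which is what leaves room for arbitrary rearrangements.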
The case of patient X was then introduced. This individual presented with split-hand/foot malformation, hearing loss, delayed development, and self-injury. They were known to have a reciprocal translocation between chromosomes 7 and 15, but Martin and his team wanted to understand the rearrangement in more detail. The long-read sequencing workflow identified 2,813 groups of reads with structural differences, of which 8 groups involved chr7/15 rearrangements. These 8 groups were explored further and the rearrangements were fully reconstructed: a reciprocal translocation with an additional complex rearrangement was identified. There was also a loss of some ancestral sequence, which was confirmed by microarray. Martin pointed out that the deletion could only be seen once the rearrangement was fully reconstructed - an interesting "holistic property of the rearrangement".
Lastly, Martin described how he and his collaborators resolved the GGC tandem repeat expansion in the NOTCH2NLC gene. NOTCH2NLC is associated with neuronal intranuclear inclusion disease, which sometimes runs in families but can also be sporadic. The expansion was found by long-read nanopore sequencing on the PromethION and his analysis pipeline, using DNA samples from 13 affected and 4 unaffected members of 8 families, plus 25 controls. The repeat expansion was then confirmed in all 39 additional "sporadic" patients, and was completely absent from 200 controls. The fact that it was present in all patients but no controls suggested that this repeat expansion was the cause of the disease.
Martin said that challenges remain if centromeric repeats are involved, as these tend to not be present in the reference genome. He concluded by saying that what he "would love to have [is] a reference genome that is ancestral."