Charles Chiu: Clinical sequencing of pathogens and human host responses in acutely infected patients


Charles began his talk by describing his lab as a translational laboratory that aims to bring research into the clinic. However, he noted that this path is commonly known as “the valley of death”, because research discoveries do not often make it into the clinic. Focusing on diagnostics, he said that a proposed diagnostic test needs to be rapid, with a turnaround of hours rather than weeks, to be useful. Furthermore, such tests need to work for critically ill patients and rapidly detect infection. Charles pointed out that three types of infection very rarely get a diagnosis: pneumonia, meningitis and fever/sepsis. The main problem is that failure to get a timely diagnosis hugely increases mortality rates and financial burden.

Charles then spoke about how they have developed a metagenomic assay that can diagnose neurological conditions using cerebrospinal fluid as the sample source. Moving on to the main focus of his talk, Charles proposed that machine learning algorithms could be used as a diagnostic tool, and set out to explain how this could work. The theory is that, if you feed a model enough genomic information about all different types of infection, it could rapidly produce low-resolution answers to inform clinicians at the bedside. Charles said that, with sequencing being used routinely in his lab on patient samples, 99 % of the data was being underused and was thus a perfect source of information for model training. Here, RNA-seq data was used to train the algorithm. However, Charles pointed out that a good transcriptome experiment generally captures 60 – 80 % of the transcripts in an organism, whereas he only had about 26 % coverage. He then said he was going to demonstrate why he thought this was enough to provide the low-resolution, broad-brush answers he required. Suddenly two blurry photographs appeared on the screen and Charles challenged the audience to determine what they were. As they slowly came into focus, it became apparent that one of the pictures was of a stylish, popular benchtop sequencer able to produce ultra-long reads, and the other was a sequencer that... wasn’t.

Making his point, Charles said that sequencing data from many different kinds of disease could be used as training sets, for example bacterial infections, viral infections, autoimmune diseases etc. Using 80 % of the data as a training set and 20 % as a cross-validation set, a dummy model was used to generate a baseline score of 58 % accuracy. After screening a large number of different model types, it transpired that a radial basis function (RBF) SVM model with feature selection gave 95 % accuracy. Charles said the feature selection step was important, as it selected 1000 differential regions of genomic data that discriminated between the different classes of disease. He then asked whether a model that produces great accuracy on a cross-validation set also works in real cases. Impressively, it was able to discriminate between bacterial and viral infections with 91 % accuracy. Next, looking at an unknown case of encephalitis, the model suggested a viral infection with 83 % certainty, and it turned out to be a case of rubella. Charles then showed how he and his team have created a Docker container to run this model locally, using the example of a 2-year-old with a large brain abscess. All clinical cultures were negative, but a successful diagnosis of a bacterial infection was made within 3 hours using the model.
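
For readers who want to see what such a workflow looks like in practice, here is a minimal sketch of the general recipe described in the talk: an 80/20 split, a dummy baseline, and an RBF SVM with feature selection. This is not Charles's actual pipeline. It assumes scikit-learn, uses synthetic data from make_classification in place of patient RNA-seq profiles, and scales the feature selection down to 50 of 200 features rather than the 1000 regions mentioned in the talk. The function name run_demo and all parameter values are illustrative choices.

```python
# Illustrative sketch only: synthetic data stands in for patient RNA-seq
# profiles, and every parameter value here is an assumption, not the
# configuration used in the talk.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def run_demo(seed=0):
    """Return (baseline accuracy, RBF SVM accuracy) on a held-out 20 % split."""
    # Three synthetic "disease classes"; only 10 of 200 features are informative.
    X, y = make_classification(
        n_samples=300,
        n_features=200,
        n_informative=10,
        n_redundant=0,
        n_classes=3,
        n_clusters_per_class=1,
        class_sep=2.0,
        random_state=seed,
    )

    # 80 % of samples for training, 20 % held out for evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Dummy model: always predicts the most frequent training class.
    baseline = DummyClassifier(strategy="most_frequent")
    baseline.fit(X_train, y_train)

    # Scale, keep the 50 features with the highest ANOVA F-score,
    # then fit a support vector machine with an RBF kernel.
    model = make_pipeline(
        StandardScaler(),
        SelectKBest(score_func=f_classif, k=50),
        SVC(kernel="rbf", gamma="scale"),
    )
    model.fit(X_train, y_train)

    return baseline.score(X_test, y_test), model.score(X_test, y_test)


baseline_acc, svm_acc = run_demo()
print(f"dummy baseline accuracy: {baseline_acc:.2f}")
print(f"RBF SVM + feature selection accuracy: {svm_acc:.2f}")
```

Putting the feature selection inside the pipeline means it is fitted on the training split only, so the held-out samples do not influence which features are chosen. The dummy baseline gives the accuracy a model would reach without learning anything, which is the reference point the 58 % figure in the talk provides.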

In his closing remarks, Charles spoke about a large nanopore sepsis study in which he compared a cohort with known bacteremic infection with healthy controls. After showing good correlations between the ERCC spike-in RNA controls, he moved on to rapidly discuss some of the findings in the dying moments of his time on stage. Obtaining around 50 % transcriptome coverage per patient, he showed that, according to his model, those with bacterial infections had between 51 % and 73 % probability of having a bacterial infection, compared with between 29 % and 48 % for the patients in the “other” category. Skipping through his summary slide, one could just make out the statement “Nanopore sequencing coupled with laptop-compatible, rapid analysis pipelines can collect and analyze data in 3-6 hours, a time frame amenable to clinical diagnosis.” before he finished the plenary session and questions began.