Maguire, F., B. Alcock, F.S. Brinkman, A.G. McArthur, & R.G. Beiko. 2018. AMRtime: Rapid Accurate Identification of Antimicrobial Resistance Determinants from Metagenomic Data. Oral presentation at the Third American Society for Microbiology Meeting on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatics Pipelines, Washington, D.C.
Abstract: Metagenomics, the direct sequencing of the mixture of genomes present in a sample, is an increasingly common workflow within the life sciences. It is frequently used to investigate previously intractable problems such as the functional characterisation of entire microbial environments. One such use-case of global and national public-health importance is analysing the nature and transmission dynamics of antimicrobial resistance (AMR) determinants in human, agri-food and environmental samples. Some tools have recently been developed to profile AMR from metagenomes; however, these are generally limited to profiling at the level of AMR genes clustered by % sequence identity, which may or may not be biologically meaningful. By exploiting the expertly curated ontological structure of the Comprehensive Antibiotic Resistance Database (CARD) and new CARD Prevalence datasets, we have developed an approach using a hierarchical set of machine learning classifiers. This allows us to produce gene-specific AMR profiles for 2386 determinants as well as profiles for higher-order, biologically informed AMR gene family groups. First, heuristically accelerated DIAMOND-based homology searches are used to filter out non-AMR related metagenomic reads. This filtering has been optimised to prioritise minimising false negatives over minimising false positives. Features generated from these homology searches as well as sequence features are then used to train a random forest classifier to classify filtered reads into one of 227 CARD AMR gene families (e.g. MCR phosphoethanolamine transferase). For each gene family an additional random forest classifier is trained to classify reads into one of the specific AMR determinants belonging to that family (e.g. MCR-1, MCR-2, MCR-3 etc.). Beyond the initial homology search, classification adds very little computational overhead.
On a fully held-out test set of MiSeq reads simulated from the CARD canonical gene sequences, this method achieved an average precision and recall of 0.993 and 0.987, respectively, at the AMR gene family level. Within the 227 AMR families, 70% (158) had an average F1-score greater than 0.99 for classification to specific AMR determinants. A further 10% (24) averaged F1-scores between 0.8 and 0.99. In comparative analyses on the same dataset this outperformed homology searches alone, read mapping, and variation-graph-based methods in terms of average overall accuracy and precision. Further work will aim to improve classification within certain families and expand AMRtime to include variant-based AMR models as well as meta-models (e.g. multi-component efflux pump systems).
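The three-stage routing described in the abstract above (homology filter, then a family-level classifier, then a per-family classifier) can be illustrated with a minimal Python sketch. This is not the AMRtime code: the function and feature names (`classify_read`, `best_bitscore`, `gc_content`) and the threshold are hypothetical, and the toy rule-based functions merely stand in for the DIAMOND-based filter and the trained random forests.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Read = Dict[str, Any]              # hypothetical per-read feature record
Classifier = Callable[[Read], str]


def classify_read(
    read: Read,
    is_amr_candidate: Callable[[Read], bool],
    family_clf: Classifier,
    determinant_clfs: Dict[str, Classifier],
) -> Optional[Tuple[str, str]]:
    """Route one read through filter -> family model -> per-family model."""
    # Stage 1: discard reads the homology filter considers non-AMR.
    if not is_amr_candidate(read):
        return None
    # Stage 2: assign an AMR gene family.
    family = family_clf(read)
    # Stage 3: the model belonging to that family picks the determinant.
    determinant = determinant_clfs[family](read)
    return family, determinant


# Toy stand-ins (NOT trained random forests): simple rules on made-up features.
def toy_filter(read: Read) -> bool:
    # Deliberately permissive, made-up threshold: favour keeping reads.
    return read["best_bitscore"] >= 30


def toy_family_clf(read: Read) -> str:
    return "MCR phosphoethanolamine transferase"


def toy_mcr_clf(read: Read) -> str:
    return "MCR-1" if read["gc_content"] < 0.5 else "MCR-2"


toy_determinant_clfs = {"MCR phosphoethanolamine transferase": toy_mcr_clf}

reads = [
    {"id": "r1", "best_bitscore": 85.0, "gc_content": 0.48},
    {"id": "r2", "best_bitscore": 12.0, "gc_content": 0.61},
    {"id": "r3", "best_bitscore": 40.0, "gc_content": 0.55},
]
results = {
    r["id"]: classify_read(r, toy_filter, toy_family_clf, toy_determinant_clfs)
    for r in reads
}
```

In this arrangement the expensive step runs once per read, and each later stage is a single model lookup, which is consistent with the abstract's remark that classification adds little overhead beyond the homology search.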
Welcome #TeamVirulence, left to right: Rachel Tran (Biochem 3R06), Sally Min (BiomedDC 4A15), Anatoly Miroshnichencko (BiomedDC 4A15), and Rafik El Werfalli (BiomedDC 4A15), who are collectively working on development of CARD:Virulence, a new branch of the Comprehensive Antibiotic Resistance Database dedicated to the molecular surveillance of bacterial virulence factors.
- Alcock, B., A.R. Raphenya, A.N. Sharma, K.K. Tsang, T.T.Y. Lau, A. Hernandez-Koutoucheva, & A.G. McArthur. 2018. Data and curation in the Comprehensive Antibiotic Resistance Database. Poster presentation at the Canadian Society of Microbiologists Annual Meeting, Winnipeg, Manitoba.
- Lau, T.T.Y., A.R. Raphenya, B. Alcock, & A.G. McArthur. 2018. Optimizing antimicrobial resistance surveillance tools through biological data organization and taxonomic identification of resistance genes. Poster presentation at the Canadian Society of Microbiologists Annual Meeting, Winnipeg, Manitoba.
- Maguire, F., A.R. Raphenya, B. Alcock, A.G. McArthur, F.S. Brinkman, & R.G. Beiko. 2018. The cost of speed: evaluating systematic failures in metagenomic AMR profiling. Poster presentation at the Canadian Society of Microbiologists Annual Meeting, Winnipeg, Manitoba.
- Raphenya, A.R., B. Alcock, K.K. Tsang, A.N. Sharma, T.T.Y. Lau, A. Hernandez-Koutoucheva, & A.G. McArthur. 2018. The Comprehensive Antibiotic Resistance Database and the Resistance Gene Identifier – Prediction of antimicrobial resistance genes and mutations for genomic and metagenomic sequencing data. Oral presentation at the Canadian Society of Microbiologists Annual Meeting, Winnipeg, Manitoba.
- Tsang, K.K., H. Zubyk, S. Chou, G.D. Wright, & A.G. McArthur. 2018. Decoding bad bags: Predicting antibiotic resistance phenotypes from genotype. Oral presentation at the Canadian Society of Microbiologists Annual Meeting, Winnipeg, Manitoba.
Kara Tsang passed her graduate transfer exam today, officially moving from the McMaster Biochemistry & Biomedical Sciences Masters program to the Ph.D. program. Kara’s work focusses on the intersection of biocuration, bioinformatics, machine learning, mutant screening, and phenotypic testing for prediction of antimicrobial resistance phenotype from genotype. Well done Kara!
Update: Hot on the heels of becoming a Ph.D. student, Kara has won a 2018/2019 Fred and Helen Knight Enrichment Award from the Department of Biochemistry & Biomedical Sciences!
Tammy Lau has been awarded a prestigious Michael G. DeGroote Institute for Infectious Disease Research (IIDR) Summer Student Fellowship for her work developing k-mer approaches to predict the pathogen of origin of metagenomic antimicrobial resistance gene sequences.
The Comprehensive Antibiotic Resistance Database has been updated, http://card.mcmaster.ca
CARD Curation: Addition of HERA, TRU, & ACI beta-lactamases, sul4, and new quinolone efflux pumps.
Antibiotic Resistance Ontology: Expanded to include an entirely new branch describing AMR phenotypic testing methods. The ARO is now also officially available at the OBO Foundry, allowing formal integration with other ontological resources, most notably the Genomic Epidemiology Application Ontology (GenEpiO), https://github.com/genepio/genepio.
Resistance Gene Identifier: Resistome prediction for low quality or low coverage assemblies, merged metagenomics reads, and small plasmids or assembly contigs. Includes prediction of partial AMR genes. Support added for Docker operating-system-level virtualization (i.e. containerization).
Prevalence, Resistomes, & Variants: Expanded to 67 important pathogens, with a focus on ESKAPEs, WHO Priority Pathogens, and agents of sepsis.
The McArthur lab and the Comprehensive Antibiotic Resistance Database are proud to join the Canadian Anti-Infective Innovation Network, International Genomic Epidemiology Application Ontology Consortium, and Integrated Rapid Infectious Disease Analysis Project!
The Comprehensive Antibiotic Resistance Database has been updated, http://card.mcmaster.ca
This February 2018 release is our largest to date and includes new data types, a new classification system, an entirely new version of the Resistance Gene Identifier, and website improvements.
CARD Curation: 37 new ADC beta-lactamases, 21 PDC beta-lactamases, new MCR proteins, 23 rRNA mutations, resistant isoleucyl-tRNA synthetases, hundreds of new resistance mutations, and more. While in past releases all curated AMR mutations were those characterized from clinical isolates, CARD now additionally includes mutations discovered via in vitro selection experiments. Ontological improvements have been made to enable an entirely new classification system for CARD data and RGI results: resistance determinants are now systematically categorized by AMR Gene Family, Drug Class, and Resistance Mechanism. The Antibiotic Resistance Ontology is now additionally available via GitHub, https://github.com/arpcard.
Resistance Gene Identifier: Entirely new codebase, compatible with CARD data (card.json) version 2.0.0 and up (download separately). Open Reading Frame (ORF) prediction using Prodigal, homolog detection using BLAST (default) or DIAMOND, and Strict significance based on CARD curated bitscore cut-offs. Addition of rRNA mutation and efflux over-expression models. Hits of 95% identity or better are automatically listed as Strict. All results organized by revised ARO classification: AMR Gene Family, Drug Class, and Resistance Mechanism. Revised documentation, command line menu, and website graphical interface. The Resistance Gene Identifier is now additionally available via GitHub, https://github.com/arpcard.
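As a rough illustration of the two Strict criteria mentioned in the release note above, the rule can be written as a small Python predicate. This is only one reading of those two sentences, not RGI's actual implementation: the function name `is_strict`, its parameters, and the assumption that a bitscore equal to the cut-off counts as passing are all hypothetical.

```python
def is_strict(bitscore: float, curated_cutoff: float, percent_identity: float) -> bool:
    """Return True if a hit would be listed as Strict under the two stated rules.

    Assumptions (not taken from RGI itself): the cut-off comparison is
    inclusive, and percent_identity is expressed on a 0-100 scale.
    """
    # Rule 1: hits of 95% identity or better are automatically Strict.
    if percent_identity >= 95.0:
        return True
    # Rule 2: otherwise the hit must meet the curated bitscore cut-off.
    return bitscore >= curated_cutoff


# Made-up example hits: (bitscore, curated cut-off, % identity)
example_hits = {
    "passes_cutoff": (500.0, 450.0, 80.0),
    "high_identity": (300.0, 450.0, 96.2),
    "neither": (300.0, 450.0, 80.0),
}
strict_calls = {name: is_strict(*hit) for name, hit in example_hits.items()}
```

The cut-off value differs per curated model, so in practice `curated_cutoff` would be looked up for each reference sequence rather than fixed.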
Prevalence, Genomes, & Variants: Expansion of our computer-generated data set on the prevalence of AMR genes and variants among the sequenced genomes, plasmids, and whole-genome shotgun assemblies available at NCBI for clinically important pathogens. CARD Prevalence 2.0.0 is based on sequence data acquired from NCBI on August 28, 2017, analyzed using RGI 4.0.0 (DIAMOND homolog detection) and CARD 2.0.0. Now includes results for protein overexpression models and rRNA mutations. All results organized by the revised ARO classification: AMR Gene Family, Drug Class, and Resistance Mechanism. Download files now include 35000+ genome annotations and all predicted sequence variants.