Maguire, F., B. Alcock, F.S. Brinkman, A.G. McArthur, & R.G. Beiko. 2018. AMRtime: Rapid Accurate Identification of Antimicrobial Resistance Determinants from Metagenomic Data. Oral presentation at the Third American Society for Microbiology Meeting on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatics Pipelines, Washington, D.C.
Abstract: Metagenomics, the direct sequencing of the mixture of genomes present in a sample, is an increasingly common workflow within the life sciences. It is frequently used to investigate previously intractable problems such as the functional characterisation of entire microbial environments. One such use-case of global and national public-health importance is analysing the nature and transmission dynamics of antimicrobial resistance (AMR) determinants in human, agri-food and environmental samples. Recently some tools have been developed to profile AMR from metagenomes; however, these are generally limited to profiling at the level of AMR genes clustered by % sequence identity, which may or may not be biologically meaningful. By exploiting the expertly curated ontological structure of the Comprehensive Antibiotic Resistance Database (CARD) and new CARD Prevalence datasets, we have developed an approach using a hierarchical set of machine learning classifiers. This allows us to produce gene-specific AMR profiles for 2386 determinants as well as profiles for higher-order, biologically informed AMR gene family groups. Firstly, DIAMOND-based, heuristically accelerated homology searches are used to filter out metagenomic reads unrelated to AMR. This filtering has been optimised to prioritise minimising false negatives over minimising false positives. Features generated from these homology searches, as well as sequence features, are then used to train a random forest classifier to classify filtered reads into one of 227 CARD AMR gene families (e.g. MCR phosphoethanolamine transferase). For each gene family, an additional random forest classifier is trained to classify reads into one of the specific AMR determinants belonging to that family (e.g. MCR-1, MCR-2, MCR-3 etc.). Beyond the initial homology search, classification involves very little computational overhead.
On a fully held-out test set of MiSeq reads simulated from the CARD canonical gene sequences, this method resulted in an average precision and recall of 0.993 and 0.987 at the AMR gene family level. Within the 227 AMR families, 70% (158) had an average F1-score greater than 0.99 for classification to specific AMR determinants. A further 10% (24) averaged F1-scores between 0.8 and 0.99. In comparative analyses on the same dataset, this outperformed homology searches alone, read mapping, and variation-graph-based methods in terms of average overall accuracy and precision. Further work will aim to improve classification within certain families and expand AMRtime to include variant-based AMR models as well as meta-models (e.g. multi-component efflux pump systems).
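The hierarchical routing described in the abstract (homology filter, then an AMR gene family classifier, then a per-family determinant classifier) can be sketched roughly as follows. This is a minimal illustration and not AMRtime's code: the feature names, the bitscore threshold, and the toy classifier functions are all hypothetical stand-ins for the DIAMOND-based filter and the trained random forest models.

```python
from typing import Callable, Dict, Optional, Tuple

# A read is represented by a dictionary of numeric features (hypothetical layout).
Features = Dict[str, float]
Classifier = Callable[[Features], str]


def classify_read(
    features: Features,
    passes_filter: Callable[[Features], bool],
    family_classifier: Classifier,
    determinant_classifiers: Dict[str, Classifier],
) -> Optional[Tuple[str, Optional[str]]]:
    """Route one read through filter -> gene family -> specific determinant.

    Returns None if the read is rejected by the homology filter, otherwise
    (family, determinant). The determinant is None when no second-stage
    classifier is registered for the predicted family.
    """
    if not passes_filter(features):
        return None
    family = family_classifier(features)
    second_stage = determinant_classifiers.get(family)
    determinant = second_stage(features) if second_stage is not None else None
    return family, determinant


# Toy stand-ins so the sketch runs end to end (all assumptions).
def toy_filter(features: Features) -> bool:
    # Permissive threshold, in the spirit of favouring few false negatives.
    return features.get("best_bitscore", 0.0) >= 30.0


def toy_family_classifier(features: Features) -> str:
    if features.get("mcr_family_score", 0.0) > 0.5:
        return "MCR phosphoethanolamine transferase"
    return "other AMR gene family"


def toy_mcr_classifier(features: Features) -> str:
    to_mcr1 = features.get("identity_to_mcr1", 0.0)
    to_mcr2 = features.get("identity_to_mcr2", 0.0)
    return "MCR-1" if to_mcr1 >= to_mcr2 else "MCR-2"


DETERMINANT_CLASSIFIERS: Dict[str, Classifier] = {
    "MCR phosphoethanolamine transferase": toy_mcr_classifier,
}

read = {
    "best_bitscore": 120.0,
    "mcr_family_score": 0.9,
    "identity_to_mcr1": 99.0,
    "identity_to_mcr2": 81.0,
}
print(classify_read(read, toy_filter, toy_family_classifier, DETERMINANT_CLASSIFIERS))
```

Because the second-stage classifier is only looked up after the family is known, each read passes through at most two models after filtering, which is consistent with the low overhead noted in the abstract.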
- Alcock, B., A.R. Raphenya, A.N. Sharma, K.K. Tsang, T.T.Y. Lau, A. Hernandez-Koutoucheva, & A.G. McArthur. 2018. Data and curation in the Comprehensive Antibiotic Resistance Database. Poster presentation at the Canadian Society of Microbiologists Annual Meeting, Winnipeg, Manitoba.
- Lau, T.T.Y., A.R. Raphenya, B. Alcock, & A.G. McArthur. 2018. Optimizing antimicrobial resistance surveillance tools through biological data organization and taxonomic identification of resistance genes. Poster presentation at the Canadian Society of Microbiologists Annual Meeting, Winnipeg, Manitoba.
- Maguire, F., A.R. Raphenya, B. Alcock, A.G. McArthur, F.S. Brinkman, & R.G. Beiko. 2018. The cost of speed: evaluating systematic failures in metagenomic AMR profiling. Poster presentation at the Canadian Society of Microbiologists Annual Meeting, Winnipeg, Manitoba.
- Raphenya, A.R., B. Alcock, K.K. Tsang, A.N. Sharma, T.T.Y. Lau, A. Hernandez-Koutoucheva, & A.G. McArthur. 2018. The Comprehensive Antibiotic Resistance Database and the Resistance Gene Identifier – Prediction of antimicrobial resistance genes and mutations for genomic and metagenomic sequencing data. Oral presentation at the Canadian Society of Microbiologists Annual Meeting, Winnipeg, Manitoba.
- Tsang, K.K., H. Zubyk, S. Chou, G.D. Wright, & A.G. McArthur. 2018. Decoding bad bugs: Predicting antibiotic resistance phenotypes from genotype. Oral presentation at the Canadian Society of Microbiologists Annual Meeting, Winnipeg, Manitoba.
Kara Tsang passed her graduate transfer exam today, officially moving from the McMaster Biochemistry & Biomedical Sciences Master’s program to the Ph.D. program. Kara’s work focusses on the intersection of biocuration, bioinformatics, machine learning, mutant screening, and phenotypic testing for prediction of antimicrobial resistance phenotype from genotype. Well done Kara!
Update: Hot on the heels of becoming a Ph.D. student, Kara has won a 2018/2019 Fred and Helen Knight Enrichment Award from the Department of Biochemistry & Biomedical Sciences!
Tammy Lau has been awarded a prestigious Michael G. DeGroote Institute for Infectious Disease Research (IIDR) Summer Student Fellowship for her work on the development of k-mer approaches to predicting pathogen-of-origin for metagenomic antimicrobial resistance gene sequences.
The Comprehensive Antibiotic Resistance Database has been updated, http://card.mcmaster.ca
CARD Curation: Addition of HERA, TRU, & ACI beta-lactamases, sul4, and new quinolone efflux pumps.
Antibiotic Resistance Ontology: Expanded to include an entirely new branch describing AMR phenotypic testing methods. The ARO is now also officially available at the OBO Foundry, allowing formal integration with other ontological resources, most notably the Genomic Epidemiology Application Ontology (GenEpiO), https://github.com/genepio/genepio.
Resistance Gene Identifier: Resistome prediction for low quality or low coverage assemblies, merged metagenomic reads, and small plasmids or assembly contigs. Includes prediction of partial AMR genes. Support added for Docker operating-system-level virtualization (i.e. containerization).
Prevalence, Resistomes, & Variants: Expanded to 67 important pathogens, with a focus on ESKAPEs, WHO Priority Pathogens, and agents of sepsis.
The McArthur lab and the Comprehensive Antibiotic Resistance Database are proud to join the Canadian Anti-Infective Innovation Network, International Genomic Epidemiology Application Ontology Consortium, and Integrated Rapid Infectious Disease Analysis Project!
The Comprehensive Antibiotic Resistance Database has been updated, http://card.mcmaster.ca
This February 2018 release is our largest to date and includes new data types, a new classification system, an entirely new version of the Resistance Gene Identifier, and website improvements.
CARD Curation: 37 new ADC beta-lactamases, 21 PDC beta-lactamases, new MCR proteins, 23 rRNA mutations, resistant isoleucyl-tRNA synthetases, hundreds of new resistance mutations, and more. While in past releases all curated AMR mutations were those characterized from clinical isolates, CARD now additionally includes mutations discovered via in vitro selection experiments. Ontological improvements have been made to enable an entirely new classification system for CARD data and RGI results: resistance determinants are now systematically categorized by AMR Gene Family, Drug Class, and Resistance Mechanism. The Antibiotic Resistance Ontology is now additionally available via GitHub, https://github.com/arpcard.
Resistance Gene Identifier: Entirely new codebase, compatible with CARD data (card.json) version 2.0.0 and up (download separately). Open Reading Frame (ORF) prediction using Prodigal, homolog detection using BLAST (default) or DIAMOND, and Strict significance based on CARD curated bitscore cut-offs. Addition of rRNA mutation and efflux over-expression models. Hits of 95% identity or better are automatically listed as Strict. All results organized by revised ARO classification: AMR Gene Family, Drug Class, and Resistance Mechanism. Revised documentation, command line menu, and website graphical interface. The Resistance Gene Identifier is now additionally available via GitHub, https://github.com/arpcard.
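As a rough illustration of the Strict rule described in this release note, the check below treats a hit as Strict when it meets the curated bitscore cut-off for its model, or when it has at least 95% identity. This is a sketch, not RGI's implementation: the function name, the parameter names, and the use of inclusive (>=) comparisons are assumptions.

```python
def is_strict_hit(bitscore: float, curated_cutoff: float, percent_identity: float) -> bool:
    """Return True if a homology hit would be listed as Strict.

    Assumed reading of the release note: either condition is sufficient.
      - bitscore meets the CARD curated cut-off for the matched model, or
      - percent identity is 95% or better.
    """
    meets_cutoff = bitscore >= curated_cutoff          # inclusive comparison is an assumption
    high_identity = percent_identity >= 95.0           # "95% identity or better"
    return meets_cutoff or high_identity


# Hypothetical hits: (bitscore, curated cut-off, percent identity)
for hit in [(520.0, 400.0, 88.0), (310.0, 400.0, 96.5), (310.0, 400.0, 80.0)]:
    print(hit, is_strict_hit(*hit))
```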
Prevalence, Genomes, & Variants: Expansion of our computer-generated data set on the prevalence of AMR genes and variants among the sequenced genomes, plasmids, and whole-genome shotgun assemblies available at NCBI for clinically important pathogens. CARD Prevalence 2.0.0 is based on sequence data acquired from NCBI on August 28, 2017, analyzed using RGI 4.0.0 (DIAMOND homolog detection) and CARD 2.0.0. Now includes results for protein overexpression models and rRNA mutations. All results organized by the revised ARO classification: AMR Gene Family, Drug Class, and Resistance Mechanism. Download files now include 35000+ genome annotations and all predicted sequence variants.
4th year Bachelor of Health Sciences student Alexandra Florescu has joined us for her Biochem 3A03 (Biochemical Research Practice) course. Alexandra will be collaborating with colleagues in the Genomic Epidemiology Ontology Consortium (genepio.org) on developing ontological terminology for phenotypic tests of antimicrobial resistance and microbial virulence via our ongoing Genome Canada Bioinformatics & Computational Biology funding.
Tsang, K.K. & A.G. McArthur. 2017. Encoding the efflux pump phenomena. Oral presentation at the Second American Society for Microbiology Meeting on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatics Pipelines, Washington, D.C.
Background: Efflux pumps are a major mechanism for intrinsic and acquired resistance to our current antibiotic armamentarium. Efflux mechanisms interplay synergistically with other resistance mechanisms, including drug permeability, degradation and inactivation, to strengthen pathogen antimicrobial resistance levels. Despite their clinical relevance, there is no resource that seeks to understand and predict the contribution of efflux pumps in antimicrobial resistance from genome sequence. This has resulted in limited prediction of the full potential of all resistance determinants in a bacterial cell.
Methods: The Comprehensive Antibiotic Resistance Database (CARD, https://card.mcmaster.ca/) and Resistance Gene Identifier (RGI) were optimized for E. coli and P. aeruginosa efflux pump detection through extensive curation and algorithmic development. Literature was mined and analyzed to curate all published information on E. coli and P. aeruginosa efflux pumps into CARD. Algorithmic development of RGI involved creating bioinformatics detection models and refining their parameters. The Efflux Pump Identifier (EPI) was developed to predict efflux pumps and antimicrobial resistance based on RGI results generated using CARD and tested using genome sequences of characterized, clinical multi-drug resistant E. coli and P. aeruginosa isolates.
Results: The Efflux Pump Identifier (EPI) analyzed 124 E. coli and 94 P. aeruginosa clinical multi-drug resistant samples to predict efflux pumps and their complex regulatory networks under three paradigms: 1) Perfect, 2) Partial, and 3) Putative. The Perfect paradigm identifies perfect matches to known efflux pumps curated into CARD. The Partial algorithm detects efflux pumps where at least one component of the efflux pump is not a perfect match to an efflux pump component in CARD, but is likely a functional homolog. Lastly, the Putative algorithm discovers potential efflux pumps where none of the components is a perfect match to a previously curated component in CARD.
Conclusions: The development of the Efflux Pump Identifier (EPI) addresses an area of antimicrobial resistance that has received insufficient attention in the past. This is a step towards answering the long-standing question of the efflux pump phenomenon: is the detected efflux pump genotype being expressed to present a specific phenotype? Using the Efflux Pump Identifier (EPI) in tandem with the existing repertoire of detection tools for dedicated and mutational resistance determinants leads to the complete prediction of antibiogram from genome sequence.
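The three EPI paradigms described in the Results can be illustrated with a small sketch. It assumes each detected pump is summarised as a mapping from component name to whether that component is a perfect match in CARD, and it reads Putative as "no component is a perfect match". The function, the enum, and this data layout are hypothetical and not taken from EPI itself; the AcrAB-TolC component names are used purely as an example.

```python
from enum import Enum
from typing import Dict


class Paradigm(Enum):
    PERFECT = "Perfect"
    PARTIAL = "Partial"
    PUTATIVE = "Putative"


def classify_pump(component_is_perfect: Dict[str, bool]) -> Paradigm:
    """Assign a paradigm from per-component match status.

    Perfect:  every component is a perfect match to CARD.
    Putative: no component is a perfect match (assumed reading).
    Partial:  a mix of perfect and non-perfect components.
    """
    if not component_is_perfect:
        raise ValueError("a pump needs at least one detected component")
    n_perfect = sum(1 for perfect in component_is_perfect.values() if perfect)
    if n_perfect == len(component_is_perfect):
        return Paradigm.PERFECT
    if n_perfect == 0:
        return Paradigm.PUTATIVE
    return Paradigm.PARTIAL


# Example with the three components of the E. coli AcrAB-TolC pump.
print(classify_pump({"acrA": True, "acrB": True, "tolC": True}).value)
print(classify_pump({"acrA": True, "acrB": False, "tolC": True}).value)
print(classify_pump({"acrA": False, "acrB": False, "tolC": False}).value)
```

Under this reading, a single-component pump whose only component is a non-perfect homolog is reported as Putative rather than Partial; a different tie-breaking rule would be equally defensible given the wording of the abstract.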