The McArthur laboratory’s research program is rooted in bioinformatics, functional genomics, and computational biology. It spans complex informatics approaches to the functional genomics of microbial drug resistance, development of biological databases, next-generation sequencing for genome assembly and molecular epidemiology, automated literature curation approaches, controlled vocabularies for biological knowledge integration, and functional genomics approaches in environmental toxicology. As part of our Cisco-funded program, we additionally research the use and generation of ‘Big Data’ in the biomedical sciences, with the goal of integrating biomedical research and clinical healthcare.
- Biological database design and predictive analytics, particularly in the areas of antimicrobial drug resistance and molecular epidemiology
- Comprehensive Antibiotic Resistance Database (arpcard.mcmaster.ca)
- Development of Integrated Health Biosystems research at McMaster University, bridging data-intensive biomedical research and clinical healthcare
- Bioinformatics workflows and Cloud computing for gene expression, gene regulation, and population genomics
- Next-generation DNA sequence analysis, genome assembly, and genome annotation
- Molecular phylogenetics and phylogenomics
- Ecotoxicogenomics of environmental contaminants (metals, organics, pharmaceuticals) using zebrafish, mouse, and other model systems
This research stream focusses on microbial diversity of human pathogens and the molecular evolution, genome informatics, and functional genomics of drug resistance. This includes a four-year collaboration with Dr. Gerry Wright of the Michael G. DeGroote Institute for Infectious Disease Research (McMaster University) in the construction of the Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca), with a consortium of academic and government researchers in Canada and the United Kingdom. The field of antibiotic drug discovery and the monitoring of new antibiotic resistance elements have yet to fully exploit the power of the genome revolution, even though resistance to antibiotics has been identified as one of the top three challenges to human health by the World Health Organization. Our ability to control diseases caused by bacteria that once were highly susceptible to available drugs is being eroded at an alarming rate. The emergence of so-called ‘super-bugs’ that are resistant to many, and sometimes all, of our available antibiotics has resulted in a growing health care crisis.
What has been missing from the arsenal of researchers, clinicians, drug discoverers, and regulators trying to address this dire situation is a comprehensive, accessible data platform that integrates genomic and molecular data across the entire sequence space of bacterial genomes and metagenomes with respect to antibiotics, resistance, and drug targets. Despite the fact that the first genomes sequenced of free-living organisms were those of bacteria, there have been few specialized bioinformatic tools developed to mine the growing amount of genomic data associated with pathogens. In particular, there are few tools to study the genetics and genomics of antibiotic resistance and how it impacts bacterial populations, ecology, and the clinic. We have initiated development of such tools in the form of the CARD. The CARD integrates disparate molecular data and provides a unique organizing principle in the form of the Antibiotic Resistance Ontology (ARO), which ties drugs and their targets, resistance genes, mechanisms of resistance, molecular sequences, regulatory sequences and genes, mutations conferring resistance, bioinformatic models, and the published literature into a complex web of knowledge. This unique platform provides an informatic tool that bridges antibiotic resistance concerns in health care, agriculture, and the environment. Of particular interest is an improved understanding of genotype-phenotype relationships for predictive functional genomics. One of the key goals of the CARD is the development of computational models for prediction of antibiograms (i.e. range of drug sensitivity and resistance) in bacteria from raw genome sequencing data generated by clinical and research institutions around the globe.
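To make the idea of an ontology as a "web of knowledge" more concrete, the following is a minimal sketch of how linked terms could in principle support a gene-to-antibiogram lookup. It is not the CARD's implementation: the `Ontology` class, the relation names `confers_resistance_to` and `has_mechanism`, and every gene, drug class, and mechanism label are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


class Ontology:
    """Tiny in-memory store of (subject, relation, object) triples."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        # Return a copy so callers cannot mutate the store;
        # unknown (subject, relation) pairs give an empty set.
        return set(self._edges.get((subject, relation), ()))


def predict_resistance(ontology, detected_genes):
    """Map each drug class to the detected genes that may confer resistance to it."""
    profile = defaultdict(set)
    for gene in detected_genes:
        for drug_class in ontology.objects(gene, "confers_resistance_to"):
            profile[drug_class].add(gene)
    return dict(profile)


# Placeholder terms only; none of these are real ARO entries.
aro = Ontology()
aro.add("gene_A", "confers_resistance_to", "drug_class_X")
aro.add("gene_A", "has_mechanism", "mechanism_1")
aro.add("gene_B", "confers_resistance_to", "drug_class_X")
aro.add("gene_B", "confers_resistance_to", "drug_class_Y")

# gene_C is unknown to the ontology and therefore contributes nothing.
print(predict_resistance(aro, ["gene_A", "gene_B", "gene_C"]))
```

In this toy version a gene's presence is taken as sufficient evidence of resistance. A realistic predictive model would also need to account for the mutations, regulatory sequences, and sequence-level evidence that the ARO is described as linking together.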
Text Mining & Metadata
While collation of high-throughput experimental data into databases involves handling high volumes of data, these raw data are well structured (often produced by automated platforms), lending themselves to computer-assisted curation. More challenging for databases is the curation of knowledge from the published literature. This is often accomplished by a team of literature curators, led by a Head Curator experienced in biocuration and/or the discipline focus of the database. CARD seeks to acquire curated information on resistance mechanisms, genes, and their targets to create a rich resource for development of functional genomics models of antibiotic resistance. While curation of molecular sequence data (i.e. resistance genes) is manageable, complex metadata on prevalence is often buried in the scientific literature. Yet, from a predictive and clinical perspective, it is critical that we know the distribution of resistance genes around the globe, among pathogens, and in the environment. Only through advanced text mining approaches will we be able to add prevalence metadata to the CARD, allowing prediction not just of resistance genes but also of their means of transmission and of the dangerous acquisition of resistance by new hosts.
The McArthur lab also has an 8+ year research program in high-throughput, experimental functional genomics that involves examination of gene expression by microarrays or RNA-Seq, gene regulation by ChIP-Seq, and cis-regulatory element prediction. With researchers at the Woods Hole Oceanographic Institution (Woods Hole, MA), McMaster University (Hamilton, Canada), and the University of Alabama (Tuscaloosa, AL), we perform research in environmental toxicology, with emphasis on microarray, ChIP-Seq, RNA-Seq, and genomic investigation of the molecular response of adult and developing zebrafish to metal (e.g. metal stress transcription factor MTF-1), organic (e.g. tBHQ, TCDD), and pharmaceutical pollutants. This research program in experimental functional genomics has evolved from large-scale microarrays to the latest approaches in next-generation sequencing, such as RNA-Seq and ChIP-Seq, to examine genome-wide patterns of transcription and gene regulation. These projects have closely examined the effects of organic and metal pollutants upon developing and adult organisms, using both zebrafish and cell culture as models. For example, recent ChIP-Seq experiments have been aimed at determining the role of aryl-hydrocarbon receptor repressor (AHRR) in HeLa cells exposed to 2,3,7,8-tetrachlorodibenzodioxin (TCDD). While the role of aryl-hydrocarbon receptor (AHR) in regulating response to xenobiotic TCDD is well known, we do not yet have a clear understanding of the genes influenced by AHRR, nor of whether this influence is modulated by TCDD exposure. Genome-wide ChIP-Seq will determine the complete binding site repertoire of AHRR and its perturbation by TCDD.
Collaboratively, Dr. McArthur and Cisco are dedicated to building a University-wide “research cloud” computing environment and infrastructure. This relationship will see McMaster build on its renowned research successes and strengthen its links with national and international partners from academia, government, and industry. Part of the overall goal is to build a program in Integrated Health Biosystems, the aim of which will be to bridge the existing gulf between data-intensive areas of biomedical research and healthcare by integrating diverse biological datasets with clinical and environmental data. Dr. McArthur’s original training was as a field ecologist and statistician, with a strong emphasis on computational biology. As his career ‘followed the data’ into biomedical research, particularly parasite genomics and bacterial drug resistance, he consistently saw opportunities for translation of genomics technologies in biomedicine to ecological and evolutionary questions, particularly in the areas of ecotoxicogenomics and population genomics. As such, the lab places a strong emphasis on the development of bioinformatics workflows for both biomedical and ecological research, particularly using Galaxy (#usegalaxy).