Raphenya et al. Scientific Data volume 9, Article number: 341 (2022).
Whole genome sequencing (WGS) is a key tool in identifying and characterising disease-associated bacteria across clinical, agricultural, and environmental contexts. One increasingly common use of genomic and metagenomic sequencing is in identifying the type and range of antimicrobial resistance (AMR) genes present in bacterial isolates in order to make predictions regarding their AMR phenotype. However, there are a large number of alternative bioinformatics software and pipelines available, which can lead to dissimilar results. It is, therefore, vital that researchers carefully evaluate their genomic and metagenomic AMR analysis methods using a common dataset. To this end, as part of the Microbial Bioinformatics Hackathon and Workshop 2021, a ‘gold standard’ reference genomic and simulated metagenomic dataset was generated containing raw sequence reads mapped against their corresponding reference genome from a range of 174 potentially pathogenic bacteria. These datasets and their accompanying metadata are freely available for use in benchmarking studies of bacteria and their antimicrobial resistance genes and will help improve tool development for the identification of AMR genes in complex samples.