Species identification from metagenomic sequence data has attracted considerable attention recently, and a number of software solutions are now available, based on whole-genome assemblies, mapping to references, or informative k-mers. In this talk I focus on a small corner of this area: species identification from medical samples, where the patient's condition (and perhaps inspection of a culture) gives us an idea of which genus of bacterium to expect, but we do not know whether there is a mixture. I look specifically at the case of Staphylococcus aureus, a commensal organism which lives in the noses of ~30% of us, and yet which can be a pathogen. Many assemblies of S. aureus strains are available, but other Staphylococcus species (some of which can cause illness) generally have only one assembly each. I'll talk about how we can address the issues that arise when we want confident answers in the face of such a biased set of prior information.