We were super excited to see the recent paper “Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life”. Really, the title says it all… the authors took hundreds of published metagenomic studies (focusing on non-human environments) and assembled thousands of genomes. Awesome.
But we were curious how many of these genomes came from the built environment, since we’ve been working to increase the number of reference genomes from the built environment. Right away this runs into the problem of defining the built environment. There are a whole bunch of samples from mines, bioreactors, wastewater, oil sands, etc. We decided to limit ourselves to unambiguous calls, which include drinking water, kitchen counters, and the New York subway system. None of the assembled genomes came from the drinking water or kitchen samples, but there were a whole bunch from the subway system… 1,280 to be exact! These genomes are from dozens of different genera, ranging from the common/expected (e.g. Bacillus) to things I’ve never heard of (e.g. Buttiauxella).
This effort dramatically increases the number of available reference genomes from the built environment (though these are of course not complete genomes, so how useful they are depends on the target analysis). For what it’s worth, I’ve taken the Supplemental Materials listing the genomes, kept only the 1,280 from the subway system, and alphabetized the results. Attached here if anyone is interested. Thanks to Guillaume Jospin for doing the actual work.
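For anyone who wants to pull a similar subset themselves, here is a minimal sketch of the filter-and-alphabetize step in Python. It is not the script used to make the attached file. The column names (`genome`, `source`) and the tiny demo table are made up for illustration, so check the real supplemental table for its actual headers and adjust accordingly.

```python
import csv
import io


def filter_and_sort(rows, keyword, source_col="source", name_col="genome"):
    """Keep rows whose source column mentions `keyword` (case-insensitive),
    then sort them alphabetically by the name column.

    `source_col` and `name_col` are hypothetical column names, not the
    headers of the paper's supplemental table.
    """
    kept = [r for r in rows if keyword.lower() in r[source_col].lower()]
    return sorted(kept, key=lambda r: r[name_col].lower())


# Made-up, tab-separated stand-in for the supplemental table.
demo = io.StringIO(
    "genome\tsource\n"
    "Buttiauxella_bin_2\tNew York subway\n"
    "Example_bin_9\twastewater\n"
    "Bacillus_bin_1\tNew York subway\n"
)

rows = list(csv.DictReader(demo, delimiter="\t"))
subway = filter_and_sort(rows, "subway")

for r in subway:
    print(r["genome"])
```

To run it on a real file, swap the `io.StringIO` demo for `open("your_table.tsv", newline="")` and pass the actual column names to `filter_and_sort`.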
Wow, this is great work, David and Guillaume. Thank you.
This should really help in understanding genomes of the built environment.