Note – Jonathan Eisen invited Jack Gilbert to write a post about the Earth Microbiome Project especially in light of the recent paper on the topic by Gilbert et al. (see Eisen’s blog post about this paper here).
Post by Jack Gilbert submitted by email to Jonathan Eisen.
The Earth Microbiome Project started as a thought experiment: a ‘what if’ discussion to determine the potential scale of the problem of characterizing the breadth and depth of our observable microbial world. A lot of ideas were discussed during that first meeting, but we were firmly in the heyday of the ‘sequencing will solve it’ paradigm, and as all we had were extrapolative estimates of the diversity of microbial life, we decided that more data were imperative. So we started the EMP, essentially by crowdsourcing samples from colleagues around the world and then processing these samples in a standard way. What started out as a data-gathering exercise, to see whether we could perform the statistical power calculation needed to determine the number of samples required to capture the breadth of diversity, has turned into something bigger and potentially more important. Through the resources of the EMP, the microbial ecology community and beyond have gained improved access to high-quality sequence data with which to test their hypotheses, as well as access to appropriate data analysis software and infrastructure to enable interpretation of the results. The EMP is not an edifice; it is an organic collaboration between hundreds of people around the globe, and that family will continue to grow.
We have continually had to overcome hurdles and build novel solutions: to handle the acquisition of tens of thousands of different sample types; to process the resulting sequence data and make them public and accessible to the original collaborators in an appropriate way; and to ensure that the standardized methods are up to date and flexible enough to adapt to new information that is constantly being acquired about the data, biases in PCR primers, and technical issues with sequencing platforms. We are currently trying to coalesce the first 20,000 to 30,000 samples into a single OTU matrix that can be used to start exploring how microbial communities assemble themselves across the globe’s myriad gradients of physicochemical parameters. The data are vast, and hence we are able to explore many nuanced differences within microbial ecosystems and between different biomes, but they are by no means perfect. Gaps in metadata, a lack of continuity in data fields between projects, and an imbalance in how well different environments are represented make these first analyses difficult to perform. While each project is in and of itself an appropriate hypothesis-driven study, we are still working to fill the gaps in our database so that the community of researchers that make up the EMP can test the broader hypotheses, which are manifold.
While we continue to explore new ways of processing and analyzing the data that have been generated, we are also continuing to acquire samples from new and exciting studies around the world. In fact, many research groups are working with us to plan studies that will specifically fill our research gaps, helping us to address primary scientific questions. We are always keen to hear from any and all researchers, from all walks of life and at any phase in their career, whether by email or in person, so that we can work as a community to refine and improve this project and its database of free, open-access information.
UPDATE by Jonathan Eisen – I have added a picture of Jack Gilbert from that Snowbird conference – climbing to the top of the mountain and surveying the Earth.