In the context of an Alfred P. Sloan Foundation grant to the UNITE database to improve the support for fungi in the built environment, a workshop centered on public fungal ITS (barcode) sequences from the built environment was organized in Gothenburg, Sweden on May 23-24, 2016. Specifically, the ~40 physical and remote workshop participants sought to annotate these sequences according to the MIxS-BE standard to enable more precise querying of the built mycobiome.
DNA sequences tend to be submitted to public sequence databases with very little metadata. Many sequences come with little more information than the name of the sequence author and the tentative name of the study. Thus, anyone interested in retrieving all fungal sequences from, e.g., bathrooms or tiles will be hard-pressed to assemble such a dataset.
There is, of course, every reason to think that analysis of all sequences recovered from bathrooms, or indeed any other substrate or locality, will prove to be a worthwhile scientific endeavor. The inability to pursue such research questions hampers progress both in the context of the built environment and of mycology itself.
The workshop participants went through and annotated all built-environment fungal ITS sequences from the databases of the International Nucleotide Sequence Database Collaboration (INSDC) according to the MIxS-BE standard. They noted a marked difference between the level of detail provided in the database entries (typically low) and the level provided in the underlying scientific publications (typically much higher). Although the final results have yet to be assembled, more than 10,000 new data points were recovered. The results will be implemented in the UNITE database and shared with other initiatives and online resources.
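To illustrate why structured annotation matters for queries such as "all sequences from bathrooms", here is a minimal sketch in Python. The record layout, the field names (`indoor_space`, `surface_material`, `country`), and the accession numbers are hypothetical placeholders chosen for illustration; they are not the exact MIxS-BE terms, nor the way UNITE stores its data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SequenceRecord:
    """A sequence entry with a few illustrative (hypothetical) metadata fields."""
    accession: str                          # made-up accession, not a real INSDC entry
    country: Optional[str] = None
    indoor_space: Optional[str] = None      # e.g. "bathroom", "office"
    surface_material: Optional[str] = None  # e.g. "tile"


def select(records, **criteria):
    """Return the records whose fields match every criterion (case-insensitive).

    Records lacking a requested field are skipped: without annotation
    there is nothing to match against.
    """
    def matches(rec):
        for field, wanted in criteria.items():
            value = getattr(rec, field)
            if value is None or value.lower() != wanted.lower():
                return False
        return True

    return [rec for rec in records if matches(rec)]


records = [
    SequenceRecord("EX000001", country="Sweden",
                   indoor_space="bathroom", surface_material="tile"),
    SequenceRecord("EX000002", country="Finland", indoor_space="office"),
    SequenceRecord("EX000003"),  # submitted with no metadata at all
]

bathroom_hits = select(records, indoor_space="bathroom")
print([rec.accession for rec in bathroom_hits])
```

In this sketch the unannotated entry can never be retrieved by substrate or locality, whatever the query, which is the situation the annotation effort set out to remedy.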
The workshop raised several pressing questions: is it scientifically defensible to release DNA sequences with virtually no metadata at all, when indeed those metadata are available to the sequence authors? When metadata are provided with sequence entries, shouldn’t we try to provide them in a homogeneous, standardized way? And does the fact that metadata can be found in print-only journals or non-open-access electronic journals really make them “available”?
The workshop participants were, furthermore, surprised by the diversity of research efforts targeting the built environment. Expected substrates and localities such as dust, indoor air, and offices were common. Less expectedly, however, spacecraft, prehistoric buildings such as tumuli and man-made caves, and tombs and crypts (and even mummies) were also found to be fairly frequent targets for mycological research efforts. These exotic research efforts stretched the MIxS-BE standard to its limits.
The workshop participants also covered outdoor sequences that were found to belong to UNITE species hypotheses featuring sequences from the built environment. These outdoor sequences were annotated with country and host of origin in recognition of the fact that they, too, are in a position to inform us of the nature of indoor fungal communities.
We intend to publish the outcome of the workshop, and all data assembled during the workshop, in a scientific outlet later in 2016. In the meantime, the data will gradually be released in UNITE and in resources relying on UNITE for molecular identification of fungi.