Fungal amplicons

A recent post highlighted issues with analyzing fungal ITS data, and that inspired my labmate Sydney Glassman and me to want to share our experiences with using amplicons to characterize fungal communities. We are very excited that people are interested in delving into the wonderful world of fungi, and we wish to share our love of mycology with others! Fungi can have really interesting ecologies, and the field recently has had a lot of success with developing tools for studying environmental fungi.

  • First, the general. While ITS is the universal barcode for fungi, there are, of course, issues with it and with the primers used to amplify it. There are multiple, potentially different, copies of ITS per species, and the typical primers show taxonomic bias in what they target – for example, Cantharellus may be lost. Plus, there are two variable sections within the ITS region. Given the sequence read length of current technologies, only one is typically targeted, but there is debate over whether it is best to amplify ITS1 or ITS2. Our group amplifies ITS1, and we have had good luck with it. The primers we use, as well as the specifics of the bioinformatics pipeline that we employ, can be found in a recent paper by Smith and Peay.
  • ITS is variable in length, which can make it trickier to merge reads than, for example, 16S. We remove priming/adapter sites and low-quality sequences from the ends of reads, and we have found that this greatly improves the number of reads that can be paired.
  • The variable nature of ITS precludes any sort of alignment across broad groups of fungi, and thus fungal analyses are taxonomic rather than phylogenetic (i.e. no UniFrac). There are efforts afoot to change that, but currently, if you are interested in doing phylogenetics, you would need to target a region other than ITS, most often the ribosomal small subunit (18S). Meanwhile, for analyzing ITS, many of the default settings in QIIME, for instance, use a phylogenetically informed process, so it's important to use flags to skip the alignment and tree-building steps.
  • The ITS database typically used for fungi is the UNITE database, which began as a database of ectomycorrhizal fungi. While it has expanded greatly over the years, there are still many lineages of fungi yet to be represented or named (beyond “uncultured environmental clone”) in the database. [AMENDED] While the UNITE database is the best resource out there for characterizing fungi, and it continues to improve, some form of de novo (or open) OTU picking strategy probably makes sense for most environmental fungal studies.
  • [EDITED] After taxonomic assignment, many OTUs may be unassigned (for example, they appear as “No Blast Hit”). In our experience, when we BLAST these OTUs by hand on GenBank, they are bacteriophage or chimeras, i.e. spurious OTUs. So, we tend to remove OTUs that are unassigned after taxonomic identification.
  • Some fungi have two taxonomic names: one from when they were described in their sexual stage and another from the asexual stage, before molecular approaches revealed that this was one species, not two. This legacy remains in the database, so the richness of fungi can seem inflated. For example, the sexual stages of some Aspergillus species were named as Eurotium, and both of these genera appear in the database.
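
The removal of unassigned OTUs mentioned above can be sketched in a few lines of Python. This is only an illustration, not the pipeline we actually run: the `drop_unassigned` helper, the OTU IDs, and the exact label strings are assumptions, so check what your own taxonomy assigner writes for unassigned OTUs.

```python
# Labels treated as "unassigned"; these strings are assumptions and should be
# adjusted to match the output of your taxonomy assigner.
UNASSIGNED_LABELS = {"no blast hit", "unassigned"}


def drop_unassigned(taxonomy):
    """Keep only OTUs whose taxonomy label is not an 'unassigned' label.

    taxonomy: dict mapping OTU ID -> taxonomy string (hypothetical layout).
    """
    return {
        otu_id: label
        for otu_id, label in taxonomy.items()
        if label.strip().lower() not in UNASSIGNED_LABELS
    }


# Made-up example data, for illustration only.
taxonomy = {
    "OTU_1": "k__Fungi; p__Basidiomycota",
    "OTU_2": "No blast hit",
    "OTU_3": "Unassigned",
    "OTU_4": "k__Fungi; p__Ascomycota",
}

kept = drop_unassigned(taxonomy)
print(sorted(kept))
```

It is worth BLASTing a handful of the dropped OTUs by hand before discarding them all, since what ends up unassigned will differ between datasets.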

We hope that sharing our experiences facilitates the inclusion of fungi across broad ecological settings.

 

12 thoughts on “Fungal amplicons”

  1. Interesting post, thanks! A couple of quick follow-up comments:

    As you note, it’s not currently possible to get useful phylogenetic trees for ITS. We recommend not building them, and using non-phylogenetic diversity metrics. In our Fungal analysis tutorial we do this by passing --suppress_align_and_tree during OTU picking, and --nonphylogenetic_diversity for the diversity analyses.
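
To make "non-phylogenetic diversity metric" concrete, here is a minimal Python sketch of Bray-Curtis dissimilarity, one commonly used metric of this kind that needs only OTU counts and no tree. It is not QIIME's implementation; the `bray_curtis` function and the sample counts are made up for illustration.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples.

    Each sample is a dict mapping OTU ID -> read count. The result is 0.0 for
    identical samples and 1.0 for samples that share no OTUs.
    """
    otus = set(counts_a) | set(counts_b)
    # Sum of the smaller count for every OTU (the abundance the samples share).
    shared = sum(min(counts_a.get(o, 0), counts_b.get(o, 0)) for o in otus)
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total


# Made-up counts, for illustration only.
soil_a = {"OTU_1": 10, "OTU_2": 5, "OTU_3": 0}
soil_b = {"OTU_1": 4, "OTU_2": 5, "OTU_4": 6}

print(bray_curtis(soil_a, soil_b))
```

Because the calculation never looks at how OTUs are related to one another, it sidesteps the ITS alignment problem entirely, which is the reason for preferring this family of metrics here.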

    Next, re: UNITE, I know that a lot of people mention that their taxonomic groups of interest are not represented in UNITE. The developers of UNITE are very interested in community contributions, so if you’re an expert in some group that isn’t in UNITE, you should consider contributing. That’s a lot better than creating your own taxa-specific database, because it makes those annotations accessible to the rest of the community (which means that more studies will publish information on your taxonomic groups of interest!).

  2. May I chip in ITSx (http://microbiology.se/software/itsx/), which is a Swiss army knife for ITS-based mycological research. Even in its simplest form (“perl ITSx -i infile -o outfile -t F”) it will denoise and sharpen your fungal ITS1 or ITS2 datasets beyond what you can accomplish in any other tool I know of.

    While you are correct that “in UNITE … there are still many lineages of fungi yet to be represented”, I’d like to add that UNITE mirrors all ~450,000 more-or-less full-length fungal ITS sequences in GenBank and the INSDC. So it’s not like we’re excluding lineages from UNITE. But true, at a general level, the ITS-based taxonomic sampling of fungi in the scientific community certainly has holes in it.

    Regarding the “yet to be … named (beyond “uncultured environmental clone”)”, we’re working hard to reduce this number. The odds are against us, because researchers routinely use the “Uncultured fungus” label when they submit their sequences to GenBank, even when they could have used something more informative. And I mean thousands per week. Right now, out of the ~50,000 species hypotheses (OTUs) in UNITE, fewer than 500 contain only sequences without any notion of taxonomic rank. Of course, since UNITE supports web-based third-party sequence annotation, anyone can participate in the annotation effort and rename sequences.
    Something like 60,000+ GenBank fungal ITS sequences have been re-annotated with a better name in UNITE. (These numbers pertain to the less-than-a-week-old seventh release of the UNITE species hypothesis system.)

    We’re in the process of integrating NGS reads in UNITE and the species hypothesis system. This is no walk in the park (due to the high number of phantom sequences in these datasets) but we’re getting there. A couple of large studies are in there already.

    Finally, regarding chimeras, you’ll find reasonably chimera-free fungal ITS (and ITS1 and ITS2) reference datasets for use in UCHIME [and other tools] at the bottom of https://unite.ut.ee/repository.php .

  3. Thanks for chipping in, Henrik! UNITE is great. We just wanted to clarify the origins, since this post was inspired by someone expressing frustration at getting taxonomic assignments for marine-based fungi.

  4. Great post, just curious about one statement: “In our experience when we BLAST these OTUs by hand on GenBank, they are bacteriophage or chimeras.” Do you have any solid evidence that chimeras actually are formed when amplifying/analyzing ITS1 (or ITS2) data only?

    Cheers

  5. Dear Havard,

    I cannot say for sure if any of the sequences that are labeled No Blast Hit are in fact chimeras. I can simply say that I wouldn’t trust them to be good fungal OTUs.

    However, we definitely have evidence that chimeras are formed when amplifying/analyzing ITS1-only data. While the level is probably a lot lower than when amplifying the entire ITS region (which includes the highly conserved 5.8S region), we do get ~1% chimeras in the chimera checker. Also, I showed Henrik some of my OTUs that I thought seemed dubious, and he thought at least one of them looked like a chimera. These were from when I was picking OTUs at 97% similarity. The problem seemed to go away, however, when I picked at 95% similarity.

    Anyone else is welcome to chime in with their experience of finding chimeras while amplifying only ITS1 or ITS2!

    Best,
    Sydney

  6. This is a great thread and I’m excited to get to share my opinion. Although I’ve recently generated some metagenomic data, I’ve spent the last several years intensely analyzing Sanger-sequenced ITS and other rDNA sequences from two large collections of true fungi and one large collection of plant-associated oomycetes.

    I believe there are two major issues with the ITS as a barcode that are largely overlooked. The first is that there are a large number of extremely important genera of fungi that contain species with identical ITS haplotypes. This is very common in genera such as Cladosporium, Fusarium, Trichoderma, Penicillium, Aspergillus, Ustilago and Phytophthora, but (in my experience) best illustrated within Alternaria, where more than one subgeneric clade shares an identical ITS haplotype, which has 1000s of GenBank entries. This issue may not be as important when metagenomic data are limited to familial or generic groupings, but UNITE doesn’t seem to have a solution for this as far as I can tell. TrichoKey (for Trichoderma) is a good example of a system that does recognize this problem and suggests secondary primers for better resolution, but it is outdated.

    The other major issue that is largely unaddressed is polymorphic ITS sequences within individual strains. For Sanger sequencing this can cause sequencing artifacts that require cloning steps, and in metagenomic analyses there is a possibility of overestimating OTUs. It also seems that by largely ignoring polymorphic DNA we are missing hybrids and evidence of reticulate evolution.

    1. Thanks for chiming in, Tyler! You bring up good points. Like any barcode, ITS has several issues. Similarly, Chris Hann-Soden from the Taylor lab says that he cannot distinguish between any of the Neurospora species that he studies with ITS. When he does BLAST or other alignments he observes 100% or 99% identity!

      What are you going to use for Phytophthora? Have you designed your own primers or are you going to combine ITS and LSU?

  7. Luckily, the Phytophthora species I’m focusing on are distinguishable by ITS (though in one case only by 2bp, which is 98% similarity). Just like with true fungi, the oomycete D1/D2 LSU is less variable than the ITS (the only clade I’ve encountered where this wasn’t the case was some of the yeasts in the Tremellomycetes). The mitochondrial COI is a slightly better (and more official) barcode for Phytophthora but isn’t as easily amplified. The mitochondrial genome is also less susceptible to intragenomic polymorphisms, which are my biggest problem using ITS for barcoding in Phytophthora – Pythium is even worse. There is a ~200-300 bp spacer between the COII and COI genes that is highly variable but usually intraspecifically conserved.

    I use my own primers for ITS of both oomycetes and true fungi. This is mostly because I prefer longer oligos to suit my preferred PCR and subsequent Sanger reaction conditions. Questions of universal amplification have not been settled for oomycetes. I was surprised just last week to find that my own set of oomycete ITS primers, which had previously amplified all Phytophthora and Pythium s.l. strains I had thrown at it (including 4/5 clades of Pythium), was totally failing to amplify a set of isolates.
