I posted a request for help on Twitter, but I thought it might be of interest to other people, so I am also posting about it here.
Basically, what we (me and Lizzy Wilbanks in my lab) want to do is take some microbial genomes and fully annotate all of the repeat elements and all of the possible mobile genetic elements in each genome. There are many reasons we want to do this, but what we are struggling with right now is finding good informatics tools for doing it.
So if anyone has any suggestions for good computational methods for doing this it would be appreciated.
Below I have posted a “Storify” of the Twitter discussion that followed my question.
We worked on an MSc project with IS elements in E. coli, and the work extended to Shigella but not Salmonella. We formulated a pipeline that could be extended to other organisms. Let us know if that sounds like the sort of thing you are after?
Thanks. Is this something we could run or would we have to give you the sequences?
I’m currently characterizing what looks to be a MITE, and have run through most of the available programs. It seems that simple sequence repeats and autonomous transposable elements are relatively easy to find, but non-identical/degenerate repeats and non-autonomous elements require more manual labor.
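To illustrate why perfect simple sequence repeats are the easy case, here is a minimal Python sketch that finds them with a backreference regular expression. The function name `find_ssrs` and its thresholds are hypothetical choices for illustration, not part of any tool mentioned in this thread, and the approach only catches exact tandem copies; degenerate repeats and non-autonomous elements would slip through, which is the hard part described above.

```python
import re

def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Find perfect tandem repeats (SSRs) in a DNA string.

    Returns a list of (start, end, unit, copies) with 0-based,
    half-open coordinates. All thresholds are illustrative defaults.
    """
    # Lazy unit match so the shortest repeating unit is preferred,
    # followed by at least (min_repeats - 1) further exact copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), m.end(), unit, copies))
    return hits

if __name__ == "__main__":
    for hit in find_ssrs("GGATATATATCC"):
        print(hit)
```

A single mismatch inside a repeat tract splits or hides the hit, so this kind of scan is only a first pass.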
Try ‘tandem repeats database’ (BU) and ‘MUST’ for MITE.
Given that most of these repeats/elements are species- or even strain-specific, there must be a way to combine already available detection methods for SSRs and ISs with comparative whole-genome BLAST homology searches: if interspersed regions lack homology to other species, mark them as potential repeats.
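One way to picture that last step is as interval subtraction: take the regions of the genome that do have a homologous hit in another species and report whatever is left over. The Python sketch below assumes the hits have already been reduced to 0-based, half-open `(start, end)` intervals on the query genome (for example, parsed from BLAST tabular output); the function name `uncovered_regions` and the `min_length` cutoff are hypothetical and only for illustration. A region lacking homology is only a candidate, since it could equally be a strain-specific gene.

```python
def uncovered_regions(genome_length, homologous_hits, min_length=50):
    """Return intervals of the genome not covered by any homologous hit.

    homologous_hits: iterable of (start, end), 0-based half-open
    coordinates on the query genome (an assumed input format).
    min_length: ignore gaps shorter than this (illustrative cutoff).
    """
    gaps = []
    cursor = 0  # end of the region covered so far
    for start, end in sorted(homologous_hits):
        if start - cursor >= min_length:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping hits
    if genome_length - cursor >= min_length:
        gaps.append((cursor, genome_length))
    return gaps

if __name__ == "__main__":
    hits = [(0, 200), (150, 400), (600, 900)]
    for region in uncovered_regions(1000, hits):
        print(region)
```

The resulting candidate regions could then be intersected with the output of existing SSR and IS finders to decide which ones need manual inspection.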