microBEnet: the microbiology of the Built Environment network


Request for help on sequence conservation among > 1 million repeats

By Jonathan Eisen Posted in Bioinformatics

Posted on March 20, 2017

I got an email from a colleague requesting some help on an informatics issue and I thought it might be useful to post it here. I have been thinking about starting a section of this blog on “Technical Help Requests” or something like that so I guess this is a test.

Here is the request:

I have a list of sequences of a set of > 1 million short repeat elements in a large eukaryotic genome, and I need to find a ~60 bp region which is most conserved among these elements. While they are “repeat elements”, they can be fairly diverse in specific sequence, but I only need a subset that contain the (near-perfect) conserved sequence. What method or software would you recommend to find this region? All the ones I usually use can’t handle that many lines of input.
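One possible way to frame the problem as stated, offered only as a rough sketch and not as the colleague's method or a tool recommended in this thread: stream through the elements, count how many of them contain each exact window of length k, and look at the most widely shared windows. The function names (`read_fasta`, `most_common_windows`) and the parameters `k` and `top` are hypothetical. The sketch assumes FASTA input and exact matches only, so a "near-perfect" region would show up as a cluster of closely related high-count windows and not as a single one. With more than a million diverse elements and k = 60, the number of distinct windows may not fit in memory, so a shorter k or a random subsample would probably be needed in practice.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines, one record at a time."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks).upper()
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).upper()


def most_common_windows(seqs: Iterable[str], k: int = 60, top: int = 5) -> List[Tuple[str, int]]:
    """Return the `top` exact k-mers ranked by the number of sequences containing them."""
    counts: Counter = Counter()
    for seq in seqs:
        # Use a set so a window repeated inside one element is counted once.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts.most_common(top)
```

On a real file this could be called as `most_common_windows((s for _, s in read_fasta(open("repeats.fa"))), k=60)`, where `repeats.fa` is a placeholder file name.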

6 thoughts on “Request for help on sequence conservation among > 1 million repeats”

Tattooed Science says:

March 20, 2017 at 12:31 pm

Is the 60bp sequence known or is this a find the 60bp conserved region?

Jonathan Eisen says:

March 20, 2017 at 12:54 pm

It is basically known (at least, representatives of the repeat are known)

Elinne Becket says:

March 20, 2017 at 1:53 pm

To find a >60bp conserved region (or the most conserved region) among a known set of repeat elements. The conserved region is undefined. Thanks!

Tattooed Science says:

March 20, 2017 at 3:07 pm

If you have representative sequences, then something like kallisto (https://pachterlab.github.io/kallisto/about.html) or other pseudoalignment software would work well. Most pseudoalignment software scales to millions of sequences pretty easily.

We are currently using kallisto to sift through millions of transcriptomes for conserved fragments.
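To make the "representative sequence" idea concrete, here is a small sketch of picking out the subset of elements that contain a near-perfect copy of a known region. This is not kallisto and not pseudoalignment; it is a plain brute-force scan that counts substitutions only (no insertions or deletions), and the names `best_hamming`, `select_near_matches`, and `max_mismatches` are made up for illustration. It reads one element at a time, so memory use does not grow with the number of input lines, though it would be slow compared with a dedicated tool.

```python
from typing import Iterable, Iterator, Optional, Tuple


def best_hamming(seq: str, ref: str) -> Optional[Tuple[int, int]]:
    """Return (mismatches, offset) of the best ungapped placement of ref in seq.

    Returns None when seq is shorter than ref. Ties keep the leftmost offset.
    """
    k = len(ref)
    best = None
    for i in range(len(seq) - k + 1):
        d = sum(a != b for a, b in zip(seq[i:i + k], ref))
        if best is None or d < best[0]:
            best = (d, i)
    return best


def select_near_matches(
    records: Iterable[Tuple[str, str]], ref: str, max_mismatches: int = 3
) -> Iterator[Tuple[str, int, int]]:
    """Yield (name, offset, mismatches) for elements with a near-perfect copy of ref."""
    for name, seq in records:
        hit = best_hamming(seq, ref)
        if hit is not None and hit[0] <= max_mismatches:
            yield name, hit[1], hit[0]
```

The threshold of 3 mismatches is an arbitrary default, not a value taken from the request; what counts as "near-perfect" would have to be decided for the data at hand.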

Elinne Becket says:

March 20, 2017 at 4:26 pm

Thanks! We’re looking into kallisto now for our sequence list. We also just found an R package, “DECIPHER”, that seems pretty powerful.

Tattooed Science says:

March 20, 2017 at 4:30 pm

You are welcome! vsearch (https://github.com/torognes/vsearch) might also be useful, but I don’t know how well it scales.
