Friday 21 October 2011

Finding a sequence motif with BLAST

From: Jacob Keller
Date: 4 October 2011 21:34


Dear Crystallographers,

I cannot get BLAST to find all proteins with the motif cxcxcxc or at
least cxcxc. It seems to think of "x" as an actual amino acid rather
than a wildcard. There must be some easy way to do this? Ordinarily to
find a short motif, I would just paste the sequence and get the
answer, but here the C's are an absolute requirement and there is no
constraint on the x's except that they be only one residue.

JPK



----------
From: David Briggs


Hi Jacob,

ScanProsite will do precisely what you want.

C-X-C-X-C-X-C

or 

C-X-C-X-C

would be the pattern using Prosite syntax.
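
For checking a handful of sequences locally, the same idea can be sketched in Python by translating the pattern into a regular expression. This is only a minimal sketch, not ScanProsite's own implementation: the helper names `prosite_to_regex` and `find_motif` are made up for illustration, only a small subset of the Prosite syntax is handled (single residues, the x wildcard, `[ABC]`, `{ABC}` and `(n)` / `(n,m)` repeats), and an upper-case X is assumed to mean the same wildcard as a lower-case x.

```python
import re

# One pattern element: a residue, the x wildcard, an allowed set [..] or a
# forbidden set {..}, optionally followed by a repeat count (n) or (n,m).
ELEMENT = re.compile(
    r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$",
    re.IGNORECASE,
)


def prosite_to_regex(pattern):
    """Translate a simple Prosite-style pattern such as C-X-C-X-C into a regex."""
    parts = []
    for element in pattern.strip().split("-"):
        m = ELEMENT.match(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core.lower() == "x":
            rx = "."                              # any single residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1].upper() + "]"  # any residue except these
        else:
            rx = core.upper()                     # literal residue or [ABC] set
        if lo is not None:
            rx += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(rx)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return 0-based start positions of every match, overlapping ones included."""
    # Wrapping the motif in a lookahead lets matches overlap, which matters
    # for repetitive sequences where one C can belong to several hits.
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence.upper())]


print(prosite_to_regex("C-X-C-X-C-X-C"))
print(find_motif("C-X-C-X-C", "ACACACACA"))
```

The lookahead is a design choice rather than a requirement: a plain `re.finditer` on `C.C.C` would report only non-overlapping hits, which undercounts in long alternating stretches.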

Cheers,

Dave

============================
David C. Briggs PhD
Father, Structural Biologist and Sceptic
============================

----------
From: Jacob Keller


Thanks everybody, I tried using

--toolkit tuebingen mpi
--Scanprosite

I think my regex syntax was different from the Tuebingen site's, but
scanprosite worked well and found many hits, although without really
hitting paydirt. I think both of these programs would do the job well,
though.

Thanks very much for your speedy help (this BB is truly amazing!),

Jacob

----------
From: Jacob Keller


Just for kicks, check out this sequence I found in the process
(conjecture: maybe when the virus causes its synthesis, it uses up all
the cysteines/methionines!):

>sp|Q69566|U88_HHV6U Uncharacterized protein U88 OS=Human herpesvirus 6A (strain Uganda-1102) GN=U88 PE=4 SV=1
MYVSVSVHVSVHVSVRVSVRVSVCVSVRVSVHVSVRVSVSVRVSVRVSVSVRVSVRVSVSVHVSVRVSVRVSVSVRVSVCARVCARVCVCARVCVCARVCVCARVCVCARVCARVCVCACVCVCACLCVCACLCVCACLCVCACLCVCACLCVCACLCVCACLCVCACLCVCVCVCLCVCVCLCVCVCLCVCVCLCVCVCLCVCVCLCVCVCLCVCVCLCVCVCLCVCVCLCVCVCVCVCVCVCVCVCVCVCVCVCLCVCVCLCVCLCVCLCVCVCVCVCLCVCLCVCLCVCVCVCVCLLCMSLCMCMCMCMCMCMCMCMCMSLCMSLCMCMCMCMCMCMCICMCMCICICMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCIIEGNK

Maybe it's just a sequencing glitch?

JPK

----------
From: Jacob Keller


Not so--BLAST showed there is a whole cadre of these things in
various genomes. Go figure.

JPK

