From: Jacob Keller
Date: 4 October 2011 21:34
Dear Crystallographers,
I cannot get BLAST to find all proteins with the motif cxcxcxc or at
least cxcxc. It seems to think of "x" as an actual amino acid rather
than a wildcard. There must be some easy way to do this? Ordinarily to
find a short motif, I would just paste the sequence and get the
answer, but here the C's are an absolute requirement and there is no
constraint on the x's except that they be only one residue.
JPK
----------
From: David Briggs
Hi Jacob,
ScanProsite will do precisely what you want.
C-X-C-X-C-X-C
or
C-X-C-X-C
would be the pattern using Prosite syntax.
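
For anyone who would rather scan a local sequence than use the web form, here is a minimal sketch in Python of the same idea. It is not ScanProsite's implementation: the helper names prosite_to_regex and find_motif are hypothetical, and only a small subset of the Prosite syntax (literal residues, x wildcards, and repeat counts such as x(2) or x(2,4)) is assumed. A lookahead is used so that overlapping hits, which are likely in a run like CVCVCVC, are all reported.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple Prosite-style pattern such as 'C-X-C-X-C' into a regex.

    Assumption: only literal residues, 'x' wildcards and repeat counts
    like 'x(2)' or 'x(2,4)' are supported in this sketch.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = re.fullmatch(r"([A-Za-z])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        # 'x' (either case) means "any single residue"; everything else is literal.
        token = "." if residue.lower() == "x" else residue.upper()
        if low and high:
            token += f"{{{low},{high}}}"
        elif low:
            token += f"{{{low}}}"
        parts.append(token)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (1-based start, matched text) for every hit, including overlaps."""
    # Wrapping the motif in a lookahead lets the scan restart at the next residue.
    regex = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


print(prosite_to_regex("C-X-C-X-C-X-C"))        # C.C.C.C
print(find_motif("AACVCLCMCAA", "C-X-C-X-C"))   # [(3, 'CVCLC'), (5, 'CLCMC')]
```

The C's are matched literally and each X consumes exactly one residue, which is the constraint Jacob described.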
Cheers,
Dave
============================
David C. Briggs PhD
Father, Structural Biologist and Sceptic
============================
----------
From: Jacob Keller
Thanks everybody, I tried using
-- the MPI Toolkit (Tuebingen)
-- ScanProsite
I think my regex syntax was different from what the Tuebingen site expects, but
ScanProsite worked well and found many hits, although without really
hitting paydirt. I think both of these programs would do the job well,
though.
Thanks very much for your speedy help (this BB is truly amazing!),
Jacob
----------
From: Jacob Keller
Just for kicks, check out this sequence I found in the process
(conjecture: maybe when the virus causes its synthesis, it uses up all
the cysteines/methionines!):
>sp|Q69566|U88_HHV6U Uncharacterized protein U88 OS=Human herpesvirus 6A (strain Uganda-1102) GN=U88 PE=4 SV=1
MYVSVSVHVSVHVSVRVSVRVSVCVSVRVSVHVSVRVSVSVRVSVRVSVSVRVSVRVSVSVHVSVRVSVRVSVSVRVSVCARVCARVCVCARVCVCARVCVCARVCVCARVCARVCVCACVCVCACLCVCACLCVCACLCVCACLCVCACLCVCACLCVCACLCVCACLCVCVCVCLCVCVCLCVCVCLCVCVCLCVCVCLCVCVCLCVCVCLCVCVCLCVCVCLCVCVCLCVCVCVCVCVCVCVCVCVCVCVCVCLCVCVCLCVCLCVCLCVCVCVCVCLCVCLCVCLCVCVCVCVCLLCMSLCMCMCMCMCMCMCMCMCMSLCMSLCMCMCMCMCMCMCICMCMCICICMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCMCIIEGNK
Maybe it's just a sequencing glitch?
JPK
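
A quick way to put a number on how cysteine- and methionine-heavy a sequence like this is: the sketch below simply counts residues. The function name residue_fractions is hypothetical, and the string used is only the last 15 residues of the U88 entry above, not the whole protein, so the fractions are purely illustrative.

```python
from collections import Counter


def residue_fractions(sequence):
    """Return {residue: fraction of the sequence}, most common residue first."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.most_common()}


# Assumption: a short C-terminal fragment of U88, used here only as sample input.
fragment = "CMCMCMCMCIIEGNK"
for residue, fraction in residue_fractions(fragment).items():
    print(f"{residue}: {fraction:.2f}")
```

Pasting the full sequence in place of the fragment gives the composition of the whole protein.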
----------
From: Jacob Keller
Not so--BLAST showed there is a whole cadre of these things in
various genomes. Go figure.
JPK