Tuesday, 27 March 2012

a question about protein sequences in the PDB

From: Francois Berenger
Date: 27 March 2012 03:54

Dear list,

If I take all the fasta files for proteins in the PDB,
are the sequences complete?

I mean, do they have holes sometimes (missing amino acids)?

Sorry for the maybe stupid question but I know that sometimes
the PDB files have missing residues, I am hoping that
it is not the case with the FASTA files.

Regards,
Francois.

----------
From: Bosch, Juergen
I think that depends on what the depositor considered complete.
As an example: if the construct you cloned runs from residue 20 to 380, would you consider that complete, or would you consider the sequence complete only if it also contained the first 20 residues?
As for gaps in the structure, i.e. residues missing from the model because they were not observed, you shouldn't worry: they are included in the FASTA sequence.
Take 3TGH as an example: would you consider that complete?

Jürgen

......................
Jürgen Bosch




----------
From: Ethan Merritt
On Monday, 26 March 2012, Francois Berenger wrote:
> Dear list,
>
> If I take all the fasta files for proteins in the PDB,
> are the sequences complete?
>
> I mean, do they have holes sometimes (missing amino acids)?

In theory the SEQRES records describe the sequence of the
entity that was crystallized, whether or not it is all visible
in the electron density or present in the deposited model.
So normally there should not be any "missing" internal
residues.  But if the expression construct was not the full
gene sequence, e.g. an N-terminal truncation, then those
N- or C- terminal residues (or whole domains) will not be
listed.
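
To make the SEQRES-versus-model distinction concrete, here is a minimal
Python sketch that reads the SEQRES records of a PDB-format file and
counts how many of those residues actually have ATOM records. It assumes
the fixed-column legacy PDB format; the function names and the toy SAMPLE
entry are made up for illustration and are not part of any PDB tool.
Modified residues stored as HETATM (e.g. MSE) are not counted as observed
here, so treat the numbers as a rough indication only.

```python
from collections import defaultdict

# Standard three-letter to one-letter codes; anything else becomes "X".
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(pdb_text):
    """Return {chain_id: one-letter sequence} built from SEQRES records."""
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain_id = line[11]                # column 12: chain identifier
            for name in line[19:].split():     # residue names start at column 20
                chains[chain_id].append(THREE_TO_ONE.get(name, "X"))
    return {chain: "".join(res) for chain, res in chains.items()}


def observed_residue_counts(pdb_text):
    """Return {chain_id: number of distinct residues that have ATOM records}."""
    seen = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            chain_id = line[21]                # column 22: chain identifier
            key = (line[22:26], line[26:27])   # residue number + insertion code
            seen[chain_id].add(key)
    return {chain: len(keys) for chain, keys in seen.items()}


def unobserved_counts(pdb_text):
    """Residues listed in SEQRES but absent from the model, per chain."""
    observed = observed_residue_counts(pdb_text)
    return {chain: len(seq) - observed.get(chain, 0)
            for chain, seq in seqres_sequences(pdb_text).items()}


def _atom(serial, name, resname, chain, resseq):
    """Build a truncated ATOM line with the fields in their PDB columns."""
    return f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"


# Toy entry (hypothetical): four residues crystallized, two modelled.
SAMPLE = "\n".join([
    "SEQRES   1 A    4  MET ALA GLY SER",
    _atom(1, "N", "ALA", "A", 2),
    _atom(2, "CA", "ALA", "A", 2),
    _atom(3, "N", "GLY", "A", 3),
])

print(seqres_sequences(SAMPLE))
print(unobserved_counts(SAMPLE))
```

A non-zero count from unobserved_counts means the SEQRES-derived sequence
is longer than what you would get by reading residues off the coordinates.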

So goes the theory. There are always corner cases.
I remember having a dispute with the PDB long ago about
whether a peptide chain that was known to have undergone
loop cleavage was properly described with a single
chain identifier or with two chain identifiers.  And if the
cleavage involved excision of one or more residues, would
they appear in the SEQRES records anyhow?


> Sorry for the maybe stupid question but I know that sometimes
> the PDB files have missing residues, I am hoping that
> it is not the case with the FASTA files.

I was assuming that the FASTA files you refer to are just
conversions of the SEQRES records.  If not, then all bets are
off.  If the FASTA files are retrieved by gene ID from UniProt
or some other sequence database, then they will be complete in
one sense but may not perfectly match what was in the deposited
crystal structure due to cloning artifacts, strain variation,
allelic non-uniformity, etc.
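
As a rough illustration of that comparison, the sketch below checks
whether a SEQRES-derived sequence is an exact contiguous piece of a
full-length reference sequence, and if so how many residues are absent
at each terminus. The function name and both sequences are invented for
the example. Exact substring matching is a deliberate simplification:
tags, point mutations, or strain variation will make it return None, and
a proper pairwise alignment would be needed in those cases.

```python
def locate_construct(full_length, construct):
    """Locate construct inside full_length by exact substring match.

    Returns (n_term_missing, c_term_missing), or None if the construct
    is not an exact contiguous piece of the reference sequence.
    """
    start = full_length.find(construct)
    if start < 0:
        return None
    c_term_missing = len(full_length) - start - len(construct)
    return start, c_term_missing


# Hypothetical sequences, for illustration only.
reference = "MKTAYIAKQR"
crystallized = "TAYIAK"

print(locate_construct(reference, crystallized))   # exact truncation
print(locate_construct(reference, "TAYLAK"))       # one substitution: no exact match
```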

       Ethan

> Regards,
> Francois.
>

----------
From: Francois Berenger
OK, thanks for the answers.
I'll try to find out more about the FASTA files present in the database then.

Regards,
F.




----------
From: Chad Simmons
In my view, the PDB FASTA file should correspond to the total model that fits the observable electron density, alongside the deposited structure factors. However, not all depositions contain a link to the expression construct details, because many are not published, and I believe those details should be stated explicitly in the submission.

Chad






