Saturday 24 March 2012

REFMAC5 residues with bad geometry

From: David Schuller
Date: 24 March 2012 18:39

CCP4 6.2.0
Refmac_5.6.0117
Scientific Linux 6.1

In my current model, I notice that several sidechains are falling apart, despite having gone through a few rounds of refinement with REFMAC5 and model building with COOT. The worst examples were all Glu and Arg residues.

I tried switching to the REFMAC5 executable on the updates page, which was Refmac_5.6.0114, with no obvious difference.

Eventually I noticed that these are all residues containing atoms with occupancy less than 1.00, which must be a carryover from the MR search model. I set all the occupancies to 1.00 and this seems to have fixed the problem.

This seems counter-intuitive to me. If the occupancies are set low, shouldn't the geometry restraints be stronger relative to the density refinement?

Cheers,

ATOM   1479  N   GLU A   7     -51.844 -33.605  37.318  1.00 60.26           N
ATOM   1480  CA  GLU A   7     -53.137 -33.849  37.966  1.00 59.28           C
ATOM   1481  CB  GLU A   7     -52.997 -33.664  39.476  1.00 61.37           C
ATOM   1482  CG  GLU A   7     -52.799 -32.212  39.905  0.48 60.42           C
ATOM   1483  CD  GLU A   7     -53.349 -32.573  41.635  0.00 54.47           C
ATOM   1484  OE1 GLU A   7     -52.557 -31.998  42.106  0.83 52.26           O
ATOM   1485  OE2 GLU A   7     -55.014 -32.911  42.408  0.68 50.75           O
ATOM   1486  C   GLU A   7     -54.293 -32.985  37.412  1.00 62.61           C
ATOM   1487  O   GLU A   7     -55.444 -33.240  37.737  1.00 63.42           O
ATOM   3165  N   ARG A  77     -46.032 -33.003  26.272  1.00 55.82           N
ATOM   3166  CA  ARG A  77     -44.959 -32.368  27.071  1.00 60.92           C
ATOM   3167  CB  ARG A  77     -44.050 -31.428  26.231  1.00 54.56           C
ATOM   3168  CG  ARG A  77     -42.702 -31.102  26.892  1.00 69.21           C
ATOM   3169  CD  ARG A  77     -42.278 -29.628  26.867  0.46 63.93           C
ATOM   3170  NE  ARG A  77     -41.587 -29.303  25.625  0.79 61.76           N
ATOM   3171  CZ  ARG A  77     -41.607 -28.610  24.146  0.00 37.32           C
ATOM   3172  NH1 ARG A  77     -43.177 -26.956  23.467  0.85 60.52           N
ATOM   3173  NH2 ARG A  77     -41.267 -28.245  23.427  0.95 58.82           N
ATOM   3174  C   ARG A  77     -45.585 -31.698  28.281  1.00 64.89           C
ATOM   3175  O   ARG A  77     -45.949 -32.377  29.262  1.00 77.93           O
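
The distortion in the records above can be put into numbers. This is a minimal Python sketch, with the GLU A 7 side-chain coordinates typed in by hand from the listing; the "typical" bond lengths mentioned in the comments are approximate textbook values and are an assumption here, not something taken from the REFMAC dictionary.

```python
import math

# Coordinates in Å, copied from the GLU A 7 records in the listing above.
glu7 = {
    "CG": (-52.799, -32.212, 39.905),
    "CD": (-53.349, -32.573, 41.635),   # occupancy 0.00 in the listing
    "OE1": (-52.557, -31.998, 42.106),
    "OE2": (-55.014, -32.911, 42.408),
}


def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Typical values are roughly 1.5 Å for the C-C bond and 1.25 Å for the
# carboxylate C-O bonds (approximate, for orientation only).
for first, second in [("CG", "CD"), ("CD", "OE1"), ("CD", "OE2")]:
    d = distance(glu7[first], glu7[second])
    print(f"{first}-{second}: {d:.2f} Å")
```

The CG-CD and CD-OE2 distances come out well over 1.8 Å, and every one of these bonds involves CD, the atom listed with occupancy 0.00.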


--
=======================================================================
All Things Serve the Beam



----------
From: David Schuller
On 03/24/12 15:15, Kendall Nettles wrote:
> David, how can you justify reducing occupancy of some parts of amino acids?

I don't have to, since I didn't do it. Read again the bit about this being a carryover from the MR model.

Cheers,

> I don't understand this. I can understand deleting stuff that's not there and reporting it as not modelled. This is factually false. The side chains are not there at partial occupancy.
> Best regards,
> Kendall Nettles
>
> On Mar 24, 2012, at 2:39 PM, "David Schuller" wrote:


----------
From: Garib N Murshudov

Hi David

The occupancies in the input file are very suspicious: not all atoms of the residues are present, and some occupancies are zero. In refmac, zero occupancy means the atom does not exist. That may explain the problem.
In refinement we could add an option to set occupancies to one if there are no alts, but it would be a dangerous thing to do.

Since occupancies are suspect I would check input pdb file carefully.
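
For anyone who wants to apply that "occupancy one if there are no alts" rule by hand before refinement, here is a rough Python sketch. It is not a REFMAC option; the function name is made up for illustration, and it assumes standard fixed-column PDB records (alternate-location flag in column 17, occupancy in columns 55-60). Note that it would also overwrite deliberately refined partial occupancies, so inspect the file first.

```python
def set_full_occupancy_if_no_altloc(pdb_line: str) -> str:
    """Return the record with occupancy 1.00 unless it carries an altloc flag.

    Hypothetical helper: non-ATOM/HETATM lines, short lines, and atoms with
    an alternate-location indicator are returned unchanged.
    """
    if not pdb_line.startswith(("ATOM  ", "HETATM")):
        return pdb_line
    if len(pdb_line) < 60:
        return pdb_line  # too short to hold an occupancy field
    if pdb_line[16] != " ":
        return pdb_line  # column 17: alternate conformer, leave it alone
    # Columns 55-60 (0-based slice 54:60) hold the occupancy as %6.2f.
    return pdb_line[:54] + "  1.00" + pdb_line[60:]


# One of the records from the original post (occupancy 0.00).
record = "ATOM   1483  CD  GLU A   7     -53.349 -32.573  41.635  0.00 54.47           C"
print(set_full_occupancy_if_no_altloc(record))
```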


regards
Garib




Garib N Murshudov 

----------
From: Eleanor Dodson
As Garib says - an atom with occupancy 0.00 is treated as a marker - useful for coot - but is not included in any X-ray refinement at all. Maybe it would be more aesthetic to maintain geometry, but as crystallographers I think we should be interested in the fit of model to experiment - right? - and not in reporting a pseudo fit related to geometric parameters only.

Eleanor
--
Professor Eleanor Dodson


----------
From: Ed Pozharski
I agree with Eleanor 100%...

In my biased opinion, only the atoms supported by electron density
should be included in deposited models.  To satisfy the "but this will
mess up the electrostatic potential coloring" argument (a valid one, of
course), the "projected model" can be deposited alongside which must be
clearly advertised as the unconstrained interpretation by the
structure's author.

Cheers,

Ed.
--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
                                               Julian, King of Lemurs

----------
From: Gregory Bowman
But what about the issue of resolution? As was previously pointed out, at say 3.2 Å resolution, many side chains will fail to fit, but it doesn't seem appropriate to trim them all down. The users need to also be aware of the quality/resolution of the structures that they are looking at.

Greg




--
Department of Biophysics


----------
From: Eleanor Dodson
 This is a personal preference. I do model at low sigma levels if there IS some indication of where to put atoms, always try to keep the correct sequence even if some atoms are missing, and just for coot convenience keep atoms with occ = 0, rather than delete them altogether. (COOT will refine a residue with occs = 0, but not one where atoms are missing.. Paul?? Why not!)

At deposition the wwPDB are welcome to (and I think do) strip out all occs=0.

Eleanor


----------
From: Herman Schreuder


I fully agree. Unfortunately, the perfect model does not exist (at least not for protein crystal structures). It is like with Heisenberg's uncertainty principle. Either one has a complete model with a number of atoms having a coordinate uncertainty of 4-6 Å, or one has a model where the uncertainty of all atoms is below say 0.5 Å, but with a lot of truncated side chains which clearly contradict available biochemical evidence.
Cheers,
Herman



----------
From: Ed Pozharski
On Mon, 2012-03-26 at 10:17 -0400, Gregory Bowman wrote:
> But what about the issue of resolution? As was previously pointed out,
> at say 3.2 Å resolution, many side chains will fail to fit, but it
> doesn't seem appropriate to trim them all down.

Why is it inappropriate to trim them down?  Sometimes at low resolution
all one can be confident about is the backbone trace.

Just to be clear, I am talking about atoms whose positions are not
supported by electron density, i.e. where difference map in the absence
of the side chain is featureless.  I assume that is the likely situation
when one would set occupancy to zero.

Cheers,

Ed.

----------
From: Ed Pozharski
On Mon, 2012-03-26 at 16:30 +0200 wrote:
> It is like with Heisenberg's uncertainty principle. Either one has a
> complete model with a number of atoms having a coordinate uncertainty
> of 4-6 Å, or one has a model where the uncertainty of all atoms is
> below say 0.5 Å, but with a lot of truncated side chains which clearly
> contradict available biochemical evidence.

Excellent analogy.  I am not sure why truncated arginine (as long as it
is not renamed to alanine) contradicts biochemical evidence though.
Termini are routinely truncated, no problem there. I have plenty of
biochemical evidence that there are more waters in the crystal than I
model.

If the truncated model contradicts biochemical evidence, the projected
model contradicts crystallographic evidence. I agree that a truncated
model may lead to interpretation problems, and thus the option of
depositing a projected model resolves that.

Cheers,

Ed.

----------
From: Herman Schreuder
Dear Ed,

In the end it boils down to personal preferences. With the number of crystal structures I refine each year, I am not going to go over every flexible surface residue to decide whether to truncate the side chain or whether there may be some low-level density that justifies keeping the side chain, so I opt for the biochemical evidence. For me the added advantage is that I only have a single pdb file to take care of. And again, I see no problem in having a model with some atoms with a larger error bar.

You are right that termini are often truncated. In contrast to a missing side chain, here we really have no reasonable hypothesis where the missing residues are. They may even have been removed by a protease.

Cheers,
Herman



----------
From: Katherine Sippel
I agree with Herman about personal preference, but it also boils down to our job as crystallographers to educate non-structural end-users. The fact of the matter is that a lot of researchers use structures without looking at the nuances of the PDB. It's actually pretty common among biologists to download a PDB file and build hypotheses or throw the coordinates into a downstream program like APBS or AutoDock without looking. They don't even realize that density exists or that they should check it, which makes the odds of them reading a header file for missing atoms or understanding the concept of B-factors and occupancy almost nil. Realizing that renders the argument moot until crystallographic data is demystified across the other sciences.

I will say that in principle I do like the idea of a data model + a projected model because it seems like something an end-user could wrap their head around, but in practice this would probably refuel the "what constitutes modelable density" debate all over again.

Cheers,

Katherine

----------
From: James Holton
Try this:

1) take your favorite PDB file and set all the B factors to ~80 (reduces series-termination errors)
2) use sfall/fft in CCP4 to calculate structure factors to 4A resolution
3) use sftools to add a "SIGF" column (0.1 will do) to make refmac5 happy
4) refine the "perfect" model against these fake data for ~5 cycles (with "solvent no")
5) load this up in coot and contour at 1 sigma
6) repeat the refinement with a PDB file containing only main chain.
7) repeat the refinement after putting all the side chains in their most likely (Ponder-Richards) rotamers.

Ask yourself these questions:
1) can you "see" the side chains?
2) can you "see" the waters?
3) what are the R factors from these refinements?

Answers: 1) no, 2) no, 3) ~3% for "perfect", ~50% for "main chain", and ~36% for "likely rotamer"
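
For readers outside crystallography, the R factors quoted above are the usual normalized disagreement between observed and calculated structure-factor amplitudes. Here is a minimal Python sketch of that formula; it assumes the calculated amplitudes are already on the same scale as the observed ones, and the numbers in the example are toy values, not output from refmac5.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_calc has already been scaled to f_obs.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("sum of |Fobs| is zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator


# Toy amplitudes, purely illustrative.
toy_obs = [100.0, 50.0, 25.0]
toy_calc = [90.0, 55.0, 25.0]
print(f"R = {100 * r_factor(toy_obs, toy_calc):.1f}%")
```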

Now ask yourself: even though there is "no density" for side chains and waters, is there really "no evidence" that they exist?

The point I am trying to make here is that you EXPECT side chains to poke out of density at low resolution, even under ideal conditions (perfect phases).  For example, the C-deltas of Leu will "breach" the 1-sigma contour at around 2.8A resolution and worse.  You can see this in my old movie:
http://bl831.als.lbl.gov/~jamesh/movies/index.html#reso

When it comes to building, yes, once an atom dips below the 1-sigma contour it gets harder and harder to know exactly where it is, but it does have to be somewhere.  Somewhere nearby.  Formally, there is "prior knowledge" of bond lengths, etc. at play.  And if you know that there is one copy of a given atom in every unit cell of the crystal, then occupancy < 1 is inappropriate.  Much better to use B = 999, which models the atom as a Gaussian with the electrons spread over an area about 3.5 A wide.  This is roughly the range your average side chain atom has available to it, given that it is attached to the main chain by covalent bonds.
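
One way to read the 3.5 A figure is through the standard relation between an isotropic B factor and the mean-square displacement, B = 8*pi^2*<u^2>. The Python sketch below is only that arithmetic; taking the "width" to mean the r.m.s. displacement is an assumption on the reader's side.

```python
import math


def rms_displacement(b_factor):
    """Return sqrt(<u^2>) in Å for an isotropic B factor in Å^2.

    Uses B = 8 * pi^2 * <u^2>.
    """
    if b_factor < 0:
        raise ValueError("B factor must be non-negative")
    return math.sqrt(b_factor / (8 * math.pi ** 2))


for b in (20, 80, 999):
    print(f"B = {b:>3} -> rms displacement {rms_displacement(b):.2f} Å")
```

For B = 999 this gives a bit under 3.6 Å, in line with the number quoted above.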

Of course, a more "Bayesian" model for the "I don't know what the rotamer is" situation would be to build in ALL possible rotamers, with occupancies equal to their Ponder-Richards probabilities.  Some improvement to this initial "guess" would no doubt be made by using constrained occupancy refinement of rigid-body side chains.  Unfortunately, this is impossible with any refinement program I know about, since refmac, phenix.refine, etc. don't support more than 3 or 4 alternate conformers.
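
The occupancy bookkeeping for such an all-rotamer model might look like the Python sketch below. The weights are placeholders and are not values from the Ponder-Richards library, and the function name is made up for illustration. PDB occupancies carry two decimals, so the sketch works in hundredths and keeps the total at exactly 1.00.

```python
def occupancies_from_weights(weights):
    """Turn non-negative rotamer weights into two-decimal occupancies summing to 1.00."""
    if not weights or any(w < 0 for w in weights):
        raise ValueError("weights must be a non-empty list of non-negative numbers")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Exact shares in hundredths, then round down.
    exact = [100 * w / total for w in weights]
    hundredths = [int(e) for e in exact]
    leftover = 100 - sum(hundredths)
    # Give the leftover hundredths to the largest remainders.
    order = sorted(range(len(exact)),
                   key=lambda i: exact[i] - hundredths[i],
                   reverse=True)
    for i in order[:leftover]:
        hundredths[i] += 1
    return [h / 100 for h in hundredths]


# Placeholder weights for three hypothetical rotamers.
placeholder_weights = [5, 3, 1]
for occ in occupancies_from_weights(placeholder_weights):
    print(f"{occ:.2f}")
```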

Building in all possible conformers and using the occupancy as a "p-value" would also help solve the problem of the careless and/or uneducated over-interpreting PDB files.  Which is the "right one"?  Good question!  I think it's time we started dispelling the myth of the single-conformer protein anyway.

-James Holton
MAD Scientist


----------
From: Paul Adams
Hi James,

 my understanding is that phenix.refine allows any number of alternate conformers. There may have been a limit of 4 some time in the past, but no longer. So your idea could be tested.

 Cheers,
       Paul
--
Paul Adams

----------
From: Bernhard Rupp (Hofkristallrat a.D.)
>phenix.refine allows any number of alternate conformers.

Hmm..... quoting our old friends from the validation circuit: Where freedom
is given, liberties will be taken....

BR

----------
From: Ethan Merritt
True, but...

[warning: back of the envelope calculation]

Consider, for example, an isoleucine sidechain.

It would require 12 positional parameters to refine the position of
each sidechain atom (XYZ * {CB CG1 CG2 CD1}) for a single conformation.

On the other hand there are only 7 rotamers in the library.
So if you limited the refinement to the relative occupancy of the
7 ideal rotamers, that is a more parsimonious model than refining
the individual atoms for one conformation.

Season to taste with arguments about geometry restraints, but still
I think the liberty being offered is not so dangerous.
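
The back-of-the-envelope count above, written out as a small Python sketch. The atom list and the number of rotamers are the ones quoted in this message; the "sum to one" constraint on the occupancies is an extra assumption, and B factors and restraints are ignored.

```python
def positional_parameters(atom_names):
    """Three coordinates (x, y, z) per refined atom."""
    return 3 * len(atom_names)


def occupancy_parameters(n_rotamers, constrained_to_sum_to_one=False):
    """One occupancy per rotamer, minus one if they must sum to 1 (assumption)."""
    return n_rotamers - 1 if constrained_to_sum_to_one else n_rotamers


ile_side_chain = ["CB", "CG1", "CG2", "CD1"]
n_rotamers = 7  # as quoted above for the rotamer library

print("single conformer, free atoms:", positional_parameters(ile_side_chain))
print("ideal rotamers, occupancies only:", occupancy_parameters(n_rotamers))
print("same, with occupancies summing to 1:",
      occupancy_parameters(n_rotamers, constrained_to_sum_to_one=True))
```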

       Ethan

--
Ethan A Merritt


----------
From: Bernhard Rupp (Hofkristallrat a.D.)
> the liberty being offered is not so dangerous

Parameter-wise, true, - if it is understood that you are now *modeling with
little evidence*.
How much electron density to affirm your model choices would one expect at
say 7 ideal rotamers - which on top of static split will likely display
dynamic motion?

On the other hand, the side chain HAS to be somewhere, and a distribution of
the most probable rotamers with their respective frequencies may be more
realistic than all other options (as long as the lack of specific
experimental evidence is acknowledged).

But overall, that model might be better due to the more reasonable prior
expectation term, although the R-value will be practically unaffected due to
the lack of localized scattering contributions.

Anyhow, if it can be abused, it will be ;-)

Best BR




