CCP4 Bulletin Board Archive: REFMAC5 residues with bad geometry

From: David Schuller
Date: 24 March 2012 18:39
CCP4 6.2.0
Refmac_5.6.0117
Scientific Linux 6.1

In my current model, I notice that several sidechains are falling apart, despite having gone through a few rounds of refinement with REFMAC5 and model building with COOT. The worst examples were all Glu and Arg residues.

I tried switching to the REFMAC5 executable on the updates page, which was Refmac_5.6.0114, with no obvious difference.

Eventually I noticed that these are all residues containing atoms with occupancy less than 1.00, which must be a carryover from the MR search model. I set all the occupancies to 1.00 and this seems to have fixed the problem.

This seems counter-intuitive to me. If the occupancies are set low, shouldn't the geometry restraints be stronger relative to the density refinement?

Cheers,

ATOM 1479 N GLU A 7 -51.844 -33.605 37.318 1.00 60.26 N
ATOM 1480 CA GLU A 7 -53.137 -33.849 37.966 1.00 59.28 C
ATOM 1481 CB GLU A 7 -52.997 -33.664 39.476 1.00 61.37 C
ATOM 1482 CG GLU A 7 -52.799 -32.212 39.905 0.48 60.42 C
ATOM 1483 CD GLU A 7 -53.349 -32.573 41.635 0.00 54.47 C
ATOM 1484 OE1 GLU A 7 -52.557 -31.998 42.106 0.83 52.26 O
ATOM 1485 OE2 GLU A 7 -55.014 -32.911 42.408 0.68 50.75 O
ATOM 1486 C GLU A 7 -54.293 -32.985 37.412 1.00 62.61 C
ATOM 1487 O GLU A 7 -55.444 -33.240 37.737 1.00 63.42 O
ATOM 3165 N ARG A 77 -46.032 -33.003 26.272 1.00 55.82 N
ATOM 3166 CA ARG A 77 -44.959 -32.368 27.071 1.00 60.92 C
ATOM 3167 CB ARG A 77 -44.050 -31.428 26.231 1.00 54.56 C
ATOM 3168 CG ARG A 77 -42.702 -31.102 26.892 1.00 69.21 C
ATOM 3169 CD ARG A 77 -42.278 -29.628 26.867 0.46 63.93 C
ATOM 3170 NE ARG A 77 -41.587 -29.303 25.625 0.79 61.76 N
ATOM 3171 CZ ARG A 77 -41.607 -28.610 24.146 0.00 37.32 C
ATOM 3172 NH1 ARG A 77 -43.177 -26.956 23.467 0.85 60.52 N
ATOM 3173 NH2 ARG A 77 -41.267 -28.245 23.427 0.95 58.82 N
ATOM 3174 C ARG A 77 -45.585 -31.698 28.281 1.00 64.89 C
ATOM 3175 O ARG A 77 -45.949 -32.377 29.262 1.00 77.93 O
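
The occupancy check described above can be sketched in Python. This is a minimal sketch, not part of REFMAC5 or COOT: the function names are hypothetical, and it assumes a standard fixed-column PDB file (alternate-location flag in column 17, occupancy in columns 55-60). The records quoted above have had their column spacing collapsed by the archive, so the sketch needs the original file rather than this listing. Atoms that carry an alternate-location flag are left alone, since partial occupancy is legitimate there.

```python
def is_atom_record(line):
    """True for ATOM/HETATM records long enough to hold an occupancy field."""
    return line.startswith(("ATOM  ", "HETATM")) and len(line) >= 60


def occupancy(line):
    """Occupancy from columns 55-60 of a fixed-column PDB record."""
    return float(line[54:60])


def partial_occupancy_atoms(lines):
    """List atoms with occupancy < 1.00 and no alternate-location flag.

    Returns tuples of (serial, atom name, residue name, chain,
    residue number, occupancy).
    """
    hits = []
    for line in lines:
        if is_atom_record(line) and line[16] == " " and occupancy(line) < 1.0:
            hits.append((
                int(line[6:11]),        # atom serial number
                line[12:16].strip(),    # atom name
                line[17:20],            # residue name
                line[21],               # chain identifier
                int(line[22:26]),       # residue sequence number
                occupancy(line),
            ))
    return hits


def reset_occupancies(lines):
    """Return a copy of the records with those atoms reset to occupancy 1.00."""
    out = []
    for line in lines:
        if is_atom_record(line) and line[16] == " " and occupancy(line) < 1.0:
            # Overwrite only the six-character occupancy field.
            line = line[:54] + "  1.00" + line[60:]
        out.append(line)
    return out
```

Running partial_occupancy_atoms on the lines of the input file first, and inspecting the result before calling reset_occupancies, keeps the reset a deliberate choice rather than an automatic one.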

--
=======================================================================
All Things Serve the Beam

----------
From: David Schuller
On 03/24/12 15:15, Kendall Nettles wrote:

> David, how can you justify reducing occupancy of some parts of amino acids?

I don't have to, since I didn't do it. Read again the bit about this being a carryover from the MR model.

Cheers,

> I don't understand this. I can understand deleting stuff that's not there and reporting it as not modelled. This is factually false. The side chains are not there at partial occupancy.
> Best regards,
> Kendall Nettles
>
> On Mar 24, 2012, at 2:39 PM, "David Schuller" wrote:

----------
From: Garib N Murshudov

Hi David

The occupancies in the input file are very suspicious: not all atoms of the residues are present, and some occupancies are zero. In refmac, zero occupancy means the atom does not exist. That may explain the problem.

In refinement we could add an option to set occupancies to one if there are no alts, but it would be a dangerous thing to do.

Since the occupancies are suspect, I would check the input PDB file carefully.

regards

Garib

Garib N Murshudov

----------
From: Eleanor Dodson
As Garib says - an atom with occupancy 0.00 is treated as a marker - useful for coot - but is not included in any X-ray refinement at all. Maybe it would be more aesthetic to maintain geometry, but as crystallographers I think we should be interested in the fit of model to experiment - right? - and not in reporting a pseudo fit related to geometric parameters only.

Eleanor

--
Professor Eleanor Dodson

----------
From: Ed Pozharski
I agree with Eleanor 100%...

In my biased opinion, only the atoms supported by electron density
should be included in deposited models. To satisfy the "but this will
mess up the electrostatic potential coloring" argument (a valid one, of
course), the "projected model" can be deposited alongside which must be
clearly advertised as the unconstrained interpretation by the
structure's author.

Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs

----------
From: Gregory Bowman

But what about the issue of resolution? As was previously pointed out, at say 3.2 Å resolution, many side chains will fail to fit, but it doesn't seem appropriate to trim them all down. The users need to also be aware of the quality/resolution of the structures that they are looking at.

Greg

--
Department of Biophysics

----------
From: Eleanor Dodson
This is a personal preference. I do model at low sigma levels if there IS some indication of where to put atoms, always try to keep the correct sequence even if some atoms are missing, and just for coot convenience keep atoms with occ = 0, rather than delete them altogether. (COOT will refine a residue with occs = 0, but not one where atoms are missing.. Paul?? Why not!)

At deposition the wwPDB are welcome to (and I think do) strip out all atoms with occ = 0.

Eleanor

----------
From: <Herman.Schreuder

I fully agree. Unfortunately, the perfect model does not exist (at least not for protein crystal structures). It is like with Heisenberg's uncertainty principle. Either one has a complete model with a number of atoms having a coordinate uncertainty of 4-6 Å, or one has a model where the uncertainty of all atoms is below say 0.5 Å, but with a lot of truncated side chains which clearly contradict available biochemical evidence.

Cheers,

Herman

----------
From: Ed Pozharski

On Mon, 2012-03-26 at 10:17 -0400, Gregory Bowman wrote:
> But what about the issue of resolution? As was previously pointed out,
> at say 3.2 Å resolution, many side chains will fail to fit, but it
> doesn't seem appropriate to trim them all down.

Why is it inappropriate to trim them down? Sometimes at low resolution
all one can be confident about is the backbone trace.

Just to be clear, I am talking about atoms whose positions are not
supported by electron density, i.e. where difference map in the absence
of the side chain is featureless. I assume that is the likely situation
when one would set occupancy to zero.

Cheers,

Ed.

----------
From: Ed Pozharski

On Mon, 2012-03-26 at 16:30 +0200 wrote:
> It is like with Heisenberg's uncertainty principle. Either one has a
> complete model with a number of atoms having a coordinate uncertainty
> of 4-6 Å, or one has a model where the uncertainty of all atoms is
> below say 0.5 Å, but with a lot of truncated side chains which clearly
> contradict available biochemical evidence.

Excellent analogy. I am not sure why truncated arginine (as long as it
is not renamed to alanine) contradicts biochemical evidence though.
Termini are routinely truncated, no problem there. I have plenty of
biochemical evidence that there are more waters in the crystal than I
model.

If the truncated model contradicts biochemical evidence, the projected
model contradicts crystallographic evidence. I agree that a truncated
model may lead to interpretation problems, and thus the option of
depositing a projected model resolves that.

Cheers,

Ed.

----------
From: <Herman.Schreuder
Dear Ed,

In the end it boils down to personal preferences. With the number of crystal structures I refine each year, I am not going to go over every flexible surface residue to decide whether to truncate the side chain or whether there may be some low-level density that justifies keeping the side chain, so I opt for the biochemical evidence. For me the added advantage is that I only have a single pdb file to take care of. And again, I see no problem in having a model with some atoms with a larger error bar.

You are right that termini are often truncated. In contrast to a missing side chain, here we really have no reasonable hypothesis where the missing residues are. They may even have been removed by a protease.

Cheers,
Herman

----------
From: Katherine Sippel
I agree with Herman about personal preference, but it also boils down to our job as crystallographers to educate non-structural end-users. The fact of the matter is that a lot of researchers use structures without looking at the nuances of the PDB. It's actually pretty common among biologists to download a PDB file and build hypotheses or throw the coordinates into a downstream program like APBS or AutoDock without looking. They don't even realize that density exists or that they should check it, which makes the odds of them reading a header file for missing atoms or understanding the concept of B-factors and occupancy almost nil. Realizing that renders the argument moot until crystallographic data is demystified across the other sciences.

I will say that in principle I do like the idea of a data model + a projected model because it seems like something an end-user could wrap their head around, but in practice this would probably refuel the "what constitutes modelable density" debate all over again.

Cheers,

Katherine

----------
From: James Holton
Try this:

1) take your favorite PDB file and set all the B factors to ~80 (reduces series-termination errors)
2) use sfall/fft in CCP4 to calculate structure factors to 4A resolution
3) use sftools to add a "SIGF" column (0.1 will do) to make refmac5 happy
4) refine the "perfect" model against these fake data for ~5 cycles (with "solvent no")
5) load this up in coot and contour at 1 sigma
6) repeat the refinement with a PDB file containing only main chain.
7) repeat the refinement after putting all the side chains in their most likely (Ponder-Richards) rotamers.
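
Step 1 of the recipe above can be sketched in Python. This is a minimal sketch under the assumption of a standard fixed-column PDB file (isotropic B factor in columns 61-66); the function name is hypothetical and is not part of CCP4. Any ANISOU records are passed through unchanged, so they would need separate handling.

```python
def set_b_factors(lines, b_value=80.0):
    """Return a copy of the records with every ATOM/HETATM B factor set to b_value."""
    out = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 66:
            # Overwrite only the six-character temperature-factor field.
            line = line[:60] + f"{b_value:6.2f}" + line[66:]
        out.append(line)
    return out
```

The remaining steps (sfall/fft, sftools, refmac5) are run with the CCP4 programs themselves and are not reproduced here.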

Ask yourself these questions:
1) can you "see" the side chains?
2) can you "see" the waters?
3) what are the R factors from these refinements?

Answers: 1) no, 2) no, 3) ~3% for "perfect", ~50% for "main chain", and ~36% for "likely rotamer"
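
For readers unfamiliar with the R factors quoted in answer 3, the conventional crystallographic R factor is R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|). A minimal Python sketch follows, assuming the two amplitude lists are matched reflection by reflection and already on a common scale; the function name is illustrative, and refinement programs additionally apply scaling and bulk-solvent corrections that are omitted here.

```python
def r_factor(f_obs, f_calc):
    """Conventional R factor for matched lists of structure-factor amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    # Sum of absolute amplitude differences over all reflections.
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    # Sum of observed amplitudes.
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator
```

On this definition, identical observed and calculated amplitudes give R = 0, which is why the "perfect" model refines to only a few percent.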

Now ask yourself: even though there is "no density" for side chains and waters, is there really "no evidence" that they exist?

The point I am trying to make here is that you EXPECT side chains to poke out of density at low resolution, even under ideal conditions (perfect phases). For example, the C-deltas of Leu will "breach" the 1-sigma contour at around 2.8A resolution and worse. You can see this in my old movie:
http://bl831.als.lbl.gov/~jamesh/movies/index.html#reso

When it comes to building, yes, once an atom dips below the 1-sigma contour it gets harder and harder to know exactly where it is, but it does have to be somewhere. Somewhere nearby. Formally, there is "prior knowledge" of bond lengths, etc. at play. And if you know that there is one copy of a given atom in every unit cell of the crystal, then occupancy < 1 is inappropriate. Much better to use B = 999, which models the atom as a Gaussian with the electrons spread over an area about 3.5 A wide. This is roughly the range your average side chain atom has available to it, given that it is attached to the main chain by covalent bonds.
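
The 3.5 A figure is consistent with the standard relation between an isotropic B factor and the mean-square displacement along one direction, B = 8*pi^2*<u^2>. A quick check in Python, assuming that "wide" refers to the RMS displacement (the function name is illustrative):

```python
import math


def rms_displacement(b_factor):
    """RMS displacement in A along one direction for an isotropic B factor in A^2.

    Uses B = 8 * pi^2 * <u^2>, so u_rms = sqrt(B / (8 * pi^2)).
    """
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


for b in (20.0, 80.0, 999.0):
    print(f"B = {b:6.1f} A^2 -> rms displacement = {rms_displacement(b):.2f} A")
```

For B = 999 this gives roughly 3.56 A, and for B = 80 roughly 1.0 A.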

Of course, a more "Bayesian" model for the "I don't know what the rotamer is" situation would be to build in ALL possible rotamers, with occupancies equal to their Ponder-Richards probabilities. Some improvement to this initial "guess" would no doubt be made by using constrained occupancy refinement of rigid-body side chains. Unfortunately, this is impossible with any refinement program I know about, since refmac, phenix.refine, etc. don't support more than 3 or 4 alternate conformers.

Building in all possible conformers and using the occupancy as a "p-value" would also help solve the problem of the careless and/or uneducated over-interpreting PDB files. Which is the "right one"? Good question! I think it's time we started dispelling the myth of the single-conformer protein anyway.

-James Holton
MAD Scientist

----------
From: Paul Adams
Hi James,

my understanding is that phenix.refine allows any number of alternate conformers. There may have been a limit of 4 some time in the past, but no longer. So your idea could be tested.

Cheers,
Paul

--
Paul Adams

----------
From: Bernhard Rupp (Hofkristallrat a.D.)

>phenix.refine allows any number of alternate conformers.

Hmm..... quoting our old friends from the validation circuit: Where freedom
is given, liberties will be taken....

BR

----------
From: Ethan Merritt

True, but...

[warning: back of the envelope calculation]

Consider, for example, an isoleucine sidechain.

It would require 12 positional parameters to refine the position of
each sidechain atom (XYZ * {CB CG1 CG2 CD1}) for a single conformation.

On the other hand there are only 7 rotamers in the library.
So if you limited the refinement to the relative occupancy of the
7 ideal rotamers, that is a more parsimonious model than refining
the individual atoms for one conformation.

Season to taste with arguments about geometry restraints, but still
I think the liberty being offered is not so dangerous.
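
The back-of-the-envelope count above can be written out as a short Python sketch. The atom list and the figure of 7 rotamers are taken from the message; the constraint that the occupancies sum to one, which would leave one fewer independent parameter, is an added assumption, and the function names are illustrative.

```python
ILE_SIDE_CHAIN_ATOMS = ["CB", "CG1", "CG2", "CD1"]
ILE_LIBRARY_ROTAMERS = 7  # figure quoted in the message


def positional_parameters(atoms):
    """Free x, y, z coordinates for one conformation of the given atoms."""
    return 3 * len(atoms)


def occupancy_parameters(n_rotamers, sum_to_one=False):
    """One occupancy per ideal rotamer; one fewer if they must sum to one."""
    return n_rotamers - 1 if sum_to_one else n_rotamers


print("single conformation, free atoms:", positional_parameters(ILE_SIDE_CHAIN_ATOMS))
print("rotamer occupancies:", occupancy_parameters(ILE_LIBRARY_ROTAMERS))
print("rotamer occupancies, constrained:",
      occupancy_parameters(ILE_LIBRARY_ROTAMERS, sum_to_one=True))
```

Either way the rotamer-occupancy model uses fewer parameters than the 12 positional ones, which is the point of the comparison.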

Ethan

--
Ethan A Merritt

----------
From: Bernhard Rupp (Hofkristallrat a.D.)

> the liberty being offered is not so dangerous

Parameter-wise, true, - if it is understood that you are now *modeling with
little evidence*.
How much electron density to affirm your model choices would one expect at
say 7 ideal rotamers - which on top of static split will likely display
dynamic motion?

On the other hand, the side chain HAS to be somewhere, and a distribution of
the most probable rotamers with their respective frequencies may be more
realistic than all other options (as long as the lack of specific
experimental evidence is acknowledged).

But overall, that model might be better due to the more reasonable prior
expectation term, although the R-value will be practically unaffected due to
the lack of localized scattering contributions.

Anyhow, if it can be abused, it will be ;-)

Best BR
