Saturday 21 January 2012

PHENIX vs REFMAC refinement had me fooled

From: Christopher Browning
Date: 8 December 2011 17:36


Dear All,

Question: Has anybody ever refined the same structure using PHENIX and
then tried REFMAC to see what happens?

I did and I stumbled on something funny. I'm refining a structure at
1.1A resolution which was solved with iodine phasing using PHENIX
AutoSol. Got a great map and the structure was built almost
completely. I had to build a few residues myself, and using the
published sequence, I started filling in the residues, but as I came
nearer the N-terminus, it looked like the density did not match residues
from the sequence. I kept the residues as in the sequence, but as you
can see from the PHENIX refined picture (below is the link) it still
looks like the amino acid sequence in the crystal does not match the
published protein sequence.

Out of interest I refined the same file in REFMAC, and now the electron
density is correct, and the sequence of the amino acids in the crystal
matches the published sequence (see link for picture below). Not only
that..... my R/Rfree improved (16.5/19 for PHENIX, 10/18 for REFMAC).

I've also refined the occupancies of the iodide, however the output
FO-FC map from PHENIX complains and the REFMAC map is fine.....

How can this be and what causes this?

Link for the pictures:
Both maps are at identical Sigma levels in both pictures.
PHENIX: http://dl.dropbox.com/u/51868657/PHENIX_refined.png
REFMAC: http://dl.dropbox.com/u/51868657/REFMAC_refined.png

Cheers,

Chris Browning



--
Dr. Christopher Browning

----------
From: Pavel Afonine


Christopher,

if you send me the input PDB and data files (off-list, of course) I will have a close look and then explain what exactly happens and why. I also promise to post the summary on this mailing list (without revealing the identity of your structure, of course).
If you send me the command you used to run PHENIX, I will comment on that too.

Please send the files to my email address: PAfonine@lbl.gov

Pavel

----------
From: Tim Gruene 


Dear Christopher,

if your R/Rfree from Refmac5 really are 10% vs. 18%, you might simply be
looking at an electron density map with strong model bias, i.e. the map
shows the features of the model and not of the data. Although at 1.1A
resolution this seems quite unlikely, but that's what might explain this
great gap between R and Rfree.

Tim
--
Dr Tim Gruene

----------
From: Mark J van Raaij


are you sure that you are using the original input intensities for the REFMAC map calculation (and the refinement run) and not the output ones of the (previous) run?
if you are not, you may have model bias, and Rfree can be fooled, especially with NCS (if you have it).

- or perhaps anisotropic B-factor refinement (if you are using it) works better in REFMAC than in PHENIX?

Mark J van Raaij
Laboratorio M-4

----------
From: Jonathan Elegheert



In addition to the remarks of Mark and Tim, could you make sure that you are refining in Refmac with the correct flag for the Rfree set (value = 0)? In Phenix, this is the opposite: the value is 1 for test reflections and 0 for work reflections. So going from a PHENIX environment to Refmac, you might well be refining against your small fraction of test reflections. I have seen this give spurious density features (obviously) and serious model bias.

Cheers
Jonathan
Ghent University
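
To make the two conventions concrete, here is a minimal sketch in plain Python. It is not code from either program: it assumes the flags are already available as a list of integers, and the function name `phenix_to_ccp4_flags` is hypothetical. It maps a two-valued set (1 = test, 0 = work) onto the convention Refmac expects by default (0 = test, non-zero = work).

```python
def phenix_to_ccp4_flags(flags, phenix_test_value=1):
    """Map a two-valued flag list (1 = test, 0 = work) to the
    CCP4-style convention (0 = test, 1 = work).

    Hypothetical helper for illustration only.
    """
    values = set(flags)
    if not values <= {0, 1}:
        raise ValueError("expected only 0/1 flags, got %s" % sorted(values))
    # Test reflections become 0, work reflections become 1.
    return [0 if f == phenix_test_value else 1 for f in flags]


phenix_flags = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
ccp4_flags = phenix_to_ccp4_flags(phenix_flags)
print(ccp4_flags)
```

Without such a conversion (or without telling Refmac which value marks the test set), the eight work reflections in this toy list would be read as the free set, and only the two test reflections would be refined against.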

----------
From: Petr Leiman


Dear Tim,

I agree with you completely. The question then becomes why does the automatic weighting scheme in refmac allow R and R-free to run away from each other by 8% in a 1.1 A resolution structure?

Petr

----------
From: Katherine Sippel


In a non-computational capacity, I would also suggest perhaps resequencing your clone. Occasionally the published sequences are off, the specific base is polymorphic, or there is also the possibility that you introduced a mutation somewhere. That would be the cheap and easy way to definitively answer the question.

Cheers,
Katherine

----------
From: Boaz Shaanan


Hi Petr,

The automatic weighting scheme in the recent Refmac version that comes with 6.2.0 is fine but there is a limit to what it can do if something is seriously wrong with the model/data/whatever. It has just worked well for me in all respects (Rw/Rf gap, maps quality, FOM, ligands, all-the-validation-parameters-you-can-think-of ) on 2.2 and 3 A resolution structures. In earlier versions (I can't recall how far back) I remember having to disable the automatic weighting and use my own values. There must be something else there in the refmac run.

          Boaz


Boaz Shaanan, Ph.D.





----------
From: Jens Kaiser


My money is on the wrong test set (as Jonathan Elegheert suggested).
I have seen this several times with newbies, when the test set is
created by phenix. It does it the "xplor-way". When it comes to the free
set, refmac defaults to 0, phenix tries to be intelligent (i.e. if 1/0
it uses 1, if more 0/1/2... it uses 0). Additionally, refmac (and I
think phenix) produces Fc filled maps. So if you swap R/freeR
reflections, the maps always look spectacular, as they essentially are
Fcalc maps.
Inspection of the logfiles should help: #of reflections free and #of
reflections for refinement are reported by both programs, and IIRC, you
should get a warning that your free-R-set is not sensible.
One way out is to /always/ use ccp4 to assign the test-set, then both
programs run fine. Otherwise you have to explicitly tell refmac to use
"1" as the test reflections.

HTH,

Jens
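
The rule of thumb Jens describes ("if 1/0 it uses 1, if more 0/1/2... it uses 0") can be written down in a few lines. This is only a sketch of that description in plain Python, not the actual scoring code used by phenix.refine; the function name `guess_test_flag` and the example lists are made up for illustration.

```python
def guess_test_flag(flags):
    """Guess which flag value marks the test set, following the
    rule of thumb quoted in the message above (illustrative only)."""
    values = set(flags)
    if not values:
        raise ValueError("no flags given")
    if values == {0, 1}:
        # Two-valued set: treat it as X-PLOR/CNS style, where 1 = test.
        return 1
    # Anything else (e.g. 0, 1, 2, ...): treat it as CCP4 style, 0 = test.
    return 0


xplor_style = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
ccp4_style = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]

print(guess_test_flag(xplor_style))
print(guess_test_flag(ccp4_style))
```

A program that always assumes 0 would pick the wrong set for the first list, which is the situation discussed in this thread.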

----------
From: Garib N Murshudov


Check your Rfree flag. Phenix and refmac may use different flags. Have a look at the log file. If the percentage of Rfree reflections is 95% then the flags need to be swapped.

Garib
Garib N Murshudov 
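
The check Garib suggests is simple arithmetic, sketched below in plain Python. The helper names `free_fraction` and `looks_swapped` are hypothetical, and the 50% threshold is an assumption chosen for the example, not a value taken from either program; in practice the numbers come from the refinement log.

```python
from collections import Counter


def free_fraction(flags, test_value):
    """Fraction of reflections that carry the given test flag value."""
    if not flags:
        raise ValueError("no flags given")
    return Counter(flags)[test_value] / len(flags)


def looks_swapped(flags, test_value, threshold=0.5):
    """True if more than `threshold` of the reflections would be treated
    as free, which suggests the wrong flag value is in use.
    The threshold is an illustrative assumption."""
    return free_fraction(flags, test_value) > threshold


# Toy flag array: 5 test reflections marked 1, 95 work reflections marked 0.
flags = [1] * 5 + [0] * 95

for value in (0, 1):
    print(value, free_fraction(flags, value), looks_swapped(flags, value))
```

With this toy array, treating 0 as the test value puts 95% of the reflections in the free set, which is the symptom to look for in the log.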




----------
From: Tim Gruene


Dear Petr,

there may be a couple of reasons, e.g.
- your data are not really 1.1A, but you simply integrated a lot of noise
- you entered some odd command or another option which allows refmac5
such a deviation
- your model is incomplete
- surely several more.
When a well tested program does something unexpected, it is usually the
user and not the program which misbehaves...

The optimisation of the weighting scheme is based on the geometry of the
model and not on the gap between R and Rfree.

Cheers, Tim


----------
From: Christopher Browning


Hi Everybody,

First off, thanks for the replies. They definitely fixed my problem. It
was indeed as Garib Murshudov said. The flags got swapped, and therefore
the percentage of Rfree reflections was 95%.

So, if Rfree is created from CCP4.....use Rfree flags with a value of 0
and a value of 1 if the Rfree was created with PHENIX.

Both maps now look the same, and it indeed looks like the protein in the
crystal is different from the sequence.

Cheers,

Chris

----------
From: Pavel Afonine


Hi Christopher,

just a remark: for phenix.refine it does not matter where the flags come from and what the "test"/"work" value is, since it automatically scores the values in the flags array and guesses the right one. Still, one can imagine corner cases, so it's good to be careful -:)

Pavel

----------
From: Ed Pozharski

Since it does not matter to phenix.refine and it will remain backward
compatible, how about changing the default behavior so that when the
test set is missing, it is created with test_value=0?  Unless the
test_value=1 expectation is hard-coded somewhere else, this seems like
an easy fix, and will prevent the problems Chris was having.

I always thought that test_value=1 is essentially inherited from the CNS
default.  But when you think about it, the way it's done in
refmac/freeflag makes much more sense because of:

a) tiny improvement in code readability, since bool(0)=False (a very
python-esque argument);
b) if one wishes to use a different test set, 1/fraction of them are
already generated.

Cheers,

Ed.

--
After much deep and profound brain things inside my head,
I have decided to thank you for bringing peace to our home.
                                   Julian, King of Lemurs

----------
From: Pavel Afonine


Hi Ed,

changes like this generate more confusion than good, I guess. Current phenix.refine behavior does not create any problem for phenix.refine users, so I don't feel a strong reason for changing anything. It's not just flipping the flag value somewhere, but also updating the documentation, replying to a whole lot of offline emails with the subject "why is this", etc. On the other hand, I would rather suggest making the other programs automatically recognize the right flag in most cases - it is a trivial coding exercise that any developer can do within an hour, and does not require changing habits.

Pavel

----------
From: Yuri Pompeu


PHENIX has an option under the reflection editor program that will create R flags that are compatible with ccp4 programs.
Another point worth mentioning is in phenix.refine it is appropriate to use the data.mtz files generated each round of refinement, as these are the raw data plus the Rfree flags. In refmac however the newly generated refmacX.mtz file contains phase info as PHIC calculated from your model. Using this for subsequent rounds of refinement results in terrific looking maps as they are now biased (even more so) by the input model.

----------
From: Tim Gruene


Dear Yuri et al.,

I like the fact that one must not use the output mtz-file from refmac as
input to the next round of refinement. It encourages one to think about
why this is, and then makes you realise what your data really are: the
result of data processing, and that the coordinate file is only a model
which one tries to make as consistent as possible with the data.

Tim

----------
From: Ed Pozharski


PHIC won't be used in refinement unless you specify it, so this can't
happen if you just use the refmac output mtz.  AFAIU, the reason the
output mtz should never be used in subsequent refinement is because the
Fo's are modified.  I never understood why one would even have an idea
of using the output mtz as a new input.  Maybe you can explain this to
me at last.

Cheers,

Ed.

--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
                                               Julian, King of Lemurs

----------
From: Yuri


Precisely, one should not use it! I have seen people do it either because they don't fully understand what is going on or are not at all familiar with the documentation.
In phenix the output .mtz contains Fo plus x% Rfree flag=1, so one may try and do this for refmac because of one of the two above mentioned reasons (or both)
Yuri Pompeu

----------
From: Eleanor Dodson


This is very uncommon...

Can you look at the final plot of R and Rfree against resolution?
Sometimes there is an awful hiccup somewhere -
ice rings?
high resolution data somewhat fictional -
low resolution data saturated and also somewhat fictional ..

I also check the completeness which is in the same loggraph panel.

Eleanor

----------
From: Yuri Pompeu


Hi Ed,
I just had a chance of looking at your comment more closely.
You are right, it only uses PHIC if in refmac's setup you choose to refine "with prior phase information" - AFAIU.
So what exactly is the info contained in the output refmacX.mtz besides map coefficients for COOT? If it is not just the raw xray data Fo, is it Fc only, or Fc that are filled in for missing Fo?
I guess I am not really sure. I was under the impression that the model's PHIC would cause the problems, if they were present.
Best,

----------
From: Eleanor Dodson


PHIC won't do any harm - you may need it for various reasons - I use it mostly as input for a DANO map to check out possible metal sites..

The reason for not using a refmac output mtz as input for the next run is

1) the refmac output Fobs has been scaled by the anisotropic scale so all subsequent scaling as you improve the model would be relative..

2) If there is any twinning detected the output "Fobs" have been corrected for this twin factor.

3) Obviously if you asked for refinement with phase restraints it would be very unwise to set your input phase to the existing PHIC. However I think this is well nigh impossible without relabelling PHIC to some other column label - the mtz utilities would scream that you have an input label and an output label the same..

Eleanor

----------
From: Ed Pozharski


The columns in a standard refmac output mtz file are

H K L FreeR_flag - self-evident

F SIGF - these are modified compared to the input.  AFAIU, some of the
scaling is applied to the Fo's as a matter of programming elegance.
Naturally, this makes using them for future refinement cycles
problematic.

FC PHIC - these are Fc's from the atoms present in the model

FC_ALL PHIC_ALL  - full Fc's, i.e including the solvent contribution

FWT PHWT DELFWT PHDELWT - this is what you called the "map coefficients
for COOT", although this is historically incorrect since refmac produced
this output before coot was born

FOM - figure-of-merit

As for filling-in missing reflections, it is always on.  Not to rekindle
another Hegelian fire, but the idea is that the missing reflections
should always be filled in because Fc's are definitely better estimates
of Fo's than zeros.

HTH,

Ed.

--
"I'd jump in myself, if I weren't so good at whistling."
                              Julian, King of Lemurs

----------
From: James Holton


A small but potentially important correction:

FC_ALL PHIC_ALL from REFMAC are indeed the calculated structure factor of the coordinates+bulk_solvent, but AFTER multiplying by the likelihood coefficient "D" (as in 2*m*Fo-D*Fc ).  So, if you subtract ( FC_ALL PHIC_ALL ) from ( FC PHIC ) you will NOT get the bulk solvent contribution alone.  AFAIK there is no way to obtain just the bulk solvent contribution from REFMAC.

-James Holton
MAD Scientist
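
James's point can be spelled out in one line. The notation below is an assumption made for illustration: $\mathbf{F}_{\mathrm{atoms}}$ stands for the complex structure factor in FC/PHIC, $\mathbf{F}_{\mathrm{bulk}}$ for the bulk solvent contribution, and $\mathbf{F}_{\mathrm{ALL}}$ for FC_ALL/PHIC_ALL, taken as described above to be the sum of the two multiplied by $D$.

```latex
\begin{align*}
\mathbf{F}_{\mathrm{ALL}} &= D\,\bigl(\mathbf{F}_{\mathrm{atoms}} + \mathbf{F}_{\mathrm{bulk}}\bigr) \\
\mathbf{F}_{\mathrm{ALL}} - \mathbf{F}_{\mathrm{atoms}}
  &= (D - 1)\,\mathbf{F}_{\mathrm{atoms}} + D\,\mathbf{F}_{\mathrm{bulk}}
\end{align*}
```

Under that assumption the difference reduces to $\mathbf{F}_{\mathrm{bulk}}$ only when $D = 1$; for any other $D$ it still contains a scaled copy of the atomic contribution.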

----------
From: Ian Tickle


James, I agree completely!  But I would venture to go further and say
that the FC/PHIC values really have no business being in the output
MTZ file in the first place, so if they weren't there then the
question of subtracting them would never arise.  They are the result
of intermediate calculations, the kind of things I print out when I'm
debugging a program to aid in checking the logic.  The FC_ALL/PHIC_ALL
values represent the final definitive result of the refinement, so in
all applications where Fcalc is required (e.g. density correlation
stats) DFc/phi(DFc) should always be used - and why would one want to
omit part of the model anyway (unless maybe for an omit map - but that
doesn't seem to be relevant here)?

Fc/phic is the transform of the refined atomic model parameters as
output in XYZOUT which essentially is just a snapshot of the model.
DFc/phi(DFc) represents the transform of an ensemble average of a
distribution of models generated by the random co-ordinate (and other
parameter) errors, and of course everyone knows that X-ray diffraction
measures the ensemble average, not a snapshot.

Also we know that (2mFo-DFc)/phic (or mFo/phic if centric) is the best
estimate of the true phased F.  The best estimate of the difference
Ftrue-Fmodel is the difference coefficient 2(mFo-DFc)/phic (or
(mFo-DFc)/phic if centric).  So the best estimate of Fmodel is clearly
(Ftrue - (Ftrue - Fmodel))/phic = (2mFo - DFc - 2(mFo - DFc))/phic =
DFc/phic (and the same result for centric).

Cheers

-- Ian
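
For reference, the arithmetic in Ian's last paragraph, written out term by term. This only restates the expressions in the message; writing the shared phase as a factor $e^{i\varphi_c}$ (for "/phic") is a notational assumption.

```latex
\begin{align*}
\text{acentric:}\quad
\mathbf{F}_{\mathrm{true}} &\approx (2mF_o - DF_c)\,e^{i\varphi_c}, &
\mathbf{F}_{\mathrm{true}} - \mathbf{F}_{\mathrm{model}} &\approx 2\,(mF_o - DF_c)\,e^{i\varphi_c} \\
\mathbf{F}_{\mathrm{model}} &\approx \bigl[(2mF_o - DF_c) - 2\,(mF_o - DF_c)\bigr]\,e^{i\varphi_c}
  = DF_c\,e^{i\varphi_c} \\[4pt]
\text{centric:}\quad
\mathbf{F}_{\mathrm{true}} &\approx mF_o\,e^{i\varphi_c}, &
\mathbf{F}_{\mathrm{true}} - \mathbf{F}_{\mathrm{model}} &\approx (mF_o - DF_c)\,e^{i\varphi_c} \\
\mathbf{F}_{\mathrm{model}} &\approx \bigl[mF_o - (mF_o - DF_c)\bigr]\,e^{i\varphi_c}
  = DF_c\,e^{i\varphi_c}
\end{align*}
```

In both cases the $mF_o$ terms cancel, leaving $DF_c$ with the calculated phase.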

