Monday 30 April 2012

Best method for weighted averaging of Friedel pairs?

From: Markus Meier
Date: 10 February 2012 19:47


Dear all,
I have a anomalous dataset, processed in HKL2000. Scalepack outputs a
file containing the separately merged sets of the Friedel pairs I- and
I+ and their standard deviations sigI+ and sigI-. Scalepack does not
output the averaged intensities (Imean) and the standard deviations
(sigIMean).

The CCP4 program truncate that I use to convert the intensities to
amplitudes requires Imean, I- and I+ and the respective standard
deviations in its input file.

I have now found at least three different methods to generate the
averaged intensities from the Friedel pairs:

1) scalepack2mtz

  uses standard deviations for the weights:
  weights w = 1/sigI

  Imean = (w+*I+ + w-*I- ) / (w+ + w-)
  sigImean = 1 / (w+ + w-)

2) Method described in Biomolecular crystallography by Bernhard Rupp, p.
332/333
  to average symmetry equivalent reflections

  uses variances for the weights:
  weight w = 1/sigI^2

  Imean = (w+*I+ + w-*I- ) / (w+ + w-)
  sigImean = 1 / sqrt(w+ + w-)

3) Method used in cctbx
  function miller.set.average_bijvoet_mates() that calls generic
merge.merge_equivalent_obs():

  same as methods 2, except that

  sigImean is the larger of either
    a) sigImean = 1 / sqrt(w+ + w-)
    or
    b) sigImean = sqrt( wvariance )

  where wvariance =
    (w+ + w-) / [ (w+ + w-)^2 - (w+^2 + w-^2) ] *
    [ w+*(F+ - Imean)^2 + w-*(F- - Imean)^2 ]

What are the advantages and disadvantages of each method? Should method
1 be used at all?

Some example from my dataset:
Reflection (1, 1, 0), space group P3 2 1

I+: 23841.50 sigI+: 634.01 I-: 9628.57, sigI-: 264.75
Method 1: Imean=13815.32, sigImean=186.76
Method 2: Imean=11738.95, sigIMean=244.31
Method 3: Imean=11738.95, sigIMean=7106.47

Thanks a lot!

Cheers,
Markus



----------
From: Markus Meier


On 11/02/12 02:52 PM, Bryan Lepore wrote:
> did you ever get a response on this? it is interesting but nobody
> posted publicly.
>
> -Bryan
>

Dear Bryan,

so far no one replied ... so please find my answer below. If someone
disagrees, please post.

None of the methods I have described are appropriate.

If the negative Bijvoet mates and the positive Bijvoet mates have been
merged separately to one intensity value for each (i.e. I+ or I-) plus
the associated standard deviation (sigI+ or sigI-), any weighted method
to calculate the mean will bias the intensity to either the I+ or the I-.

Therefore the only appropriate method is to use the unweighted mean:

Imean = 0.5*( I+ + I- )
sigImean = 0.5 * sqrt( sigI+^2 + sigI-^2 )

The only CCP4 program I found that actually does this is mtzMADmod. This
method also has the advantage that the original intensity values of I+
and I- can be reconstructed from the mean and the anomalous difference
(albeit with the loss of the original standard deviations).

Method 1 (scalepack2mtz)
should not be used. The resulting value is not the best estimate
(maximum likelihood)

Method 2 (in book by B. Rupp)
gives the maximum likelihood average in case that the reflections are
equivalent and is thus appropriate for the merging of the negative (or
positive) set of Bijvoet mates, centric reflections (where the anomalous
differences are zero) or in the case of an non-anomalous dataset the
merging of symmetry equivalent reflections.

Method 3
gives a more realistic sigma value in the case that the individual
intensity values are far apart and their individual standard deviations
are small. Consider the example I have posted:

I+: 23841.50 sigI+: 634.01 I-: 9628.57, sigI-: 264.75
Method 2: Imean=11738.95, sigIMean=244.31
Method 3: Imean=11738.95, sigIMean=7106.47

If the I+ and I- values above actually were symmetry equivalent
reflections in an non-anomalous dataset, the sigImean from method 2 is
ridiculously small and method 3 gives a far more realistic value. If
method 3 is the best mathematical solution to this problem I am not able
to judge and I have to trust the statistician (or programmer) who
implemented this solution.

Cheers,
Markus

----------
From: Tim Gruene 


Dear Markus,

why don't you reintegrate the data with hkl2000 telling the program to
treat them as non-anomalous data-set? This should give you scalepack
output with the Bijvoet pairs merged and overcome the problem you describe.

Cheers,
Tim
- --
-
----------
From: ccp4

This is a case where it is really helpful to keep some record of the unmerged integrated data.
And again rejecting the odd outlier does no harm to most analyses..

I like to use scala to check for outliers looking at all i+ I- measurements; if there is a wild discrepancy for weak anomalous signals you have probably found an outlier which is best rejected.

If you have a huge anomalous signal with good redundancy you probably shouldnt use Imean and the anomalous difference will be I= -I- with SD = sqrt (VarI+ VarI-).

Most software which uses that signal will check for outliers in the Anom difference lists too, and it is usually safest to exclude them from anom site searches, and phase calculations.
Eleanor

----------
From: arka chakraborty

Hi all,
Can someone put in links of a few articles relevant to the above discussion? ie where this kind of strategy was helpful in a specific practical situation?

Thanks in advance,

Regards,

ARKO
--



No comments:

Post a Comment