From: Filip Van Petegem
Date: 21 November 2011 22:04
Dear crystallographers,
I have a general question concerning the comparison of different structures. Suppose you have a crystal structure containing a few domains. You also have another structure of the same, but in a different condition (with a bound ligand, a mutation, or simply a different crystallization condition,...). After careful superpositions, you notice that one of the domains has shifted over a particular distance compared to the other domains, say 1-1.5 Angstrom. This is a shift of the entire domain. Now how can you know that this is a 'significant' change? Say the overall resolution of the structures is lower than the observed distance (2.5A for example).
Now saying that a 1.5 Angstrom movement of an entire domain is not relevant at this resolution would seem wrong: we're not talking about some electron density protruding a bit more in one structure versus another, but all of the density has moved in a concerted fashion. So this would seem 'real', and not due to noise. I'm not talking about the fact that this movement was artificially caused by crystal packing or something similar. Just for whatever the reason (whether packing, pH, ligand binding, ...), you simply observe the movement.
So the question is: how can you state that a particular movement was 'significantly large' compared to the resolution limit? In particular, what is the theoretical framework that allows you to state that some movement is significant? This type of question of course also applies to other methods such as cryo-EM. Is a 7A movement of an entire domain 'significant' in a 10A map? If it is, how do we quantify the significance?
If anybody has a great reference or just an individual opinion, I'd like to hear about it.
Regards,
----------
From: Steiner, Roberto <roberto.steiner@kcl.ac.uk>
I believe ESCET was designed to answer your kind of question
Best
Roberto
----------
From: Jacob Keller
Just to clarify: I think the question is about the mathematical sense
of "significance," and not the functional or physiological
significance, right? If I understand the question correctly, wouldn't
the reasoning be that admittedly each atom in the model has a certain
positional error, but all together, it would be very unlikely for all
atoms to be skewed in the same direction?
Jacob
--
*******************************************
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
----------
From: Filip Van Petegem
Hello Jacob,
that's correct, I'm only looking at the mathematical significance, not the biological one. I follow the same reasoning - it is highly improbable for all atoms to be skewed in the same direction.
In a case I'm currently looking at, I'm particularly dealing with cryo-EM data, not X-ray structures, but with the same underlying principles: what are the odds that all pixels of the map move together in the same direction?
As mentioned for X-ray structures, a Luzzati analysis may give information about the positional errors, but there should be an increased resolution when comparing domain movements, because it's unlikely for all atoms to have an error in the same direction.
Filip
----------
From: James Stroud
I can think of a different but related question. How significant is a particular movement compared to a measured coordinate error? One way to measure the coordinate error in this example is to least-squares superpose the two instances of the domain in question and calculate the rmsd.
This makes the calculation of significance independent of the resolution of the data set.
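The superposition step described above can be sketched as follows. This is a minimal illustration using the Kabsch algorithm in NumPy, not code from the thread: the function name `superpose_rmsd` and the demo coordinates are hypothetical, and the two coordinate sets are assumed to list the same atoms in the same order.

```python
import numpy as np

def superpose_rmsd(a, b):
    """Least-squares superpose b onto a (Kabsch) and return (rmsd, rotation).

    a, b: (N, 3) arrays holding the same atoms in the same order.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Remove the centroids so only the rotation remains to be fitted.
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)
    # 3x3 covariance matrix between the centred coordinate sets.
    h = b0.T @ a0
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Apply the rotation to every row of b0 and measure the residual.
    b_fit = b0 @ rot.T
    rmsd = np.sqrt(((b_fit - a0) ** 2).sum(axis=1).mean())
    return rmsd, rot

# Hypothetical demo: a 25-atom "domain", rotated, shifted and perturbed.
rng = np.random.default_rng(0)
domain_a = rng.normal(scale=5.0, size=(25, 3))
theta = np.radians(10.0)
true_rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
domain_b = (domain_a @ true_rot.T + np.array([1.5, 0.0, 0.0])
            + rng.normal(scale=0.2, size=(25, 3)))

rmsd, rot = superpose_rmsd(domain_a, domain_b)
print("internal rmsd after superposition: %.2f" % rmsd)
```

The rmsd returned here is the "internal" disagreement between the two copies of the domain once the rigid-body component has been removed, which is the quantity proposed above as an empirical coordinate error.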
James
----------
From: Dale Tronrud
This is a subtle problem and performing an analysis of this type
of error is confusing. Most of the tools we use to analyze errors
begin with the assumption that the "errors" are random and uncorrelated.
These include Luzzati and Fo-Fc maps.
My solution is to perform a null hypothesis test. If you run
two refinements starting from the same model, in one allowing the
RB shift and in the other forbidding it, which fits the data better?
If the difference in likelihood is quite small then you cannot
distinguish between a RB shifted model and one w/o the shift and
that shift must be insignificant (in a statistical sense.) If the
likelihood is better when the shift is allowed then the shift is
significant.
In my experience RB shifts of a couple of tenths of an Angstrom
are very significant even with 4 A resolution data. X-ray diffraction
is exquisitely sensitive to this sort of motion.
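One conventional way to put a number on the comparison of the two refinements is a likelihood-ratio test. The sketch below is an assumption layered on top of the suggestion above, not something stated in the post: it assumes the two models are nested, that the rigid-body shift adds 6 free parameters (3 rotations, 3 translations), that the log-likelihoods are on the same scale with larger meaning better, and that the chi-square approximation (Wilks' theorem) is acceptable. The function name `lr_test` and the numbers are hypothetical. Refinement programs often report a negative log-likelihood target, so check the sign before using it.

```python
from scipy.stats import chi2

def lr_test(loglik_fixed, loglik_shifted, extra_params=6):
    """Likelihood-ratio test for nested models.

    loglik_fixed:   log-likelihood with the rigid-body shift forbidden
    loglik_shifted: log-likelihood with the rigid-body shift allowed
    extra_params:   free parameters added by allowing the shift (assumed 6)
    """
    stat = 2.0 * (loglik_shifted - loglik_fixed)
    # Upper tail of the chi-square distribution with extra_params degrees of freedom.
    p_value = chi2.sf(stat, df=extra_params)
    return stat, p_value

# Hypothetical log-likelihood values, for illustration only.
stat, p_value = lr_test(-1000.0, -990.0)
print("LR statistic = %.1f, p = %.4f" % (stat, p_value))
```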
----------
From: James Stroud
Here's how I think about it:
If you use the empirical coordinate error that I described previously, you can use simple statistics to calculate how likely you are to get a coordinated movement (relative to a fixed landmark).
I can use a 1-d case as an example. In this 1-d case, let's pretend that we have a domain of N=25 atoms where atom 2 is about 1 away from atom 1 and atom 3 is 2 away from atom 1 and one away from atom 2, etc, with a standard deviation of 1 for the position of the atoms. If atom 1 for domain A is at 1, this is just
A_j = j
Then you can have domain B that has moved +1 compared to domain A:
B_j = j+1
Since we have an alignment (B_j -> A_j), then we can calculate the movement, X:
X = mean(B) - mean(A)
We can also calculate the error of the ensemble (aka the error of the mean):
sigmaE = std( (B - mean(B)) - (A - mean(A)) ) / sqrt(25)
Then, we can calculate how likely it is we observe the movement X by tail integration of the cumulative normal distribution. We will justify this for the 3-d case because the least squares superposition (from which we estimate the coordinate error) assumes normality.
Here is a simulation of this scenario in python:
py> import math
py> import numpy
py> from scipy.special import ndtr
py> a = numpy.array([numpy.random.normal(j) for j in range(25)])
py> b = numpy.array([numpy.random.normal(j+1) for j in range(25)])
py> a
array([ 1.38125295, -0.27126096, 1.7597104 , 1.36242299,
3.88327659, 4.33063307, 5.00544708, 7.02888858,
7.83945228, 9.72101719, 10.36231633, 10.29176378,
11.78497375, 12.16082056, 14.31057296, 13.25941344,
17.93779336, 18.05626047, 18.62148347, 20.52756478,
19.73362283, 21.83953268, 22.28038617, 23.24545481, 22.96192518])
py> b
array([ 3.32750181, 2.42664791, 3.23309368, 4.32882699,
6.59985764, 6.49597664, 5.27921723, 7.8573831 ,
9.98722475, 10.65225383, 11.69970159, 11.67435798,
12.16191254, 13.69297801, 14.21845382, 17.21423427,
16.89347161, 17.68778305, 17.89371115, 18.7679351 ,
py> X = b.mean() - a.mean()
py> sigma_ensemble = ((b - b.mean()) - (a - a.mean())).std() / math.sqrt(25)
py> X_standardized = (X - 0) / sigma_ensemble
py> 2 * ndtr(-abs(X_standardized))
0.00011596192653578624
This means, for the 1-d scenario I describe, (using the random arrays generated above), the movement is expected about once for every 10,000 "experiments", providing a p-value, or estimate of significance. Note that the 2 comes from the fact that the cumulative distribution has 2 tails.
A 3-D calculation using the rmsd as the coordinate error would be similar except that you use Euclid's formula to calculate the distances in higher dimensions (instead of the absolute value of a simple subtraction as in 1-d).
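A literal 3-D reading of this recipe might look like the sketch below. It is only an illustration under stated assumptions: both coordinate sets are already in a common frame (superposed on the domains that did not move), the domain moved by pure translation, the rmsd of the centroid-removed residuals stands in for the coordinate error, and the length of the shift vector is treated as a normal deviate, which is itself an approximation. The function name `domain_shift_pvalue` and the demo data are hypothetical.

```python
import numpy as np
from scipy.special import ndtr

def domain_shift_pvalue(a, b):
    """Return (shift distance, rmsd, two-tailed p) for matched (N, 3) coordinates."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Movement of the domain: Euclidean distance between the two centroids.
    shift = b.mean(axis=0) - a.mean(axis=0)
    distance = np.linalg.norm(shift)
    # Per-atom residuals once the overall shift has been removed.
    resid = (b - b.mean(axis=0)) - (a - a.mean(axis=0))
    rmsd = np.sqrt((resid ** 2).sum(axis=1).mean())
    # Error of the mean position for N atoms.
    sigma_mean = rmsd / np.sqrt(len(a))
    z = distance / sigma_mean
    # Two-tailed probability, as in the 1-d example.
    return distance, rmsd, 2.0 * ndtr(-z)

# Hypothetical demo: 25 atoms shifted by 1 along x, with 0.3 noise per axis.
rng = np.random.default_rng(1)
dom_a = rng.normal(scale=5.0, size=(25, 3))
dom_b = dom_a + np.array([1.0, 0.0, 0.0]) + rng.normal(scale=0.3, size=(25, 3))

distance, rmsd3d, p3d = domain_shift_pvalue(dom_a, dom_b)
print("shift = %.2f, rmsd = %.2f, p = %.2e" % (distance, rmsd3d, p3d))
```

If the domain also rotates, the residuals should be computed after a least-squares superposition of the domain rather than after a simple centroid subtraction.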
James
----------
From: James Stroud
I meant to say "Euclidian distance". "Euclid's formula" has a specific meaning that is different.
----------
From: Jacob Keller
I am curious how all of this can be more than splitting hairs, i.e.,
under what conditions can this 1Ang domain motion mean something
biologically significant? Proteins are pretty flexible, after all,
especially between domains.
JPK
----------
From: James Stroud
To engage in the discussion, I think we had to accept this:
So the point of the discussion, as I understand it, is to figure out whether the movement warrants further consideration in the first place, i.e. whether it is significant with respect to the error of the models.
I think it doesn't take too much energy to discount the attempt to quantify the statistical significance by claiming that one can't imagine how such a change might be biologically significant. I'm really not privy to the structures in question, so I am in no position to make this judgement.
James
----------
From: Bernhard Rupp (Hofkristallrat a.D.)
That of course is correct and leads to the interesting (and in part
previously discussed) question how to quantify (log) likelihood ratios in
terms of significance. I am not sure that this is trivial. More likely
better, very likely better, kinda really likely better? Having said that I
also want to caution the normal distribution & statistical test fans, who
think that a p value or similar has any more meaning. Whether p=0.05 means
something (other than a statistical metric) is equally fuzzy; it just
conveys a false sense of precision and erudition.
May I quote:
"The scientist must be the judge of his own hypotheses, not the
statistician."
A.F.W. Edwards (1992) in Likelihood - An account of the statistical concept
of likelihood and its application to scientific inference , p. 34.
Brrrrrrr
----------
From: Vellieux Frederic
A mixture between mathematical significance and biological significance as a part of the reply:
you should also take into account the thermal vibrations of the atoms present in that domain, i.e. the "thermal ellipsoids" when you have one of the representations of anisotropic temperature factors (when these can be obtained, high enough resolution), together with the associated density smearing. Especially if you observe correlated thermal ellipsoids. If you have a small "motion" but that this motion can be (at least in good part) "explained" by the inherent thermal "flexibility" of all atoms in that domain then perhaps you can question the significance of this domain motion (at least in the publication).
Fred.
----------
From: Nicholas M Glykos
Hi Filip,
Would it be a worth-while exercise to make a histogram of the absolute
values of atomic displacements ? If the distribution is bimodal (as you
indicated that it may), then indicating statistical significance should be
much easier (and convincing ?).
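Such a histogram could be produced along these lines. This is a minimal sketch, assuming both structures are already superposed on a common reference and the atoms are matched one to one; the function name `displacement_histogram`, the bin width, and the demo data are hypothetical.

```python
import numpy as np

def displacement_histogram(a, b, bin_width=0.25):
    """Histogram of per-atom displacement magnitudes between matched (N, 3) sets."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Absolute displacement of each atom.
    disp = np.linalg.norm(b - a, axis=1)
    # Enough equal-width bins to cover the largest displacement.
    n_bins = max(1, int(np.ceil(disp.max() / bin_width)))
    edges = bin_width * np.arange(n_bins + 1)
    counts, edges = np.histogram(disp, bins=edges)
    return disp, counts, edges

# Hypothetical demo: 25 atoms that stay put and 15 atoms shifted by 1.5 along x.
rng = np.random.default_rng(2)
ref = rng.normal(scale=5.0, size=(40, 3))
moved = ref + rng.normal(scale=0.1, size=(40, 3))
moved[25:] += np.array([1.5, 0.0, 0.0])

disp, counts, edges = displacement_histogram(ref, moved)
for lo, n in zip(edges[:-1], counts):
    # Crude text histogram: one '#' per atom in the bin.
    print("%4.2f-%4.2f  %s" % (lo, lo + 0.25, "#" * int(n)))
```

With a genuine concerted shift the counts fall into two separated groups, one near zero for the reference domain and one near the shift distance for the moved domain.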
My twocents,
Nicholas
----------
From: Fabio Dall'Antonia
** Sorry for posting again, but I wanted to replace the subject by the specific topic (my former subject was due to the fact that I use the CCP4BB digest only) **
Dear Filip,
as Roberto mentioned earlier, our program Escet, respectively the RAPIDO web server - http://webapps.embl-hamburg.de/rapido/ - is taking coordinate errors (as derived from DPI- or empirically scaled B-factors) into account when judging the significance of structural invariance (that is, in particular domain movement). As far as crystal structures are concerned, you may want to give it a try ... otherwise I agree on the suggestion to compare the "internal" rmsd of individually superimposed domains to the overall rmsd of superimposed multi-domain structures, or more specifically, to the concerted shift of a domain relative to the other(s), so to estimate the resolution-independent significance of movement.
Cheers,
Fabio
--
Dr. rer. nat. Fabio Dall'Antonia
European Molecular Biology Laboratory c/o DESY
Notkestraße 85, Bldg. 25a
D-22603 Hamburg
----------
From: Savvas Savvides
Dear Filip
'Annoying' MR problems for which the answer often lies in relatively small differences between the search model and 'RB-shifted' domains and/or subdomains in the actual structure, are I think a good experimental indication of the significance of such issues.
To extrapolate from this, RB refinement of whole domains that are initially misplaced by 0.5-2 angstroms will show that X-ray data to 4-8 angstrom resolution are often sufficient to refine the model to a position that decisively agrees better with data as judged by crystallographic refinement R-factors and electron density. So, my reaction every time I see such behavior is that the observed domain shift must be significant and that I should do my crystallographic best to model and cross-validate it as well as the data quality and resolution will allow.
However, the biological interpretation and impact of such significant displacements are of course quite specific to the system under study.
Best regards,
Savvas
----------
From: Pete Meyer
I suspect you may be better off asking an EM person (or on an EM list), due to the peculiarities of cryo-EM reconstruction. If I'm recalling correctly, EM resolution is determined by Fourier shell correlation so it might not have a one-to-one relationship to optical resolution. In addition, there usually is some uncertainty in converting voxel distances to physical distances - so an optical resolution in terms of Angstroms would need to be converted.
As a few others have pointed out, you'd also need to account for coordinate uncertainty. For 3d reconstruction, you've got uncertainties from particle image classification, image alignment, possibly conformational heterogeneity, and possibly uncertainties in the voxel position (due to interpolation) and voxel values (and the voxel to Angstrom conversion). My understanding of EM is limited to what I need to know to deal with EM data from collaborators, or how to use some EM software with low-resolution x-ray data - so I may be missing a few things (or pointing out problems that have been dealt with).
The short version, or at least my take on it, is that you may not be able to get a mathematically/statistically rigorous test for if the movement of a set of voxels is significant or not - but asking an EM person could probably give you a better answer.
Pete