CCP4 Bulletin Board Archive: Merging data collected at two different wavelengths

From: arka chakraborty

Hi all,

I have two datasets, both Co SAD data, one collected at the Co anomalous wavelength at a synchrotron and the other on a home source. I wish to combine these two datasets and use them for SAD phasing. Can anyone suggest how this can be done?

Regards,

ARKO

--

ARKA CHAKRABORTY

----------
From: Tim Gruene

Dear Arko,

could you not try MAD with the two different data sets?
Otherwise you can check the strength of the anomalous signal for both
sets separately (I am sure pointless prints the anomalous CC per
resolution shell) and after merging them.
If the signal increases, use the merged data set for SAD; if it does
not, use them separately (but then you call it MAD...).
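The check Tim describes boils down to a correlation of anomalous differences per resolution shell. Scaling programs usually compute this between random half-datasets; the minimal sketch below computes the same kind of statistic (a Pearson correlation, shell by shell) between any two sets of anomalous differences, e.g. two half-datasets, or the home and synchrotron sets once they are on a common scale. The dict layout and the function names are hypothetical stand-ins for reading an MTZ file, not the format of any CCP4 program.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either list has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def anomalous_cc_by_shell(set_a, set_b, shell_edges, min_refl=3):
    """Correlate anomalous differences of two datasets shell by shell.

    set_a, set_b: hypothetical dicts {(h, k, l): (d_spacing, dano)},
    where dano = F(+) - F(-) on a common scale.
    shell_edges: d-spacings in Angstrom from low to high resolution,
    e.g. [50.0, 6.0, 4.0, 3.0].
    Returns a list of (d_max, d_min, n_common, cc); cc is None when a
    shell has fewer than min_refl common reflections.
    """
    common = set_a.keys() & set_b.keys()
    results = []
    for d_max, d_min in zip(shell_edges, shell_edges[1:]):
        # Pair up the anomalous differences of reflections present in both sets.
        pairs = [(set_a[h][1], set_b[h][1]) for h in common
                 if d_min <= set_a[h][0] < d_max]
        cc = None
        if len(pairs) >= min_refl:
            xs, ys = zip(*pairs)
            cc = pearson(xs, ys)
        results.append((d_max, d_min, len(pairs), cc))
    return results
```

Following Tim's logic, a CC that stays clearly positive to higher resolution after merging than for either set alone would argue for using the merged data.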

Tim
--
Dr Tim Gruene

----------
From: James Holton

How to merge two or more runs depends on the software you used to process the images. If you used MOSFLM/CCP4, then you would use the programs REBATCH (perhaps REINDEX) and SORTMTZ to combine the unmerged mtz files (the ones that come out of MOSFLM) before feeding them to SCALA. My program Scaler Elves will do this automatically if you just put the unmerged mtz file names on the command line:
http://bl831.als.lbl.gov/~jamesh/elves/manual/scaler.html
Once they are scaled, by default Elves will merge different wavelengths separately, but you can run the merge.com script they produce with "all" on the command line to just average everything together into one merged mtz file.

You can also feed the unmerged mtz files to POINTLESS and it can be told to do pretty much the same thing. POINTLESS will also take XDS_ASCII.HKL files as input, but I don't think it should surprise anyone that XSCALE was designed to combine data from as many XDS runs as you like, as well as do zero-dose extrapolation.

If you are using the HKL Research Inc. suite, then you can put the *.sca files back into scalepack as individual "frames" (which is what you want to do to check for non-isomorphism), or you can input all the *.x files into one scalepack run and obtain a single merged *.sca file that way.

There are plenty of other processing packages out there as well, but I will make no attempt to make this post a comprehensive list. Suffice it to say, the exact procedure for combining two runs depends on the software you are using (and the manuals are remarkably helpful).

An important thing that is not done automatically, however, is to check whether your space group has more than one indexing solution. Basically, if merohedral twinning is possible, then it is also possible that your two datasets were indexed differently. If so, they will not merge well until you re-index one of them. A nice table to use is here:
http://www.ccp4.ac.uk/html/reindexing.html
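The idea behind matching one dataset to a "reference" can be sketched as: apply each candidate reindexing operator to the second dataset and keep the one whose intensities correlate best with the reference. This is a simplified illustration, not POINTLESS's actual algorithm. The dict layout, the function names and the example operator "k,h,-l" (an alternative indexing for point group 4, e.g. space group P4) are assumptions; take the operators for your own space group from the table above. For point group 422, as in P41212, there is no such alternative, because the Laue group already has the full tetragonal lattice symmetry.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either list has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


# Candidate operators as functions of (h, k, l); illustrative choice only.
OPERATORS = {
    "h,k,l": lambda h, k, l: (h, k, l),
    "k,h,-l": lambda h, k, l: (k, h, -l),
}


def best_indexing(reference, data, operators=OPERATORS, min_common=3):
    """Return (best_name, scores) for the operator that best matches reference.

    reference, data: hypothetical dicts {(h, k, l): intensity} in which every
    symmetry-equivalent index is listed explicitly (no reduction to an
    asymmetric unit), so a reindexed index can be looked up directly.
    """
    scores = {}
    for name, op in operators.items():
        moved = {op(*hkl): i for hkl, i in data.items()}
        common = reference.keys() & moved.keys()
        if len(common) < min_common:
            scores[name] = None
            continue
        scores[name] = pearson([reference[h] for h in common],
                               [moved[h] for h in common])
    valid = {name: s for name, s in scores.items() if s is not None}
    if not valid:
        raise ValueError("no operator gave enough common reflections")
    return max(valid, key=valid.get), scores
```

As James notes next, when the two sets are badly non-isomorphous all the scores can come out low and close together, and the choice becomes unreliable.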

Again, POINTLESS can be used to try to re-index one dataset to match a "reference", but if non-isomorphism is high, then it can be hard to tell. In general, it is better to merge/average data when the sets are isomorphous, and it is not a good idea to merge/average them if they are not! How much non-isomorphism is too much depends on how small a difference signal you are trying to measure. For example, if you are trying to measure a 3% dispersive difference, then 15% non-isomorphism is way too much.

It is an under-appreciated fact that radiation damage is a serious source of non-isomorphism. Banumathi et al. (2006) found about a 1% increase in non-isomorphism per MGy of dose. If you don't know what a MGy is, then I recommend you have a look at the open-access article:
http://dx.doi.org/10.1107/S0909049509004361
There are plenty of other sources of error as well, and although there is no a priori reason to think that data taken from one instrument on one day would be any different from data taken from the same crystal on a completely different instrument on a different day, it is never surprising when they don't merge very well.
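The two numbers James gives (about 1% non-isomorphism per MGy, and a 3% signal being hopeless against 15% non-isomorphism) can be combined into a crude back-of-envelope check. This is only a sketch under stated assumptions: non-isomorphism is taken to grow linearly at the quoted rate from a hypothetical baseline, and the signal and the non-isomorphism are both expressed as percentages so that they can be compared directly. The function names are illustrative.

```python
# Rough rate from Banumathi et al. (2006) as quoted above; treat as an average.
RATE_PERCENT_PER_MGY = 1.0


def nonisomorphism_percent(dose_mgy, baseline_percent=0.0):
    """Estimated non-isomorphism (%) after dose_mgy MGy, assuming linear growth
    on top of a hypothetical baseline (crystal-to-crystal, instrument, ...)."""
    return baseline_percent + RATE_PERCENT_PER_MGY * dose_mgy


def signal_to_nonisomorphism(signal_percent, nonisomorphism):
    """Ratio of the difference signal to the non-isomorphism, both in %."""
    if nonisomorphism <= 0:
        return float("inf")
    return signal_percent / nonisomorphism
```

For James's example, `signal_to_nonisomorphism(3.0, 15.0)` gives 0.2: the dispersive signal is a fifth of the size of the error, and at the quoted rate about 15 MGy of dose alone would be enough to produce that much non-isomorphism.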

By the way, I wouldn't use "MAD" to describe the merging of non-isomorphous datasets. Strictly speaking, MAD is at least an attempt to measure both anomalous (f") and dispersive (f') differences, and I don't think it is appropriate to use the term "MAD" when you know the dispersive signal is washed out by non-isomorphism. I call such attempts MSAD (multi-SAD), which I think helps differentiate them from actual MAD data collections, where you at least try not to fry the crystal between measurements that you need to subtract to get your phasing signal. Unless, of course, you are doing RIP!

Just my humble opinion,

-James Holton
MAD Scientist

----------
From: Tim Gruene

Comments on the comments ;-):

On 01/18/2012 05:54 PM, James Holton wrote:
> [...]
both pointless (as you point out) and XDS do this automatically (whether
or not they do it correctly is a different matter).

> [...]
I agree, neither would I.
Just to be on the safe side and avoid confusing less experienced
readers of the list: I used the term MAD because there are two data sets
collected at two different wavelengths, both of which should give rise
to a measurable anomalous signal from the Co in the sample.

Cheers, Tim

----------
From: Jacob Keller

Isn't it true that we cannot even agree on what MAD stands for?

Is the following right?

M = Multiple-wavelength. I think everyone agrees to this, although I
believe I've seen the occasional (and sometimes nonsensical) variant
A = Anomalous (I think everyone agrees, although this term should
really be changed to "resonant," as there is no anomaly to it
anymore...)
D = Diffraction, Dispersion, Destruction, Dissolution...?

JPK

----------
From: D Bonsor

D = Discussion?

----------
From: Francis E Reyes

Ditto
Using the terms 'MAD' and 'SAD' has always been confusing to me when considering more complex phasing cases. What happens if you have intrinsic Zn, collect a 3-wavelength experiment, and then derivatize it with SeMet or a heavy atom? Or the MAD+native scenario (SHARP)?

Instead of using MAD/SAD nomenclature, I favor explicitly stating whether dispersive/anomalous/isomorphous differences (and which heavy atoms for each) were used in phasing. Isn't analyzing the differences (independent of source) the important bit anyway?

F

---------------------------------------------
Francis E. Reyes M.Sc.

----------
From: Jacob Keller

That is excellent! You refer obviously to the multiple anomalous
discussions on the bb? (Maybe d = disagreement?)

JPK

--
*******************************************
Jacob Pearson Keller

----------
From: Pete Meyer

Hi,

Regardless of the consensus on naming the technique, I'd suggest you combine these datasets during phasing (I'm aware of MLPHARE, SHARP and PHASIT supporting multiple anomalous datasets during phasing; others probably do as well). Combining at the merging step (pointless/scalepack/etc.) might result in averaging amplitudes collected at different wavelengths (and therefore with different anomalous signals), and since your phases will come from amplitude differences, this may result in less reliable phases.

Pete

----------
From: Phil Jeffrey

Can I be dogmatic about this ?

Multiwavelength anomalous diffraction from Hendrickson (1991) Science Vol. 254 no. 5028 pp. 51-58

Multiwavelength anomalous diffraction (MAD) from the CCP4 proceedings http://www.ccp4.ac.uk/courses/proceedings/1997/j_smith/main.html

Multi-wavelength anomalous-diffraction (MAD) from Terwilliger Acta Cryst. (1994). D50, 11-16

etc.

I don't see where the problem lies:

a SAD experiment is a single-wavelength experiment where you are using the anomalous/dispersive signals for phasing

a MAD experiment is a multiple-wavelength version of SAD. Hopefully one picks an appropriate range of wavelengths for whatever complex case one has.

One can have SAD and MAD datasets that exploit anomalous/dispersive signals from multiple different sources. This, after all, is one of the things that SHARP is particularly good at accommodating.

If you're not using the anomalous/dispersive signals for phasing, you're collecting native data. After all, C, N, O, S etc. all have a small anomalous signal at all wavelengths, and metalloproteins usually have even larger signals, so the mere presence of a theoretical anomalous (f") difference does not make it a SAD dataset. ALL datasets contain some anomalous/dispersive signal, most of the time way down in the noise.

Phil Jeffrey
Princeton

----------
From: Jacob Keller

I wish you could, but I don't think so, because even though those
sources call it that, others don't. I agree with your thinking, but
usage is usage.
I think "dispersive" usually refers to differences caused by changes
in f'/f" between wavelengths, no?

JPK

----------
From: Phoebe Rice

And 10,000 lemmings can't be wrong?

----------
From: Jacob Keller

This begs the question* whether you want the lemmings to understand
you. One theory of language, gotten more or less from Strunk and
White's Elements of Style, is that the most important feature of
language is its transparency to the underlying thoughts. Bad language
breaks the transparency, reminds you that you are reading and not
simply thinking the thoughts of the author, who should also usually be
invisible. Bad writing calls attention to itself and to the author,
whereas good writing guides the thoughts of the reader unnoticeably.
For Strunk and White, it seems that all rules of writing follow this
principle, and it seems to be the right way to think about language.
So, conventions, even when somewhat inaccurate, are important in that
they are often more transparent, and the reader does not get stuck on
them.

Anyway, a case in point of lemmings is that once Wayne Hendrickson
himself suggested that the term anomalous be decommissioned in favor
of "resonant." I don't hear any non-lemmings jumping on that
bandwagon...

JPK

*Is this the right use of "beg the question?"

----------
From: Soisson, Stephen M

But if we were to follow that convention we would have been stuck with Multi-wavelength Resonant Diffraction Experimental Results, or, quite simply, MuRDER.

----------
From: Ethan Merritt

You could switch that to Multiple Energy Resonant Diffraction Experiment
but I don't think that would help any.

As to "anomalous" - the term comes from the behaviour of the derivative
delta_(optical index) / delta_(wavelength)
This term is positive nearly everywhere, but is anomalously negative
at the absorption edge.

Ethan

----------
From: arka chakraborty

Hi all,

Thanks for providing multiple solutions to my problem. Prof. Tim Gruene and Prof. James Holton provided some nice solutions. However, since the data were collected from different crystals, I am not sure whether I can do MAD phasing. My aim is to merge the two datasets to circumvent the problem posed by the fact that the synchrotron data are twinned. So maybe merging the datasets will provide better phases from SAD phasing? My main concern was how to do the scaling adjustments before using the datasets together.

Thanking you,

Regards,

ARKO

----------
From: James Holton

Oh dear.

You definitely cannot de-twin a dataset by merging it with a non-twinned dataset! And if the twin fraction of your synchrotron set is much greater than 0.3, then it is unlikely that you will be able to use the anomalous differences to solve the phase problem.
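Why a high twin fraction hurts can be seen from the standard hemihedral-twin relations J1 = (1 - a) I1 + a I2 and J2 = a I1 + (1 - a) I2, where J are observed and I true intensities of a twin-related pair and a is the twin fraction. Inverting them divides by (1 - 2a), so measurement errors grow rapidly as a approaches 0.5. The sketch below assumes merohedral (hemihedral) twinning with a known twin fraction and independent errors of equal size in J1 and J2; the crystals described later in the thread look non-merohedrally twinned (two fused crystals), so this only illustrates the arithmetic, it is not a remedy for that case.

```python
from math import sqrt


def detwin(j1, j2, alpha):
    """Recover I1, I2 from twinned intensities J1, J2 (hemihedral model).

    Model assumed: J1 = (1 - a) I1 + a I2 and J2 = a I1 + (1 - a) I2.
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("twin fraction must satisfy 0 <= alpha < 0.5")
    denom = 1.0 - 2.0 * alpha
    i1 = ((1.0 - alpha) * j1 - alpha * j2) / denom
    i2 = ((1.0 - alpha) * j2 - alpha * j1) / denom
    return i1, i2


def error_amplification(alpha):
    """sigma(I1) / sigma(J) when J1 and J2 carry independent errors of equal size."""
    return sqrt((1.0 - alpha) ** 2 + alpha ** 2) / (1.0 - 2.0 * alpha)


if __name__ == "__main__":
    for a in (0.0, 0.1, 0.2, 0.3, 0.4, 0.45):
        print(f"alpha = {a:.2f}: errors amplified x{error_amplification(a):.1f}")
```

The amplification is roughly 1.9 at a = 0.3 and 3.6 at a = 0.4, which quickly swamps differences as small as anomalous ones.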

If I were you, I would focus on the non-twinned crystal system. You CAN average anomalous differences across different crystals, provided they are reasonably isomorphous. http://dx.doi.org/10.1107/S0907444910046573
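Arko's question about "scaling adjustments" and James's point about averaging anomalous differences across crystals can be sketched together: put the second crystal on the scale of the first, then take an inverse-variance weighted mean of the anomalous differences. This is a minimal sketch, assuming a single overall linear scale factor is adequate (real scaling programs such as SCALA, XSCALE or SCALEPACK refine much more, e.g. B-factors and per-batch scales) and that the crystals are reasonably isomorphous. If the wavelengths differ, the anomalous differences are also expected to differ in size (Pete Meyer's point earlier in the thread), which a scale derived from the total F does not correct. The dict layout and the function names are hypothetical.

```python
from math import sqrt


def linear_scale(ref, other):
    """Least-squares k minimising sum((F_ref - k * F_other)**2) over common hkl."""
    common = ref.keys() & other.keys()
    num = sum(ref[h][0] * other[h][0] for h in common)
    den = sum(other[h][0] ** 2 for h in common)
    if den == 0:
        raise ValueError("no common reflections with non-zero F")
    return num / den


def average_anomalous(ref, other):
    """Inverse-variance weighted mean of anomalous differences from two crystals.

    ref, other: hypothetical dicts {(h, k, l): (F, dano, sigma_dano)} with
    sigma_dano > 0. 'other' is first put on the scale of 'ref' with one
    overall factor k (dano and its sigma scale with F).
    Returns {(h, k, l): (mean_dano, sigma_mean)}.
    """
    k = linear_scale(ref, other)
    merged = {}
    for hkl in ref.keys() | other.keys():
        obs = []
        if hkl in ref:
            obs.append((ref[hkl][1], ref[hkl][2]))
        if hkl in other:
            obs.append((k * other[hkl][1], k * other[hkl][2]))
        weights = [1.0 / s ** 2 for _, s in obs]
        mean = sum(w * d for w, (d, _) in zip(weights, obs)) / sum(weights)
        merged[hkl] = (mean, 1.0 / sqrt(sum(weights)))
    return merged
```

Weighting by 1/sigma^2 lets a weak but noisy dataset (such as the home-source one) contribute without dragging down a stronger one.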

And I should add the caveat that twinning is equivalent to "non-isomorphism" until after you have solved the structure because it dramatically changes the intensity you have available for any given hkl index.

-James Holton
MAD Scientist

----------
From: arka chakraborty

Hi all,

Thanks a lot for the valuable suggestions. I have tried detwinning it, but the detwinning program in CCP4 handles only merohedral twinning (if I am not wrong), and the other program (I guess Cell-now in Apex 2 by Bruker?), which handles non-merohedral twinning, is not accessible to me (as I can't buy it). Also, the anomalous signal in the home-source data is pretty weak. So I was thinking about trying to get a better result by merging the two datasets, though I am aware of the problem posed by twinning. But since we were not able to get crystals large enough to mount on the home source, I thought: why not try whatever is possible!

Thanking you,

Regards,

ARKO

----------
From: Francis E Reyes

Let's have some more info here.

What resolution are we talking about? What are the space groups?

What is the nature of the Co (is it a heavy-atom soak? a bound Co?)

Have you tried to find heavy atom sites? (anomalous Patterson, or some other automated method)

Are they believable?

You say the dataset is twinned. How do you know this is the case?

You're looking for help (and you've come to the right place) but knowing a little bit more about your system will help others suggest a suitable phasing scenario.

F

----------
From: arka chakraborty

Hi all,

There is good news: we got crystals this morning that diffracted to 2.5 Å on the home source, and data collection is going on. I hope this solves the problem, though the not-so-good anomalous signal at the home source will probably make SAD phasing difficult. However, to provide some of the additional information asked for by Prof. Francis E Reyes:

The data are in all probability twinned, for two reasons:
1) At the time of data collection, two fused crystals (or crystallites?) could be seen clearly, which unfortunately could not be separated.
2) The data show clear interference of lattices, with one lattice showing much higher intensity for the first half or so of the 360 images collected and the other lattice for the second half. I have tried separating the frames and solving, but it didn't work out.
The space group is P41212 in all data. The synchrotron data have a resolution of 2.75 Å, collected at a wavelength of 1.60428 Å, and the home data 3.0 Å, collected at the Cu Kα wavelength. All are Co soaks. I should restate that it is a DNA oligomer, and though I have been able to get heavy-atom positions and the occupancies look OK, the phased map is too noisy to be interpreted.
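For reference, the two wavelengths mentioned here can be converted to photon energies with E (keV) ≈ 12.398 / λ (Å). The Co K-edge energy used for comparison (about 7.709 keV) is an approximate tabulated value added for this sketch; a fluorescence scan on the actual crystal is the reliable reference. Cu Kα is taken as the usual weighted-average wavelength of 1.5418 Å.

```python
HC_KEV_ANGSTROM = 12.39842   # h*c in keV*Angstrom
CO_K_EDGE_KEV = 7.709        # approximate tabulated Co K absorption edge


def wavelength_to_kev(wavelength_angstrom):
    """Photon energy in keV for a wavelength in Angstrom."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


if __name__ == "__main__":
    for label, lam in (("synchrotron", 1.60428), ("home source, Cu Kalpha", 1.5418)):
        e = wavelength_to_kev(lam)
        offset_ev = 1000.0 * (e - CO_K_EDGE_KEV)
        print(f"{label}: {lam} A -> {e:.3f} keV ({offset_ev:+.0f} eV from the Co K edge)")
```

Both energies lie above the Co K edge (the synchrotron one by only about 20 eV, Cu Kα by roughly 330 eV), so both sets should carry a Co anomalous signal, but f" will generally differ between them, which is the wavelength dependence raised earlier in the thread.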

Regards,

ARKO

Friday, 10 February 2012
