Thursday, 5 April 2012

Category 4 Re: [ccp4bb] very informative - Trends in Data Fabrication


From: Jrh
Date: 5 April 2012 15:43


Dear Herbert,
Category 4, in Manchester, we find tricky, for want of a better word. Needless to say, we have collaborators on our Crystallography Research Service who request data sets from eg ten years ago that are now urgent for writing up for publication. So we are keeping everything, although the raw diffraction images only for recent years, and nb we will soon be assisted by the University of Manchester centralised Data Repository for its researchers. (Incidentally, I have kept all of my film oscillation data, including later Laue data, back to approx 1977, which fills a whole wall of shelving, ~10 metres.)
Greetings,
John

Prof John R Helliwell DSc FInstP CPhys FRSC CChem F Soc Biol.
Chair School of Chemistry, University of Manchester, Athena Swan Team.
http://www.chemistry.manchester.ac.uk/aboutus/athena/index.html



On 5 Apr 2012, at 13:50, "Herbert J. Bernstein" wrote:

> Dear Colleagues,
>
>  Clearly, no system will be able to perfectly preserve every pixel of
> every dataset collected at a cost that can be afforded.  Resources are
> finite and we must set priorities.  I would suggest that, in order
> of declining priority, we try our best to retain:
>
>  1.  raw data that might tend to refute published results
>  2.  raw data that might tend to support published results
>  3.  raw data that may be of significant use in currently
> ongoing studies either in refutation or support
>  4.  raw data that may be of significant use in future
> studies
>
> While no archiving system can be perfect, we should not let the
> search for a perfect solution prevent us from working with
> currently available good solutions, and even in this era of tight
> budgets, there are good solutions.
>
>  Regards,
>    Herbert
>
> On 4/5/12 7:16 AM, John R Helliwell wrote:
>> Dear 'Aaleshin@burnham.org',
>>
>> Re the pixel detector; yes this is an acknowledged raw data archiving
>> challenge; possible technical solutions include:- summing to make
>> coarser images ie in angular range, lossless compression (nicely
>> described on this CCP4bb by James Holton) or preserving a sufficient
>> sample of data....(but nb this debate is certainly not yet concluded).
>>
>> Re "And all this hassle is for the only real purpose of preventing data fraud?"
>>
>> Well.....Why publish data?
>> Please let me offer some reasons:
>> • To enhance the reproducibility of a scientific experiment
>> • To verify or support the validity of deductions from an experiment
>> • To safeguard against error
>> • To allow other scholars to conduct further research based on
>> experiments already conducted
>> • To allow reanalysis at a later date, especially to extract 'new'
>> science as new techniques are developed
>> • To provide example materials for teaching and learning
>> • To provide long-term preservation of experimental results and future
>> access to them
>> • To permit systematic collection for comparative studies
>> • And, yes, to better safeguard against fraud than is apparently the
>> case at present
>>
>> Also to (probably) comply with your funding agency's grant conditions:-
>> Increasingly, funding agencies are requesting or requiring data
>> management policies (including provision for retention and access) to
>> be taken into account when awarding grants. See e.g. the Research
>> Councils UK Common Principles on Data Policy
>> (http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx) and the Digital
>> Curation Centre overview of funding policies in the UK
>> (http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies).
>> See also http://forums.iucr.org/viewtopic.php?f=21&t=58 for discussion
>> on policies relevant to crystallography in other countries. Nb these
>> policies extend over derived, processed and raw data, ie without
>> really adequate clarity of policy from one stage to another of
>> the 'data pyramid' (see
>> http://www.stm-assoc.org/integration-of-data-and-publications).
>>
>>
>> And just to mention the IUCr Journals Notes for Authors for biological
>> macromolecular structures, where we have our, ie macromolecular
>> crystallography's, version of the 'data pyramid':-
>>
>> (1) Derived data
>> • Atomic coordinates, anisotropic or isotropic displacement
>> parameters, space group information, secondary structure and
>> information about biological functionality must be deposited with the
>> Protein Data Bank before or in concert with article publication; the
>> article will link to the PDB deposition using the PDB reference code.
>> • Relevant experimental parameters and unit-cell dimensions are required
>> as an integral part of article submission and are published within the
>> article.
>>
>> (2) Processed experimental data
>> • Structure factors must be deposited with the Protein Data Bank
>> before or in concert with article publication; the article will link
>> to the PDB deposition using the PDB reference code.
>>
>> (3) Primary experimental data (here I give small and macromolecule
>> Notes for Authors details):-
>> For small-unit-cell crystal/molecular structures and macromolecular
>> structures IUCr journals have no current binding policy regarding
>> publication of diffraction images or similar raw data entities.
>> However, the journals welcome efforts made to preserve and provide
>> primary experimental data sets. Authors are encouraged to make
>> arrangements for the diffraction data images for their structure to be
>> archived and available on request.
>> For articles that present the results of powder diffraction profile
>> fitting or refinement (Rietveld) methods, the primary diffraction
>> data, i.e. the numerical intensity of each measured point on the
>> profile as a function of scattering angle, should be deposited.
>> Fibre data should contain appropriate information such as a photograph
>> of the data. As primary diffraction data cannot be satisfactorily
>> extracted from such figures, the basic digital diffraction data should
>> be deposited.
>>
>>
>> Finally to mention that many IUCr Commissions are interested in the
>> possibility of establishing community practices for the orderly
>> retention and referencing of raw data sets, and the IUCr would like to
>> see such data sets become part of the routine record of scientific
>> research in the future, to the extent that this proves feasible and
>> cost-effective.
>> I draw your attention therefore to the IUCr Forum on such matters at:-
>> http://forums.iucr.org/
>> Within this Forum you can find, for example, the fairly recent report of
>> the ICSU-convened Strategic Coordinating Committee on Information and
>> Data; within this we learn of data archiving efforts in many other areas
>> of science, and eg that the radio astronomy Square Kilometre Array will
>> pose the biggest raw data archiving challenge on the planet. [Our needs
>> are thereby relatively modest.]
>>
>> The IUCr Diffraction Data Deposition Working Group is actively
>> addressing all these various issues.
>> We welcome your input at the IUCr Forum, which will thereby be most
>> timely. Thank you.
>>
>> Best wishes,
>> Yours sincerely,
>> John
>> Professor John R Helliwell DSc
>>
>>
>> On Thu, Apr 5, 2012 at 1:24 AM, aaleshin wrote:
>>
>>> People who raise their voices for prolonged storage of raw images miss the
>>> simple fact that the volume of collected data increases as fast as, if
>>> not faster than, the cost of storage space drops. I just had an opportunity
>>> to collect data with the PILATUS detector at SSRL, and I can tell you that
>>> monster allows slicing the data 4-5 times thinner than other detectors do. Some
>>> people also like collecting very redundant data sets. Even now, transferring
>>> and storing raw data from a synchrotron is a pain in the neck, but in a
>>> few years it may become simply impractical. And all this hassle is for the
>>> only real purpose of preventing data fraud? Aren't there cheaper and more
>>> adequate solutions to the problem?
>>>
>>> I also wonder why, after the first occurrence of data fraud several years
>>> ago, the PDB did not take any action to prevent its recurrence. Or
>>> are administrative actions simply impossible nowadays without a mega-dollar
>>> grant?
