Thursday, 5 April 2012

Via Annual Reports...Re: [ccp4bb] very informative - Trends in Data Fabrication


From: Jrh
Date: 5 April 2012 15:47


Dear Roger,
At the recent ICSTI Workshop on Delivering Data in science the NSF presenter, when I asked about monitoring, replied that the PIs' annual reports should include data management aspects.
Best wishes,
John

Prof John R Helliwell DSc FInstP CPhys FRSC CChem F Soc Biol.
Chair School of Chemistry, University of Manchester, Athena Swan Team.

On 5 Apr 2012, at 14:08, Roger Rowlett wrote:

FYI, every NSF grant proposal now must have a data management plan that describes how all experimental data will be archived and in what formats. I'm not sure how seriously these plans are monitored, but a plan must be provided nevertheless. Is anyone NOT archiving their original data in some way?
Roger Rowlett
On Apr 5, 2012 7:16 AM, "John R Helliwell"  wrote:
Dear Aaleshin,

Re the pixel detector: yes, this is an acknowledged raw-data archiving
challenge. Possible technical solutions include summing to make
coarser images (i.e. in angular range), lossless compression (nicely
described on this CCP4bb by James Holton), or preserving a sufficient
sample of the data (but NB this debate is certainly not yet concluded).
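To make the first two options concrete, here is a minimal Python sketch, assuming each frame is a small 2-D grid of integer pixel counts from consecutive fine-sliced rotation steps. The helper names (`sum_frames`, `compress_frame`, `decompress_frame`) are hypothetical, and zlib is only a generic stand-in for detector-aware lossless schemes; it is not necessarily the scheme James Holton described.

```python
import array
import zlib


def sum_frames(frames, factor):
    """Hypothetical helper: sum each run of `factor` consecutive frames
    into one coarser frame covering `factor` times the angular range.
    If len(frames) is not a multiple of `factor`, the last coarse frame
    sums only the leftover frames."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    coarse = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]
        rows, cols = len(group[0]), len(group[0][0])
        # Pixel-by-pixel sum over the frames in this angular group.
        coarse.append([[sum(f[r][c] for f in group) for c in range(cols)]
                       for r in range(rows)])
    return coarse


def compress_frame(frame):
    """Hypothetical helper: losslessly compress one frame with zlib."""
    flat = array.array("i", (v for row in frame for v in row))
    return zlib.compress(flat.tobytes(), 9)


def decompress_frame(blob, rows, cols):
    """Inverse of compress_frame: rebuild the exact original frame."""
    flat = array.array("i")
    flat.frombytes(zlib.decompress(blob))
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


frames = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 0], [0, 1]], [[1, 1], [1, 1]]]
print(sum_frames(frames, 2))  # two coarser frames from four fine slices
print(decompress_frame(compress_frame(frames[0]), 2, 2) == frames[0])
```

Note the asymmetry: the compression round trip is exact, whereas summing irreversibly discards the fine angular slicing, so it trades information for storage rather than saving space for free.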

Re "And all this hassle is for the only real purpose of preventing data fraud?"

Well... why publish data?
Please let me offer some reasons:
• To enhance the reproducibility of a scientific experiment
• To verify or support the validity of deductions from an experiment
• To safeguard against error
• To allow other scholars to conduct further research based on
experiments already conducted
• To allow reanalysis at a later date, especially to extract 'new'
science as new techniques are developed
• To provide example materials for teaching and learning
• To provide long-term preservation of experimental results and future
access to them
• To permit systematic collection for comparative studies
• And, yes, to better safeguard against fraud than is apparently the
case at present

Also to (probably) comply with your funding agency's grant conditions:-
Increasingly, funding agencies are requesting or requiring data
management policies (including provision for retention and access) to
be taken into account when awarding grants. See e.g. the Research
Councils UK Common Principles on Data Policy
(http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx) and the Digital
Curation Centre overview of funding policies in the UK
(http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies).
See also http://forums.iucr.org/viewtopic.php?f=21&t=58 for discussion
of policies relevant to crystallography in other countries. NB these
policies extend over derived, processed and raw data, i.e. without
really adequate clarity of policy from one stage of the 'data pyramid'
to the next (see
http://www.stm-assoc.org/integration-of-data-and-publications).


And just to mention the IUCr Journals Notes for Authors for biological
macromolecular structures, where we have our (i.e. macromolecular
crystallography's) version of the 'data pyramid':

(1) Derived data
• Atomic coordinates, anisotropic or isotropic displacement
parameters, space group information, secondary structure and
information about biological functionality must be deposited with the
Protein Data Bank before or in concert with article publication; the
article will link to the PDB deposition using the PDB reference code.
• Relevant experimental parameters and unit-cell dimensions are required
as an integral part of article submission and are published within the
article.

(2) Processed experimental data
• Structure factors must be deposited with the Protein Data Bank
before or in concert with article publication; the article will link
to the PDB deposition using the PDB reference code.

(3) Primary experimental data (here I give the small-molecule and
macromolecular Notes for Authors details):
For small-unit-cell crystal/molecular structures and macromolecular
structures IUCr journals have no current binding policy regarding
publication of diffraction images or similar raw data entities.
However, the journals welcome efforts made to preserve and provide
primary experimental data sets. Authors are encouraged to make
arrangements for the diffraction data images for their structure to be
archived and available on request.
For articles that present the results of powder diffraction profile
fitting or refinement (Rietveld) methods, the primary diffraction
data, i.e. the numerical intensity of each measured point on the
profile as a function of scattering angle, should be deposited.
Fibre data should contain appropriate information such as a photograph
of the data. As primary diffraction data cannot be satisfactorily
extracted from such figures, the basic digital diffraction data should
be deposited.


Finally to mention that many IUCr Commissions are interested in the
possibility of establishing community practices for the orderly
retention and referencing of raw data sets, and the IUCr would like to
see such data sets become part of the routine record of scientific
research in the future, to the extent that this proves feasible and
cost-effective.
I draw your attention therefore to the IUCr Forum on such matters at:-
http://forums.iucr.org/
Within this Forum you can find, for example, the fairly recent report
of the ICSU-convened Strategic Coordinating Committee on Information
and Data; in it we learn of data archiving efforts in many other areas
of science and, e.g., that the radio astronomy Square Kilometre Array
will pose the biggest raw data archiving challenge on the planet. [Our
needs are thereby relatively modest.]

The IUCr Diffraction Data Deposition Working Group is actively
addressing all these various issues.
We welcome your input at the IUCr Forum, which will thereby be most
timely. Thank you.

Best wishes,
Yours sincerely,
John
Professor John R Helliwell DSc


On Thu, Apr 5, 2012 at 1:24 AM, aaleshin wrote:
> People who raise their voices for prolonged storage of raw images miss a
> simple fact: the volume of collected data increases at least as fast as
> the cost of storage space drops. I just had an opportunity to collect
> data with the PILATUS detector at SSRL and can tell you that this monster
> allows slicing the data 4-5 times thinner than other detectors do. Some
> people also like collecting very redundant data sets. Even now, transferring
> and storing raw data from a synchrotron is a pain in the neck, but in a
> few years it may become simply impractical. And all this hassle is for the
> only real purpose of preventing data fraud? Aren't there cheaper and more
> adequate solutions to the problem?
>
> I also wonder why, after the first occurrence of data fraud several years
> ago, the PDB did not take any action to prevent it happening again. Or are
> administrative actions simply impossible nowadays without a mega-dollar
> grant?
>
>

