Building Support for ‘Principle Guidelines’ for Data at Risk

This post is by Elizabeth Griffin, chair of the CODATA Data at Risk Task Group.

The immediate objective of the Data At Risk Task Group (DAR-TG) is to raise awareness of the existence of large amounts of analogue (pre-digital) records – observations and measurements – that contain important and unique scientific information but are inaccessible electronically. The overall objective is to facilitate the conversion of the scientific content of those historic data to electronic formats for inclusion in modern research, where their special contributions can be utilised to the full; the matter is especially critical when long-term changes need to be measured accurately. An important step towards those goals is to raise both public and specific scientific awareness of the seriousness of neglecting historic data, and to illustrate the benefits through examples of successful data recovery.

DAR-TG at the UNESCO Memory of the World Conference

Elizabeth Griffin (centre) and other panellists of the Data at Risk session at the UNESCO Memory of the World Conference, Vancouver 2012

Panellists from the Data at Risk session at the UNESCO Memory of the World Conference, Vancouver, September 2012. Left to right: Fraser Taylor (Cartography, Carleton University, Ottawa); Rick Crouthamel (IEDRO, International Environmental Data Rescue Organization); Elizabeth Griffin; Stephen Del Grecco (NOAA and Climate Database Modernization Programme); Chris Muller (Muller Media, NY, a private-sector company that restores data from old tapes of almost any kind).

A Special Session organised by DAR-TG at the UNESCO Memory of the World Conference (Vancouver, September 2012) accentuated the important role of collaboration in endeavours to rescue the information from imperilled data which at present exist only in analogue forms. Such collaborations transcend the specificities which separate individual data types. Moreover, they advance the goals of the UNESCO programme by addressing and complementing an area that is not as yet explicitly supported.

DAR-TG Panel Discussion at Digital Heritage 2013

DAR-TG was competitively awarded a 2-hour Panel Discussion at Digital Heritage 2013 (Marseille, October 2013) entitled ‘A Joint Heritage: where science and culture meet’. The invited Panellists represented a broad span of specialities: biodiversity at the Berlin Botanical Museum [Agnes Kirchhoff], metadata and library science at the University of North Carolina [Davenport Robertson], watershed and estuary stewardship with IEDRO and Citizen Science [Carmen Skarlupka], digital humanities at London University [Marilyn Deegan], digital philology and classics at the University of Oxford [James Brusuelas], astrophysics research at Canada’s Dominion Astrophysical Observatory [Elizabeth Griffin] and at the Royal Observatory of Belgium [Thierry Pauwels], and climate research (also engaging citizen science) at the UK Met Office [Rob Allan and Philip Brohan].

Despite that broad span, and the very different kinds of materials, tools and skills required by data-recovery tasks in the different research domains, what emerged during an energetic discussion was an overarching commonality of challenges, problems – and also, to some extent, of solutions – which Panellists described. The scientists emphasised that the research need is a prime driver for historic data recovery, so broadcasting the results of a data-rescue project is critical in justifying each expenditure of resources. In the humanities, communication can be fraught with difficulties, whether of language, data de-coding, identifying contacts or ascertaining locations. The problems encountered may be project-specific, but solutions can be made economical by sharing methodologies or software, and citizen science is a resource that should be more widely tapped everywhere. The all-important battle for funding is best addressed by a consortium speaking with a common voice, rather than by individuals or isolated groups. Advocating Best Practice and sharing progress through international workshops were named as reliable starting points, so Panellists have remained in touch and are preparing a report based on the discussions that took place, as their first steps towards realizing that ideal of international collaboration.

Preserving and Adding Value to Data in Science

The theme of historic data rescue efforts in Earth sciences was subsequently given space at another workshop, Preserving and Adding Value to Data in Science (Frascati, November 2013), reaching an audience primarily involved in born-digital data. Frequent reminders of the broader and possibly trans-disciplinary applications of those data were well absorbed by the audience, and became enshrined in the meeting’s formal Conclusions; they also resulted in a telecon interview, follow-up enquiries, and a request for membership of the DAR-TG.

‘Principle Guidelines’ for Data at Risk

A follow-up to the Marseille panel discussion will be held as part of the Conference on Digital Preservation and Development of Trusted Digital Repositories (New Delhi, February 2014), when DAR-TG Panellists will be joined by Indian representatives.

A DAR-TG Workshop that will debate the combination of cultural and scientific issues of digital heritage is to be held at the University of Toronto in September 2014, and will involve archivists, librarians, IT and data-management experts as well as the DAR-TG members and other scientists in the preparation of “Principle Guidelines” to assist those embarking on this type of data-rescue project. It is intended that this work should seed discussions that can be taken up at SciDataCon 2014 (the CODATA and WDS Conference), again in New Delhi, 2-5 November 2014.

4 thoughts on “Building Support for ‘Principle Guidelines’ for Data at Risk”

  1. Damara Arrowood

    I’ve recently been made aware of vast archives of data stored on magnetic tape collected by NASA and partner organizations (including many radio telescope facilities) by colleagues currently at NASA Goddard. I’ve made significant effort to find answers within the NASA community on strategies for processing, management and open-access dissemination of this data. To date I have been given no further information other than confirmation of the archives’ existence.

    I’m very interested in developing and implementing a strategy to address these issues and bring this highly valuable data into the public domain. It has, of course, obvious historical importance and potentially direct relevance to current and future research in many fields.

    Might a project of this nature be an acceptable topic for conversation and potential support by Jisc, even though NASA is US-based? I do see many ways this project could be developed as an international collaboration, at best directly with the UK Space Agency and the ESA. Perhaps the latter agencies have similar challenges that can be addressed and likely solved by participating in a global collaboration?

  2. Elizabeth Griffin

    The answers to the questions in your second paragraph are Yes, Yes and Yes! What you are in effect proposing is precisely the kind of need which only international collaboration will solve satisfactorily. Ideally, as and when the potential value of these nearly-lost data-sets eventually dawns, and becomes a regular feature of conscious thinking and planning, then the major players like NASA, ESA, CSA, JISC, etc., will be ready to add their support (both verbal and financial), rather than a noble soul like yourself having to go it alone and beg for bits of funding. One great point about data rescue is that it only has to be done once; it’s not an open-ended commitment – once digitized and stored correctly (along with born-digital data) in the public domain, heritage material will be as well safeguarded as any other digital data, and one cannot ask for more than that.

    As I hope the blog emphasizes, there are no limitations to the scope or type of heritage observations which the DAR-TG project will embrace. DAR-TG has a website (set up by library students at the University of North Carolina) which invites anyone with information about data in need of rescue to enter details in a purpose-designed GUI. Entries in the GUI will make their way into an Inventory of datasets deemed to be At Risk, thereby furnishing some idea of the scale of the global problem.

    To begin with we have restricted our conversations to data from fields which the TG itself represents, so that there is at least some understanding of the dialect and challenges of the various types of data in question. We have also tried to focus more on analogue, as opposed to already digital, materials, partly to keep the project manageable; all digital database managers will claim that their data are fragile and potentially ‘at risk’, but those data can (at least in principle) be copied, shared, migrated, etc., whereas analogue materials like photographic plates, books and other written records are unique editions, and if those unique editions become damaged by whatever disaster, a total loss of their information content results. DAR-TG does, however, recognize that in the early years of digitization extensive use was made of magnetic tapes, and not infrequently data were written to tape in a format that was not properly described or documented, and/or the files did not contain adequate metadata, so a parallel situation of peril can exist in those cases too.

    While natural disasters are always accidents waiting to happen, perhaps the worst enemy of historic scientific data is the uninformed human! In astronomy (my own field), whole archives of plates have been tossed “because nobody uses that sort of stuff any more”, and the very act of publicising the perils and raising awareness of the objectives of DAR-TG is the best antidote to counter such statements and protect the suffering data.

    There is therefore some element of urgency in achieving the goals set out by DAR-TG. And it will be with the help and notice of people like yourself that those goals will come closer.

  4. Jim Craig

    NASA specifically has issues regarding older archive data because there is a big push on the part of the security and export control people to restrict access to research data. Consequently, a lot of people feel that it’s better just to leave the data on those old IRIG tapes in the basement rather than go through all of the procedures required to make them available to someone who might be able to use the data.


