This Task Group’s objective is to play a central role in improving scientific data access and reusability. Specifically, our main focus will be on full-stack data rescue, which addresses the wide array of risks facing completely dark data, non-digital datasets, and born-digital data. This group will build on the original DAR-TG charter and its focus on raising awareness of data susceptible to loss throughout the research life cycle. Archivists and researchers alike would benefit from having readily available accepted principles, references, and guidelines to expose and make accessible a wide range of data and relevant services.
This task group will focus on the collection and development of such principles and guidelines, in consultation with subject-matter and domain experts, to form and continually update a common decision framework that will improve access to scientific data. This framework will include a classification matrix to help assess a particular dataset’s susceptibility to future loss. Additionally, this task group may eventually evolve into a consultation entity, available to institutions during their decision-making processes regarding the prioritisation of specific tasks for their data curation activities.
Finally, our task group intends to address a new and unexpected type of data rescue arising from observations and survey data collected in regions populated by indigenous peoples, primarily in low- and middle-income countries. Rescuing these data involves ensuring that this information is available to and usable by the communities themselves as well as local researchers and policy makers.
Names of the co-chairs
- Dirk Fleischer, Data and Information Management for Kiel Marine Science, Christian-Albrechts University in Kiel, Kiel, Germany
- Stephen Diggs, Technical Director, Hydrographic Data Office (CCHDO), Scripps Institution of Oceanography, UC San Diego, California, USA
- Denise Hills, Program Director, Energy Investigations, Geological Survey of Alabama, Tuscaloosa, Alabama, USA
While only a portion of international scientific data meet the definition of “big data,” there has been significant investment in data acquisition and management, and reliable projections predict that this expenditure will more than double in the next decade. It is imperative that strategic data stewardship be approached from the perspective of Return on Investment (ROI) and begin with an accurate quantification of every aspect of data acquisition, use, and ongoing curation.
Figure: Big data market size revenue forecast worldwide from 2011 to 2027 (in billion U.S. dollars).
Although the costs of data generation and management have historically required significant investment, increasing data access and reusability need not be proportionately resource-intensive. Improvements in data discovery and access comprise strategic tasks that must be in step with each scientific domain’s most up-to-date, common-sense best practices.
Preliminary assessments of the state of unavailable scientific data indicate that we are in a race against time and will need to implement temporary solutions to reduce data loss with an eye toward more robust long-term solutions. As long as there are institutions (and individual researchers) lacking the resources or knowledge to adequately identify and preserve susceptible data, there is a non-zero risk of permanent data loss.
However, library staff at many research institutions lack current and/or domain-specific information to help update their services to meet the data needs of the 21st-century scientist. These professionals are the resource that many senior researchers, nearing the end of their careers, rely upon to care for their legacy outputs. Some scientists may not even have access to (or be aware of) an appropriate archivist within their institution. Thus, archivists and researchers would benefit from having accepted principles, references, and guidelines to expose and make accessible a wide range of data and relevant services.
Because active researchers (and the library staff who support them) cannot always keep up with current domain-specific data policies and practices, this task group will become their de facto resource for guidelines on identifying and preserving susceptible data. The guidelines and practices this task group will collect and develop will move beyond simple data rescue to improved, comprehensive data management for in-progress and future research projects, and will also provide institutions and researchers with the tools necessary to move towards FAIR-aligned data practices (cf. COPDESS’s mission to elevate open, available, and useful data as the standard practice and policy).
Our task group is conscious that many past data efforts have not taken into account the needs of indigenous peoples, primarily in low- and middle-income countries. Rescuing data collected in regions populated by indigenous peoples involves ensuring that this information is available to and usable by local researchers and policy makers, as well as by the communities themselves. Understanding local needs is key to long-term collaboration and success. We intend to address this distinct type of data rescue by working with liaisons to such communities and with experts who work with them.
Our task group will align its efforts with similar groups in order to maintain the continuity of the work. Specifically, we have had detailed discussions with RDA’s Data Rescue Interest Group and ESIP’s Data Stewardship Committee and intend to continue to work closely with these groups and others.
Task group outputs
- Development of a Data Risk Assessment Matrix (in collaboration with ESIP’s Data Stewardship Committee)
- White paper and/or articles in the CODATA Data Science Journal (https://www.codata.org/dsj) or other refereed journals
- Assist with expanding the beta version of the repository finder tool (developed by re3data.org and the Enabling FAIR Data project) to broaden its current scope beyond Earth and space sciences
- Sessions at AGU, EGU, RDA, and CODATA Research Data Science Schools to disseminate results and solicit feedback
- Strategic alliances with ESIP, RDA, Belmont Forum E-Infrastructures, FORCE11 and other similar national and global data stewardship working groups, including but not limited to:
Task Group Activities
Face-to-face interactions are our best chance to achieve successful and concrete outcomes. In addition to being able to make progress on specific tasks, holding Task Group meetings in conjunction with other organizations has the positive side effect of sharing the work with those organizations, as CODATA cannot and should not bear sole responsibility for progress on broad topics such as scientific data access and interoperability.
The co-chairs have extensive experience conducting online meetings, which are best used to help disseminate information (webinars, presentations), coordinate activities, and maintain momentum on a limited number of action items. Therefore, bimonthly virtual meetings will supplement in-person interactions at meetings of opportunity.
We plan to take advantage of at least six (and possibly up to eight) opportunities for this Task Group (or portions thereof) to meet in the next 18 months. Please refer to the table below for details. Financial support for these activities is discussed in section 10 of this document.
- 09 Nov 2018, SciDataCon / RDA Plenary (Botswana). Lead: Steve Diggs. Participants: Diggs, Fleischer, Parsons, Cox, Ramdeen. Purpose: initial in-person meeting; coordinate with RDA’s Data Rescue IG.
- 03–14 Dec 2018, CODATA-RDA School of Research Data Science (São Paulo, Brazil). Lead: Steve Diggs. Participants: Diggs, Jones + other instructors. Purpose: dissemination of original DAR-TG recommendations; needs-assessment gathering through student/instructor interaction for IDAR-TG.
- 10–14 Dec 2018, AGU Fall Meeting (Washington, DC, USA). Lead: Denise Hills. Participants: Hills, Fleischer, Ramdeen, Khan, Mayernik. Purpose: presentation to disseminate information on IDAR-TG; liaise with sections focused in part on data rescue, management, and assessment.
- 09–11 Jan 2019, ESIP Winter Meeting (Bethesda, MD, USA). Lead: Steve Diggs. Participants: Diggs, Hills, Mayernik, Hou, Parsons, Lehnert, Ramdeen. Purpose: presentation to disseminate information on IDAR-TG; liaise with Data Stewardship Cluster; work on Data Risk Matrix.
- 02–04 Apr 2019, RDA P13 (Philadelphia, PA, USA). Leads: Diggs, Fleischer, Hills. Participants: all. Purpose: liaise with Data Rescue IG; working meeting to refine risk matrix; needs assessment for the research community; F2F meeting of IDAR-TG.
- 07–12 Apr 2019, EGU (Vienna, Austria). Lead: Dirk Fleischer. Participants: Diggs, Parsons, Wyborn, Fleischer. Purpose: presentation to disseminate information on IDAR-TG; liaise with sections focused in part on data rescue and archiving.
- 10–14 May 2019, SOOS Southern Ocean Data Hackathon (Incheon, Korea). Lead: Steve Diggs. Participants: Diggs + SOOS science participants. Purpose: first test of applying IDAR-TG-endorsed identifier and metadata assignment recommendations to rescued data during the event.
- 16–19 Jul 2019, ESIP Summer Meeting (TBD, tentatively San Diego, CA). Leads: Diggs, Hills. Participants: Diggs, Hills, Mayernik, Hou, Parsons, Lehnert, Ramdeen. Purpose: presentation to disseminate information on IDAR-TG; liaise with Data Stewardship Cluster; finalise Data Risk Matrix prior to wider distribution.