Understanding contemporary digital preservation practice: the EOSC EDEN project reports survey findings

By Laura Molloy, CODATA Research Lead

With rising threats to the existence of essential data resources, and mendacious contesting of the historical record, the current moment clearly demonstrates the critical role of high-quality digital preservation practitioners, skills and services. Digital preservation is a complex and diverse profession, often underfunded and sometimes misunderstood. It is important that we understand the current digital preservation landscape as well as possible, in order to support those working around the world in the preservation professions and to provide project outputs that will be of relevance and value to them. Accordingly, CODATA is delighted to be a participant in the European Open Science Cloud (EOSC) project, ‘Enhancing Digital preservation strategies at European and National level’ (EDEN).

EDEN has published the results of a survey which was recently conducted to gather information from the digital preservation community worldwide. Our survey was specifically about the guidance to which preservation practitioners refer, and practices they use, when identifying, selecting, and appraising digital data objects for ‘long-term’ preservation.

This blog post provides an informal overview of how we went about the survey, and what we discovered. We hope this will be of interest to those working in contemporary digital preservation, including managers, practitioners, and those responsible for policy making and training within memory organisations. If you would like further detail on any aspect of this work, the full report can be downloaded at https://doi.org/10.5281/zenodo.17984753.

About the EOSC EDEN project

The EOSC EDEN project, funded by the European Commission [1], seeks to enhance digital preservation strategies at European and national levels. The project is creating a framework to identify what data are candidates for digital preservation. This involves setting standards and protocols for long-term data preservation, which will be determined through an assessment of data usage, quality, and the data’s benefits to science and society.

In addition to the framework, the EOSC EDEN project aims to develop a model for re-appraisal of data throughout its lifecycle. The model for re-appraisal will support the framework for digital preservation by ensuring that preservation efforts remain relevant over time.

The survey activity was led by Laura Molloy, CODATA Research Lead, who is leading EDEN Task 1.1, ‘Landscape analysis of existing frameworks, guidelines and practices for identification, selection and appraisal of data for long-term preservation’. This task contributes the majority of the landscaping activity in the project. Laura is a qualitative social science researcher by training, with experience in a number of digital preservation projects and initiatives, and has a track record in research and consultancy relating to digital decision-making and information behaviours in varied professional settings. Analytical power was added by other members of the EDEN task team, including work package leader and digital preservation expert Micky Lindlar, and quantitative analyst Maria Benauer, both of Technische Informationsbibliothek (TIB).

Survey design

Understanding contemporary digital preservation requires direct contact with as many current practitioners as possible, to understand their real practices and the reference materials that inform those practices. We also need to build communication with those working in preservation across different types of organisation and in different countries. Accordingly, the EOSC EDEN 2025 survey was carefully designed to be simple to interact with, and to make sense to digital preservation professionals across organisation types, staff levels, geographical locations, and any or no discipline focus [2].

Survey questions were arranged into four main sections:

  1. About your organisation and role;

  2. About frameworks and guidelines for identification, appraisal and selection of data for long-term digital preservation;

  3. About current practices in identification, appraisal and selection of data for long-term digital preservation;

  4. Discipline-specific requirements for long-term preservation of digital objects.

The survey ended with one further short section gathering voluntary contact details, to enable the identification of candidates for any follow-up inquiry.

The questions were a mixture of closed and open questions, i.e. those that can be answered by choosing yes or no (closed questions) or those that require a more discursive, free-text answer to be generated by the respondent (open questions). Accordingly, a mixture of qualitative and quantitative analysis was performed by the task team.

Survey respondents

We received 250 valid responses from 31 states/nations [3]. The majority of responses were from Western Europe, followed by North America, despite focused activities undertaken by the task team to solicit a more evenly-distributed global response.

The size of respondents’ organisations was approximately evenly distributed across micro/small, medium, and large sized organisations [4], each with around a third of the responses. In terms of staff level within the organisation, around two-thirds of respondents were practitioners; just under a third of respondents identified as middle management and a few identified as senior management. We received responses from eighteen organisation types, which we coded into nine wider groupings called ‘organisation classes’, as follows: Academic publisher, Archive, Digital preservation service, Library, Multifunctional, Museum or gallery, Repository, Research performing organisation, Research infrastructure, plus Other/unassigned. The most populous class was ‘archive’ with 67 responses; the least populous was ‘academic publisher’ with one.

Selected findings

A few selected findings were of particular interest to the task team and offer some food for thought. These are briefly set out here.

‘Long-term’ preservation

Firstly, the project itself—as well as its subsidiary work packages and tasks—frequently uses the phrase, “long-term preservation”. We were interested to note that this emerges from the data as an unstable concept. One of the most striking findings was the high proportion of respondents who are working at an organisation where there is no agreed or working definition of ‘long-term’ in the digital preservation context. Even those respondents who did have an agreed or working definition of ‘long-term’ offered a wide range of numerical definitions of what that means for them and their preservation work.

Quality checking behaviours

We were interested in investigating two sets of quality checks: quality checks upon ingest and subsequent quality checks throughout the data preservation period. We asked various questions about if and how exactly these checks are carried out. We found that a majority of respondents do carry out quality checks of various kinds upon ingest but that this drops dramatically when we examine the occurrence of subsequent quality checks throughout the preservation period. This is a complex area for analysis and we would like to investigate more through some follow-up interviewing during 2026.

Commonalities with FAIR data

We note with interest the existing connections, indicated by respondents, between the digital preservation realm and the set of ideas currently designated ‘FAIR data’. These connections appeared in two different places in the survey responses.

First, respondents were asked about their usual preservation period; that is to say, the length of time that the organisation usually initially commits to holding and maintaining a preservation copy of a given data object. Here, respondents introduced a recurring—and pretty passionate—discussion about the importance of maintaining findability and access, whatever the agreed preservation period; and we noted that maintenance of findability and access was a much more important issue for many respondents than the existence of any shared agreement about the length of the preservation period.

Second, we provided respondents with a list of frameworks, standards and guidelines that we had gathered from desk research and professional experience. These were presented as likely reference resources for practitioners when they were working on identification, selection and appraisal of digital data objects in their day-to-day work. We asked respondents to indicate whether they were aware of each document and/or used it in their preservation work. The FAIR Guiding Principles was one of these documents. Respondents reported a high level of awareness and use of the FAIR principles (ranking 4th of 15 options). This reminds us that some of the ideas now encapsulated in the FAIR principles have been, to some extent, bedrocks of preservation practice for years, and suggests that digital preservation practitioners are aware of recent events in the FAIR data movement. (It is worth noting, however, that there is no similar visibility at this time of the TRUST or CARE principles within the responses from our participants.)

Needs of designated communities/threats to FAIRness of data over time

We asked a question about the extent to which the respondent understands any unmet needs of their organisation’s designated community [5]. Elsewhere in the survey, we also asked a question on the respondent’s view of threats to the ‘FAIRness’ of their preserved data over time. Some common themes emerged from the responses to these two questions. This suggests that these common themes may be issues of cross-cutting importance for the digital preservation practitioner community.

The most frequently highlighted issues here were: issues around sensitive / protected data; the challenges of data volume; and issues around access provision. Two of the top three designated community needs—data volume and access issues—recur in the top answers around threats to FAIR over time. Sensitive data issues were flagged in three responses, and the other designated community needs—long-term provision of service; lack of useful policy/directive; software preservation; provenance issues and various format problems—also all recur at low rates in the threats to FAIR over time. This is not particularly surprising as these are clearly frequently experienced challenges in the practice of preservation. But it is interesting to see that they are considered by respondents both from the perspective of directly meeting the needs of the community i.e. user-centred approaches, and also the arguably more theoretical perspective introduced when considering keeping digital data objects FAIR. Ultimately, though, FAIR data are data that meet user needs. It is a useful piece of validation that these themes recur in the responses to these two questions.

To conclude…

The EDEN task team is delighted by the response to the survey and thanks all participants.

Next steps within the task include some follow-up interviewing with consenting respondents to further explore the relationship between different information behaviours: for example, how quality checking is monitored; whether designated community needs are monitored and if so whether this impacts preservation activity; the role of data policy; and the role of organisational acquisition strategy. This work will be reported upon by the end of 2026.

In addition, certain findings from this enquiry are potentially useful for future work by CODATA, specifically the upcoming EU-funded project, ‘Developing and Implementing the Cross-Domain Interoperability Framework for EOSC’ (CDIF4EOSC), and the CODATA Task Group on Research Data Quality Management.

A full breakdown of the data analysis and the findings identified so far is beyond the scope of this blog post; it can be found in the full report, which is freely available online at https://doi.org/10.5281/zenodo.17984753. Any questions or feedback can be directed to the task leader at laura @ codata.org. For more information about the EOSC EDEN project please visit the project website, https://eden-fidelis.eu/.

 

[1] EDEN has received funding from the EU’s Horizon Europe research and innovation programme under Grant Agreement no. 101188015.

[2] We note, however, that the use of English as the primary language of the survey may have been a limiting factor for some potential respondents.

[3] As defined by the list of United Nations member states at the time of survey publication (May 2025).

[4] As defined by the European Commission.

[5] “Designated community” is defined in the EDEN Milestone 1.1 report (https://doi.org/10.5281/zenodo.16992452), based upon the OAIS definition (http://www.oais.info/), as: “A group of users, now or in the future, who can understand and use the Objects preserved. The designated community is whom the Objects are preserved for. It can be made of several user communities and the definition can change over time.”