When I think about how we can better support the sharing and preservation of research data, I think of the challenge we have in moving beyond our individual project-based approaches. Of course, it can be important for discovery to build specialized approaches, but we need to think about changing the entire practice and culture of research. It’s a system-level problem.
And open research data isn’t the same as Open Access publishing. The social justice aspects of open data are not the same as they are for public access to research articles. The Open Data movement can’t move forward on that argument alone. We have to start thinking about how to articulate our own vision for why openness matters. Reuse and auditability are key to our argument. So we need to make sure reuse of datasets is possible, and communicate why that’s important.
“I get passionate when we can engender system change, and that’s often through policy change. Sometimes that’s top down, but it can also often be bottom-up – it feels good when we can make change by having a community come together.
It’s great to see the data community continuing to broaden, particularly to embrace the importance of software in enhancing data analysis.”
“The things that made me interested in data twenty-five years ago are the same things that make me interested in it now. It’s the way structures and narratives are going to define our culture and who we are. I was interested initially in the historical perspective: how is data going to change the subtlety of how we understand past cultures? And how are future generations going to access data, manipulate it and study it? Very few people were interested in that, and that was exciting. Now I’m a bit scared of it. It’s increasingly clear that we can use data in a contemporary context for social evils.
Data – and the systems that store it – can be ugly, but they can also be beautiful. Some day, people are going to be interested in the beauty of system architectures and the beauty of database design – or the ugliness. It’s like studying a Gothic cathedral or a contemporary city: the architecture defines how we feel and the narrative that plays out in our lives.
I see emerging possibilities in machine learning, in the blockchain and in other areas. I share an understanding of the risks of data. And I believe that what kind of research you actually do – and how you enable others to be creative to do things – these are equally important. More focus needs to go on developing the community and helping our colleagues to progress and excel. The ways forward in digital curation and data presentation are very much going to come from human collaboration and not from one lone person who’s thought of a technical solution that no-one has thought of before. This is not the field of lone scholars – this is the field of general community effort.”
“I’m a data scientist. One needs to be aware that data is important to do science. But it comes with lots of issues, to make good quality results. It’s not just about collecting data and using it. Any data-driven solutions you try to develop, you need to understand the users and their different roles. A lot of work on the data is not just about the technology. It’s also about the social aspects. It’s not just about setting up a system and saying, ‘The scientists will use it.’
What I learned from my computing science experience is that every domain is different. If you want to develop a computing solution for a domain, you need to get familiar with the user environment, workflows, best practices, language and so on, and it’s important to get familiar with this before coming up with a solution. Every domain I have worked in, I had to get familiar with the practice, so the users will see the computing solution as integrated, and they will use it.”
“I’m an ethnographer. Well, I’m not only an ethnographer, but the heftiest part of my dissertation work was ethnography. And as one who also has rigorous training in engineering and other natural science fields – I was surprised at how much of my research design required transformational change.
While I did the research – as I collected data – I changed what my hypotheses were, what I’d use the data for and what outcomes I’d obtain from analysing the data. It was simultaneously confusing and exciting, but eventually I was much prouder of the outcomes than I would have been had I stuck to the plan.
It was a lesson in valuing methodological adaptation and change. As researchers, we don’t know everything, and more people in science should have, and value, that experience.”
“I’m trying to create awareness for researchers on opening up their own data. Before working [on open data] I didn’t know anything about it. If people open up their data there is so much more that can be done. You can share ideas, review someone else’s data and find something that they didn’t find before. I feel that’s very important.
I feel when people open up their data, it’s no longer people competing with each other – they’re helping each other to make the world a better place. If you keep something to yourself you might not see everything in it that might improve the world. But if you share it with peers, someone else might discover something that might make the world a better place.
We have the data but [currently] we’re not the owners of the data. Africa needs to step up. We need to be partners in research. We need to open up our data and also to own our data.”
“What motivates me to keep going is teaching people who are going to keep this going after us. Managing data will find its place in the world. I don’t mean analytics, I mean taking care of the data so that people can run legitimate analytics.
It annoys me just now – a lot of these data science and data analytics programs, they’re all about statistics, visualisation, analysis, but very little about actually curating the data underneath. Not to say that data curators don’t need to know a little bit about analysis, but people who do data science in the business environment often don’t know much about curation. People working for businesses complain that they spend 80% of their time cleaning data and that without that, the data wouldn’t be usable. But I feel like saying, ‘If you hired data curators you wouldn’t have to deal with that problem!’”
“I’m an archivist who does digital preservation in a library and I’m very aware of the opportunities and challenges that happen in that context. When we talk about inclusion, we need to remember professional and technical inclusion, too. We don’t leverage our cumulative power enough. Archives, libraries, digital preservation, digital curation, data science: we need to think what we all bring to the table and how we can put the pieces together. If we don’t do that, we end up bumping into each other and missing opportunities.
I recently marked 30 years of working with data. I’ve been a curator, preserver, creator and user. I believe strongly in the continuum of data to information to knowledge to wisdom – we often stop at data and that’s short-sighted. Data is the raw material that fuels what we understand and share, and we don’t make nearly enough of its potential.
I really like the kinds of stories that people are able to tell with various types of data. When people think about what data can be, they often stop at structured, quantitative data, but there is a broad mix of content that we can consider to be data. We have an opportunity to innovate if we come together to develop a shared understanding of data services and practice, and collaborate with shared objectives.”
“My story with data is funny. A year and a half ago I didn’t know the term ‘big data’ existed. I couldn’t sleep one night in Cairo and I was reading online, and I found an article about big data. I had no idea what it was. So it was like, ‘This is interesting. I should be learning about this.’
So I was self-learning from scratch, so I think the passion started at first sight. I’m so glad I didn’t sleep that night – because here I am studying data because of not sleeping!
I’m passionate about what we can do with data. It’s something very precious. It’s there and no one is using it so let’s use it. Because I have data, I can do things other people can’t. I’m still learning because data is complicated. But when you have it, data gives you power that other people don’t have.”
“I’m excited that people are now starting to think about data sharing. For the last few years it’s been me, as the institutional data manager, going to people and saying, ‘You should make your data available!’ Now people are getting in touch and saying they want to do it, because they’re recognising they can get more stuff published that they can get recognition for.
It’s also good that we’re getting more than just the raw or aggregated data – we’re also getting the survey tools, the Stata code and the files for the processing scripts for how the data is analysed. It’s exploding out into all the different stages of research. If you’re thinking about reproducibility of research, you still only see tiny snapshots of that. I’d like to do more about that: my frustration is that we don’t have software to document all stages of the research process.
A lot of those research outputs are useful but also ephemeral. If you wanted to reapply a questionnaire, you’d have to do an update of it 2 or 3 years down the line. Research approaches change, the language changes and so on. But you could actually go back and do a comparison about how interviewing has changed over a specific time period – as long as we start managing those research outputs too, alongside the data and publications.”