“I’m a data scientist. One needs to be aware that data is important to do science. But it comes with lots of issues, to make good quality results. It’s not just about collecting data and using it. Any data-driven solutions you try to develop, you need to understand the users and their different roles. A lot of work on the data is not just about the technology. It’s also about the social aspects. It’s not just about setting up a system and saying, ‘The scientists will use it.’
What I learned from my computing science experience is that every domain is different. If you want to develop a computing solution for a domain, you need to get familiar with the user environment, workflows, best practices, language and so on, and it’s important to get familiar with this before coming up with a solution. Every domain I have worked in, I had to get familiar with the practice, so the users will see the computing solution as integrated, and they will use it.”
“I’m an ethnographer. Well, I’m not only an ethnographer, but the heftiest part of my dissertation work was ethnography. And as one who also has rigorous training in engineering and other natural science fields, I was surprised at how much of my research design required transformational change.
While I did the research – as I collected data – I changed what my hypotheses were, what I’d use the data for and what outcomes I’d obtain from analysing the data. It was simultaneously confusing and exciting, but eventually I was much prouder of the outcomes than I would have been had I stuck to the plan.
It was a lesson in valuing methodological adaptation and change. As researchers, we don’t know everything, and more people in science should have, and value, that experience.”
“I’m trying to create awareness for researchers on opening up their own data. Before working [on open data] I didn’t know anything about it. If people open up their data there is so much more that can be done. You can share ideas, review someone else’s data and find something that they didn’t find before. I feel that’s very important.
I feel when people open up their data, it’s no longer people competing with each other – they’re helping each other to make the world a better place. If you keep something to yourself you might not see everything in it that might improve the world. But if you share it with peers, someone else might discover something that might make the world a better place.
We have the data but [currently] we’re not the owners of the data. Africa needs to step up. We need to be partners in research. We need to open up our data and also to own our data.”
“What motivates me to keep going is teaching people who are going to keep this going after us. Managing data will find its place in the world. I don’t mean analytics, I mean taking care of the data so that people can run legitimate analytics.
It annoys me just now – a lot of these data science and data analytics programs, they’re all about statistics, visualisation, analysis, but very little about actually curating the data underneath. Not to say that data curators don’t need to know a little bit about analysis but people who do data science in the business environment, they often don’t know much about curation. People working for businesses, they complain that they spend 80% of their time cleaning data and without that, the data wasn’t usable. But I feel like saying, ‘If you hired data curators you wouldn’t have to deal with that problem!’”
“I’m an archivist who does digital preservation in a library and I’m very aware of the opportunities and challenges that happen in that context. When we talk about inclusion, we need to remember professional and technical inclusion, too. We don’t leverage our cumulative power enough. Archives, libraries, digital preservation, digital curation, data science: we need to think what we all bring to the table and how we can put the pieces together. If we don’t do that, we end up bumping into each other and missing opportunities.
I recently marked 30 years of working with data. I’ve been a curator, preserver, creator and user. I believe strongly in the continuum of data to information to knowledge to wisdom – we often stop at data and that’s short-sighted. Data is the raw material that fuels what we understand and share, and we don’t make nearly enough of its potential.
I really like the kinds of stories that people are able to tell with various types of data. When people think about what data can be, they often stop at structured, quantitative data, but there is a broad mix of content that we can consider to be data. We have an opportunity to innovate if we come together to develop a shared understanding of data services and practice, and collaborate with shared objectives.”
“My story with data is funny. A year and a half ago I didn’t know the term ‘big data’ existed. I couldn’t sleep one night in Cairo and I was reading online, and I found an article about big data. I had no idea what it was. So it was like, ‘This is interesting. I should be learning about this.’
So I was self-learning from scratch, and I think the passion started at first sight. I’m so glad I didn’t sleep that night – because here I am, studying data because of not sleeping!
I’m passionate about what we can do with data. It’s something very precious. It’s there and no one is using it so let’s use it. Because I have data, I can do things other people can’t. I’m still learning because data is complicated. But when you have them, data gives you power that other people don’t have.”
“I’m excited that people are now starting to think about data sharing. For the last few years it’s been me, as the institutional data manager, going to people and saying, ‘You should make your data available!’ Now people are getting in touch and saying they want to do it, because they’re recognising they can get more stuff published that they can get recognition for.
It’s also good that we’re getting more than just the raw or aggregated data – we’re also getting the survey tools, the Stata code and the files for the processing scripts for how the data is analysed. It’s exploding out into all the different stages of research. If you’re thinking about reproducibility of research, you still only see tiny snapshots of that. I’d like to do more about that: my frustration is that we don’t have software to document all stages of the research process.
A lot of those research outputs are useful but also ephemeral. If you wanted to reapply a questionnaire, you’d have to do an update of it 2 or 3 years down the line. Research approaches change, the language changes and so on. But you could actually go back and do a comparison about how interviewing has changed over a specific time period – as long as we start managing those research outputs too, alongside the data and publications.”
“In my previous life as an academic, I always liked interdisciplinary work: to come at things from a slightly sideways perspective. But in this area, I get to encounter more than most people do – collections, ideas, researchers, people, stories … I get to discover everything from every different area of knowledge, from lots of different perspectives. The data itself is obviously really interesting but it’s what goes into the creation of that data, and what people then do with that data – that’s what’s really fascinating to me.
When people ask me, ‘What do you do?’, I’m still not sure how best to describe it. Whenever someone asks, I give a different answer, but it doesn’t actually capture what the day-to-day work is about, which is the exchange of social and cultural knowledge. I think that’s the most appealing thing to me. There’s always something new to find out about, and this central thing that we call ‘data’ is a conduit into discovery of all kinds of stories and narratives. It’s a window into lots of different worlds.”
“I’m not a data scientist but I know how to read and fiddle with code. This is what drives me – I want to understand and know something practically, not just by reading about it but by getting first-hand experience in collecting data, doing things with it, manipulation. I enjoy this and find it valuable. I do theory about data practice, so I’m interested in asking what data does to knowledge practices, but I’m looking at it as a philosopher rather than anything else. I’m interested in how data can be used to tell stories, but want to take this one step further. How do we use data to make arguments? I’m interested in how we can move to a critical way of looking at argumentation – how we can use data as evidence, to convince, to tell stories. I’m asking what is ‘good enough’ knowledge, what is ‘responsible’ knowledge, what is ‘valuable’ knowledge? What are the ethical considerations about data when we use it to make decisions?”
“Still, I’m inspired by the fact that the field is cross-disciplinary. To be able to talk about digital preservation in a holistic way you need data producers and data consumers including people from information sciences, library scientists and researchers. With every domain we need to understand a whole new idea of how data is produced and consumed and the use cases for the value of data. It never gets boring. There will always be work. And if I have a question about a file format or metadata problem I can ask colleagues in New Zealand or the States or Scotland or the Netherlands and they know what I’m talking about. I love that. To me it’s like a cool kids’ domain!”