Category Archives: Data Science Journal

Posts relating to the data science journal

Publishing an article in CODATA Data Science Journal

This article was first published by Ms. Neema Mduma (https://neylicious.github.io/ml/2019/05/11/paper.html). Neema is an alumna of the CODATA-RDA School of Research Data Science.

In early 2017, I was privileged to work as a researcher on the Dropwall project (by Rose Funja), which was one of the winning projects of the Data for Local Impact Innovation Challenge (DLIIC). The main focus of the project was to develop a tool to help fight dropout among secondary school girls. The findings from this project show a high rate of dropout among secondary school students, particularly girls, and coincide with reports from other studies which show that school dropout is a big challenge in developing countries. Machine learning techniques for addressing this problem have gained much attention in recent years. However, most of the work has been carried out in developed countries; only a handful of studies conducted in developing countries have applied machine learning to school dropout while taking into account the local context and the data imbalance problem. This motivated me to continue working (in my PhD) on school dropout using machine learning.

In August 2018, I attended a CODATA-RDA Research Data Science Summer School, which was held at the Abdus Salam International Centre for Theoretical Physics (ICTP) in Trieste, Italy. The aim was to build competence in data analysis and security for participants from all disciplines and backgrounds, from the sciences to the humanities. The level of engagement and interaction between participants and instructors was outstanding. We were introduced (by the Executive Director of CODATA, Dr. Simon Hodson) to various opportunities, such as the CODATA Data Science Journal, where I later managed to publish the breathtaking findings from the Dropwall project in a paper titled A Survey of Machine Learning Approaches and Techniques for Student Dropout Prediction.

May 2019: Publications in the Data Science Journal

Title: Interdisciplinary Comparison of Scientific Impact of Publications Using the Citation-Ratio
Author: Arthur R. Bos, Sandrine Nitza
URL: http://doi.org/10.5334/dsj-2019-019
Title: Diversity of Woody Species in Djamde Wildlife Reserve, Northern Togo, West Africa
Author: Tchagou Awitazi, Raoufou Radji, Kotchikpa Okoumassou
URL: http://doi.org/10.5334/dsj-2019-018
Title: A Generic Research Data Infrastructure for Long Tail Research Data Management
Author: Atif Latif, Fidan Limani, Klaus Tochtermann
URL: http://doi.org/10.5334/dsj-2019-017
Title: Time Series Prediction Model of Grey Wolf Optimized Echo State Network
Author: Huiqing Wang, Yingying Bai, Chun Li, Zhirong Guo, Jianhui Zhang
URL: http://doi.org/10.5334/dsj-2019-016
Title: Fostering Data Sharing in Multidisciplinary Research Communities: A Case Study in the Geospatial Domain
Author: Martina Zilioli, Simone Lanucara, Alessandro Oggioni, Cristiano Fugazza, Paola Carrara
URL: http://doi.org/10.5334/dsj-2019-015

April 2019: Publications in the Data Science Journal

Title: A Survey of Machine Learning Approaches and Techniques for Student Dropout Prediction
Author: Neema Mduma, Khamisi Kalegele, Dina Machuve
URL: http://doi.org/10.5334/dsj-2019-014
Title: GeoSimMR: A MapReduce Algorithm for Detecting Communities based on Distance and Interest in Social Networks
Author: Zaher Al Aghbari, Mohammed Bahutair, Ibrahim Kamel
URL: http://doi.org/10.5334/dsj-2019-013
Title: Building an International Consensus on Multi-Disciplinary Metadata Standards: A CODATA Case History in Nanotechnology
Author: John Rumble, John Broome, Simon Hodson
URL: http://doi.org/10.5334/dsj-2019-012

CODATA is pleased to announce Mark Parsons as the new Editor-in-Chief of the Data Science Journal

In his blog post, Mark writes: ‘I am especially interested in helping DSJ build its niche as an influential journal of the ‘science of data’ in the sense that CODATA described it decades ago. We need more fora that encourage dialog across research and practice to understand all the issues around the socio-technical work necessary for data to be findable, accessible, interoperable, reusable, ethical, secure, etc.’ …

‘I have been a member of the DSJ editorial board since the journal moved to Ubiquity Press, and I have been impressed at how Sarah Callaghan and other editors have worked to increase the journal’s quality. I want to continue this momentum. I want to further bolster the review quality and also raise the possibility of open reviews. The nature of DSJ is that it often attracts submissions and requires reviews from practitioners who have much less of a mandate to publish than researchers. I believe practitioners should be encouraged to contribute (with research as well as practice papers), so we should do what we can to recognize and model excellent contributions in this area. …

‘Thanks to Sarah’s great work, DSJ has a bright future as submissions continue to increase in number and quality. DSJ was ahead of its time when it was founded in the 1990s. I am eager to explore how it can continue to push important conversations forward. I welcome all your ideas. Please tell me what you think. Better yet, tell the community through a submission to DSJ!’

Read more at http://codata.org/blog/2019/04/29/mark-parsons-joins-codata-as-editor-in-chief-data-science-journal/

Mark replaces Sarah Callaghan, who has served since 2015, when the Data Science Journal was moved to its current platform with Ubiquity Press.

Sarah writes:

‘In my four-year tenure, I am very proud of the fact that 135 papers have been published, along with 6 Special Collections with another 5 Special Collections in the pipeline. The journal has grown more popular and is steadily publishing research that is more impactful as time goes on, and this is a testament to the hard work of all involved – including our reviewers and authors.

‘It is time for me to hand over the role of EiC to another, and it is with no small amount of sadness that I do so. Being EiC has been incredibly rewarding (and occasionally infuriating) and I have learned a great deal from it. I am very pleased to know that Mark Parsons is taking over the role, and know that the journal will be in safe, knowledgeable hands.

‘It only remains for me to say my farewells and thank yous. Thank you to the authors, without whom there would be no articles to publish. A thousand thank yous to all my editors, reviewers, colleagues and friends – your efforts on behalf of the journal are deeply, deeply appreciated, as is your wisdom and expertise. I wish you all the very best for the future, and look forward to reading more excellent papers published in the DSJ!’

Read more at http://codata.org/blog/2019/04/29/so-long-and-thanks-for-all-the-fish-a-farewell-from-outgoing-data-science-journal-editor-in-chief-sarah-callaghan/

Growing the Conversation on the Science of Data

Image CC-BY-NC Laura Molloy @LM_HATII from the art intervention series ‘Humans of Data’ http://codata.org/blog/category/humans-of-data/

Mark Parsons joins CODATA as Editor-in-Chief, Data Science Journal

I am honored and excited to take on the role of Editor in Chief for the Data Science Journal.

I have had a bit of history with DSJ. One of my earliest peer-reviewed papers was published with Ruth Duerr in DSJ (Parsons and Duerr 2005). I vividly remember hurrying to make revisions in Costa Rica before heading offline for several weeks. I’d still like to meet one of the reviewers (perhaps I have) who made really helpful comments on how to organize and present the paper to get my points across in a more rigorous and impactful way. I was a data practitioner, not a researcher, and was largely unschooled in formal scientific writing. The guidance was most valuable, and the paper still gets cited now and again.

Years later, Peter Fox and I published what was one of my most controversial and influential papers (Parsons and Fox 2013). This time, DSJ allowed me to publish after an unconventional public review process involving reams of open review comments from more than two dozen people.

In short, DSJ has been a catalyst for my career. So I am eager to help foster the journal’s growth and influence and maybe help a few more data scientists along their way.

I am especially interested in helping DSJ build its niche as an influential journal of the ‘science of data’ in the sense that CODATA described it decades ago. We need more fora that encourage dialog across research and practice to understand all the issues around the socio-technical work necessary for data to be findable, accessible, interoperable, reusable, ethical, secure, etc.

I have been a member of the DSJ editorial board since the journal moved to Ubiquity Press, and I have been impressed at how Sarah Callaghan and other editors have worked to increase the journal’s quality. I want to continue this momentum. I want to further bolster the review quality and also raise the possibility of open reviews. The nature of DSJ is that it often attracts submissions and requires reviews from practitioners who have much less of a mandate to publish than researchers. I believe practitioners should be encouraged to contribute (with research as well as practice papers), so we should do what we can to recognize and model excellent contributions in this area.

While improving the content of DSJ, we should also continue to modernize its presentation. We need to actively consider machine-readable papers and content negotiation for both the papers and the metadata. Much like at its founding, DSJ needs to advance the whole concept of scholarly communication.

Thanks to Sarah’s great work, DSJ has a bright future as submissions continue to increase in number and quality. DSJ was ahead of its time when it was founded in the 1990s. I am eager to explore how it can continue to push important conversations forward. I welcome all your ideas. Please tell me what you think. Better yet, tell the community through a submission to DSJ!

‘So Long, and Thanks for All the Fish’: a farewell from outgoing Data Science Journal Editor-in-Chief, Sarah Callaghan

Back in early 2015, I was approached at a coffee break at a conference, and invited to take on the role of Editor-in-Chief of the Data Science Journal. This was a little bit of a surprise, I will confess, as my previous academic journal experience had been as an associate editor, along with some projects working on data citation and data publishing. The opportunity was too good to resist, however, and with the support of my employer, CEDA, I was very pleased to take on the role.

My tenure as EiC also coincided with the move of the journal to its current platform on Ubiquity Press, and with it came the need to appoint a new editorial board, develop a new scope and guidance, collate a new reviewer database, and handle the other minutiae of re-launching an academic journal. All these things were achieved with the help of my colleagues on the editorial board and the section editors, along with the help and support of the Ubiquity Press staff and the CODATA Executive Committee.

In my four-year tenure, I am very proud of the fact that 135 papers have been published, along with 6 Special Collections with another 5 Special Collections in the pipeline. The journal has grown more popular and is steadily publishing research that is more impactful as time goes on [https://www.scimagojr.com/journalsearch.php?q=4700152809&tip=sid], and this is a testament to the hard work of all involved – including our reviewers and authors.

It is time for me to hand over the role of EiC to another, and it is with no small amount of sadness that I do so. Being EiC has been incredibly rewarding (and occasionally infuriating) and I have learned a great deal from it. I am very pleased to know that Mark Parsons is taking over the role, and know that the journal will be in safe, knowledgeable hands.

It only remains for me to say my farewells and thank yous. Thank you to the authors, without whom there would be no articles to publish. A thousand thank yous to all my editors, reviewers, colleagues and friends – your efforts on behalf of the journal are deeply, deeply appreciated, as is your wisdom and expertise. I wish you all the very best for the future, and look forward to reading more excellent papers published in the DSJ!

Sarah

February – March, 2019 Publications in the Data Science Journal and new Special Collections

Title: Research of LOB Data Compression and Read-Write Efficiency in Oracle Database
Author: Jianjun Wang, Yingang Zhao, Gaochuan Liu
URL: http://doi.org/10.5334/dsj-2019-008
Title: Bringing Citations and Usage Metrics Together to Make Data Count
Author: Helena Cousijn, Patricia Feeney, Daniella Lowenberg, Eleonora Presani, Natasha Simons
URL: http://doi.org/10.5334/dsj-2019-009
Title: The Time Efficiency Gain in Sharing and Reuse of Research Data
Author: Tessa E. Pronk
URL: http://doi.org/10.5334/dsj-2019-010
Title: Intelligent Infrastructure, Ubiquitous Mobility, and Smart Libraries – Innovate for the Future
Author: Yi Shen
URL: http://doi.org/10.5334/dsj-2019-011

Call for Nominations and Applications: Editor-in-Chief, Data Science Journal, Deadline 14 April

The Data Science Journal is currently accepting nominations and applications to become the Editor-in-Chief of the journal: https://datascience.codata.org/

Applications can be made through the Google form at https://goo.gl/forms/ey60x1N2jO9YM1rY2

The deadline for applications is 12 midnight GMT on Sun 14 April.

Articles are appearing in two new Special Collections in the Data Science Journal.

Göttingen-CODATA RDM Symposium 2018

This special collection contains selected papers from the Göttingen-CODATA RDM Symposium 2018: the critical role of university RDM infrastructure in transforming data to knowledge: https://datascience.codata.org/collections/special/gottingen-codata-rdm-symposium/

Guest editors:
  • Simon Hodson
  • Jan Brase
  • Michael Witt
  • Liz Lyon
  • Devika P. Madalli

Research Data Alliance Results

This collection contains papers documenting research results and outcomes stemming from the Research Data Alliance (RDA) community and efforts: https://datascience.codata.org/collections/special/research-data-alliance-results/

Guest editors:

  • Leonardo Candela, Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo”, Italian National Research Council, Pisa, Italy
  • Donatella Castelli, Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo”, Italian National Research Council, Pisa, Italy
  • Emma Lazzeri, Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo”, Italian National Research Council, Pisa, Italy
  • Paolo Manghi, Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo”, Italian National Research Council, Pisa, Italy

January Publications in the Data Science Journal and new Special Collections


Articles published in January 2019

Title: Text Mining and Data Information Analysis for Network Public Opinion
Author: Yan Hu
URL: http://doi.org/10.5334/dsj-2019-007
Title: Expanding the Research Data Management Service Portfolio at Bielefeld University According to the Three-pillar Principle Towards Data FAIRness
Author: Jochen Schirrwagen, Philipp Cimiano, Vidya Ayer, Christian Pietsch, Cord Wiljes, Johanna Vompras, Dirk Pieper
URL: http://doi.org/10.5334/dsj-2019-006
Title: Supporting the Interdisciplinary, Long-Term Research Project ‘Patterns in Soil-Vegetation-Atmosphere-Systems’ by Data Management Services
Author: Constanze Curdt
URL: http://doi.org/10.5334/dsj-2019-005
Title: Implementing in the VAMDC the New Paradigms for Data Citation from the Research Data Alliance
Author: Carlo Maria Zwölf, Nicolas Moreau, Yaye-Awa Ba, Marie-Lise Dubernet
URL: http://doi.org/10.5334/dsj-2019-004
Title: Data Discovery Paradigms: User Requirements and Recommendations for Data Repositories
Author: Mingfang Wu, Fotis Psomopoulos, Siri Jodha Khalsa, Anita de Waard
URL: http://doi.org/10.5334/dsj-2019-003
Title: Additions to the Last Millennium Reanalysis Multi-Proxy Database
Author: David M. Anderson, Robert Tardif, Kaleb Horlick, Michael P. Erb, Gregory J. Hakim, David Noone, Walter A. Perkins, Eric Steig
URL: http://doi.org/10.5334/dsj-2019-002
Title: Understanding Human Mobility Patterns in a Developing Country Using Mobile Phone Data
Author: Merkebe Getachew Demissie, Santi Phithakkitnukoon, Lina Kattan, Ali Farhan
URL: http://doi.org/10.5334/dsj-2019-001

Data Science Journal Special Collection for SciDataCon 2016

Data Science Journal is pleased to announce that it will be publishing the high profile special collection of papers from SciDataCon 2016.

Authors with papers accepted for presentation at SciDataCon are also invited to submit their full papers to the Data Science Journal.  Submissions should be made at http://datascience.codata.org/

Please note the following:

  • The deadline for submissions to be part of the SciDataCon 2016 special collection is 30 September.
  • Even though abstracts were peer-reviewed and accepted as part of the conference process, the full paper will be peer-reviewed to ensure quality.
  • Given the number of papers expected, we are unable to waive the Article Processing Charge (APC) for all papers; however, the Data Science Journal is very competitive and has a progressive waiver policy for those unable to pay the APC: http://datascience.codata.org/about/submissions/ Please contact the Editor-in-Chief before submitting your article if you would like to request a waiver. Editorial decisions are made independently of the ability to pay the APC.

SciDataCon 2016

http://www.scidatacon.org/2016/ and http://www.scidatacon.org/site/themes-scope/:

Advancing the Frontiers of Data in Research

SciDataCon 2016 seeks to advance the frontiers of data in all areas of research. This means addressing a range of fundamental and urgent issues around the ‘Data Revolution’ and the recent data-driven transformation of research and the responses to these issues in the conduct of research.

SciDataCon 2016 is motivated by the conviction that the most significant contemporary research challenges—and in particular those reaching across traditional disciplines—cannot be properly addressed without paying attention to issues relating to data.  These issues include policy frameworks, data quality and interoperability, long-term stewardship of data, and the research skills, technologies, and infrastructures required by increasingly data-intensive research.  They also include frontier challenges for data science: for example, fundamental research questions relating to data integration, analysis of complex systems and models, epistemology and ethics in relation to Big Data, and so on.

The transformative effect of the data revolution needs to be examined from the perspective of all fields of research and its relationship to broader societal developments and to data-driven innovation scrutinised.  Taken together these issues form a multi-faceted challenge which cannot be tackled without expertise drawn from many disciplines and diverse roles in the research enterprise.  Furthermore, the transformations around data in research are essentially international and the response must be genuinely global.  SciDataCon is the international conference for research into these issues.

SciDataCon 2016 will take place on 11-13 September 2016 at the Sheraton Denver Downtown Hotel, Denver, Colorado, USA. It is part of International Data Week, 11-16 September 2016, convened by CODATA, the ICSU World Data System and the Research Data Alliance.

3rd LEARN Workshop, Helsinki, June 2016

This post is a syndicated copy of the one at http://citingbytes.blogspot.co.uk/2016/07/3rd-learn-workshop-helsinki-june-2016.html and was written by Sarah Callaghan, Editor-in-Chief of the Data Science Journal

Open Data in a Big Data World

The 3rd LEARN (Leaders Activating Research Networks) workshop on Research Data Management, “Make research data management policies work”, was held in Helsinki on Tuesday 28th June. I was invited wearing my CODATA hat (as Editor-in-Chief for the Data Science Journal) to give the closing keynote about the Science International Accord “Open Data in a Big Data World”.

The problem with doing closing talks is that so much of what I wanted to say had pretty much already been said by someone during the course of the day – sometimes even by me during the breakout sessions! Still, it was a really interesting workshop, with excellent discussion (despite the pall that Brexit cast over the coffee and lunchtime conversation – but that’s a topic for another time).

There were three breakout sessions to choose from, and the timings meant that you could go to two of them.

I started with Group 3: Making possible and encouraging the reuse of data: incentives needed. This is my day job – taking data in from researchers, making it understandable and reusable, and figuring out ways to give them credit and rewards for doing so. And my group has been doing this for more than 2 decades, so I’m afraid I might have gone off on a bit of a rant. Regardless, we covered a lot, though mainly the old chestnuts of the promotion and tenure system being fixated on publications as the main academic output, the requirements for standards (especially for metadata – acknowledging just how difficult it would be to come up with a universal metadata standard applicable to all research data), and the fact that repositories can control (to a certain extent) the technology, but culture change still needs to happen. Though there were some positives on the culture change – I noted that journals are now pushing DOIs for data, and this has had an impact on people coming to us to get DOIs.

The next breakout group I went to was Group 1: Research Data services planning, implementation and governance. What surprised me in this session (maybe it shouldn’t have) was just how far advanced the UK is when it comes to research data management policies and the like, in comparison to other countries. This did mean that my UK colleagues and I got quizzed a fair bit about our experiences, which made sense. I had a bit of a different perspective from most of the other attendees – being a discipline-specific repository means that we can pick and choose what data we take in, unlike institutional repositories, which have to be more general. On being asked about what other services we provide, I did manage to name-drop JASMIN, in the context of a UK infrastructure for data analysis and storage.

I think the key driver in the UK for getting research data management policies working was the Research Councils, and their policies, but also their willingness to stump up the cash to fund the work. A big push on institutional repositories was EPSRC’s putting the onus on research institutions to manage EPSRC-funded research data. But the increasing importance of data, and people’s increased interest in it, is coming from a wide range of drivers – funders, policies, journals, repositories, etc.

I understand that the talks and notes from the breakouts will be put up on the workshop website, but they’re not up as of the time of me writing this. You can find the slides from my talk here.

Call for Papers – Data Science Journal

The Data Science Journal is a peer-reviewed, open access, electronic journal dedicated to the advancement of data science and its application in policies, practices and management of Open Data.

We are currently soliciting submissions for papers on a wide range of data science topics, across the whole range of computational, natural and social science, and the humanities. The scope of the journal includes descriptions of data systems, their implementations and their publication, applications, infrastructures, software, legal, reproducibility and transparency issues, and the availability and usability of complex datasets, with a particular focus on the principles, policies and practices for data.

All data is in scope, whether born digital or converted from other sources, and all research disciplines are covered. Data is a cross-domain, cross-discipline topic, with common issues, regardless of the domain it serves. The Data Science Journal publishes a variety of article types (research papers, practice papers, review articles and essays). The Data Science Journal also publishes data articles, describing datasets or data compilations, if the potential for reuse of the data is significant or if considerable efforts were required in compilation. Similarly, the Data Science Journal also publishes descriptions of online simulation, database, and other experiments, partnering with digital repositories on ‘meta articles’ or ‘overlay articles’, which link to and allow visualisation of the data, thereby adding an entirely new dimension to the communication and exchange of data research results and educational materials.

For further information, and to submit a manuscript, please visit http://datascience.codata.org/