An interactive workshop for learning the basics of Spark in R and how to include lazy evaluation in a data analysis workflow. The session covers the basic concepts of Spark, MapReduce, lazy evaluation, and distributed processing, and how these can be applied in data science projects using local resources such as a personal computer. The workshop will include live exercises with the participants. Its focus is sharing a series of easily applicable tips for including distributed processing in their work with the resources available.
Big Data tools come in handy when the scale of data preparation or analysis exceeds the capacity of a regular computer running a sequential workflow. Questions such as "How have Big Data and the use of Spark helped improve the general dynamics of data science in our institution?" will be explored in this workshop. Data science projects are using larger data sources over time, and Big Data tools such as Spark are developing efficient connections with the most popular programming languages in the field, such as R and Python.
The integration of these tools can be natural and easy for users. The sparklyr package makes distributed processing and lazy evaluation convenient, letting users optimize the available computational resources while still following a natural and simple data analysis workflow, from storage through exploratory data analysis and modeling. This workshop aims to share the basics of including Spark in R to make the most of available resources.
Session objective: Share practical examples of how to use Spark with R in a data science project, and show how it can make a large process simpler and more efficient.
Date and time:
- Session 1: 1 hour, 18 June 2022; 6:00 pm IST / 6:30 am Costa Rica (GMT-6)
- Session 2: 1 hour, 25 June 2022; 6:00 pm IST / 6:30 am Costa Rica (GMT-6)
- Session 3: 1 hour, 02 July 2022; 6:00 pm IST / 6:30 am Costa Rica (GMT-6)
Intended Audience: This session will focus on early career researchers (ECRs) in the CODATA Connect and community pipeline. The audience will be drawn from different data and research ecological flows. Experience with R or basic programming is preferred.
Prerequisite: A computer with R installed.
Topic Organizer: CODATA Connect (Mariana, Shaily, Felix)
Number of Participants: 10 to 20. This will be an online workshop.
To register, please fill in the Google form: https://forms.gle/
Alternatively, send your statement of interest by email to firstname.lastname@example.org with the subject line: Application for Workshop on Introduction to Spark with R and Lazy evaluation
The application should contain your name, date of birth, country, city, highest level of education, area of interest, a statement of why you would like to participate in the workshop (200 words), and whether you will attend all 3 sessions.
Send your applications by 5 June 2022.