{"id":373,"date":"2015-03-31T08:29:39","date_gmt":"2015-03-31T08:29:39","guid":{"rendered":"http:\/\/codata.org\/blog\/?p=373"},"modified":"2020-02-27T08:45:20","modified_gmt":"2020-02-27T08:45:20","slug":"isi-codata-big-data-workshop-as-word-clouds","status":"publish","type":"post","link":"https:\/\/codata.org\/blog\/2015\/03\/31\/isi-codata-big-data-workshop-as-word-clouds\/","title":{"rendered":"ISI-CODATA Big Data Workshop as Word Clouds"},"content":{"rendered":"<p class=\"p1\"><em><a href=\"https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/Shiva_Trimmed.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-medium wp-image-375\" src=\"https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/Shiva_Trimmed-231x300.png\" alt=\"Shiva_Khanal\" width=\"231\" height=\"300\" srcset=\"https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/Shiva_Trimmed-231x300.png 231w, https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/Shiva_Trimmed.png 476w\" sizes=\"auto, (max-width: 231px) 100vw, 231px\" \/><\/a>This post was written by <a href=\"https:\/\/www.linkedin.com\/pub\/shiva-khanal\/16\/661\/875\" target=\"_blank\" rel=\"noopener\">Shiva Khanal<\/a>,\u00a0<a href=\"http:\/\/www.forestrynepal.org\/shiva-khanal\" target=\"_blank\" rel=\"noopener\">Research Officer with the\u00a0Department of Forest Research and Survey<\/a> in Nepal. \u00a0Shiva was one of the international scholars sponsored by CODATA to attend the ISI-CODATA International Training Workshop on Big Data.<\/em><\/p>\n<p class=\"p1\">This March two associated events were co-organized by the\u00a0<a href=\"http:\/\/www.codata.org\/\" target=\"_blank\" rel=\"noopener\">Committee on Data for Science and Technology (CODATA)<\/a> and the Indian Statistical Institute (ISI): the International Seminar on Data Science (19-20, Mar 2015)\u00a0 and ISI CODATA International Training Workshop on Big Data (9-18, Mar 2015). 
These events, held at ISI Bangalore, India, covered a wide range of talks and presentations related to big data, with presenters from diverse backgrounds, including academia, business and data science.<\/p>\n<p class=\"p1\">One way to visualize the focus of the program is to plot the terms that appeared most frequently. I obtained the schedules of presentations and tutorials for the seminar (<a href=\"http:\/\/drtc1.isibang.ac.in\/datascience\/schedule.html\"><span class=\"s1\">http:\/\/drtc1.isibang.ac.in\/datascience\/schedule.html<\/span><\/a>) and the training workshop (<a href=\"http:\/\/drtc1.isibang.ac.in\/bdworkshop\/schedule.html\"><span class=\"s1\">http:\/\/drtc1.isibang.ac.in\/bdworkshop\/schedule.html<\/span><\/a>) and generated a word cloud using the R package wordcloud.<\/p>\n<p class=\"p1\">The R code, along with the Dropbox link to the data, is provided below. Pasting it into the R console produces the word cloud shown here:<\/p>\n<p class=\"p1\"><a href=\"https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isi_codata_word_cloud.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-large wp-image-376\" src=\"https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isi_codata_word_cloud-1024x1024.png\" alt=\"isi_codata_word_cloud\" width=\"625\" height=\"625\" srcset=\"https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isi_codata_word_cloud-1024x1024.png 1024w, https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isi_codata_word_cloud-150x150.png 150w, https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isi_codata_word_cloud-300x300.png 300w, https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isi_codata_word_cloud-624x624.png 624w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p class=\"p1\">Here is the code:<\/p>\n<p class=\"p1\">###########################################<\/p>\n<p class=\"p1\"># load packages<\/p>\n<p class=\"p1\">library(wordcloud)<\/p>\n<p 
class=\"p1\">library(tm)<\/p>\n<p class=\"p1\"># read the presentation details<\/p>\n<p class=\"p2\"><span class=\"s2\">textf &lt;- readLines(&quot;<a href=\"http:\/\/dl.dropboxusercontent.com\/u\/111213395\/text_file_presentation_titles.txt\"><span class=\"s3\">http:\/\/dl.dropboxusercontent.com\/u\/111213395\/text_file_presentation_titles.txt<\/span><\/a>&quot;)<\/span><\/p>\n<p class=\"p1\"># build a corpus with one document per line of text<\/p>\n<p class=\"p1\">text_corpus &lt;- Corpus(VectorSource(textf))<\/p>\n<p class=\"p1\"># create term-document matrix and apply transformations<\/p>\n<p class=\"p1\">tdm &lt;- TermDocumentMatrix(text_corpus,<\/p>\n<p class=\"p1\">      control = list(removePunctuation = TRUE, stripWhitespace = TRUE,<\/p>\n<p class=\"p1\">                     stopwords = c(stopwords()), PlainTextDocument = TRUE, removeNumbers = TRUE, tolower = TRUE))<\/p>\n<p class=\"p1\"># total frequency of each term, most frequent first<\/p>\n<p class=\"p1\">m &lt;- as.matrix(tdm)<\/p>\n<p class=\"p1\">v &lt;- sort(rowSums(m), decreasing = TRUE)<\/p>\n<p class=\"p1\">d &lt;- data.frame(word = names(v), freq = v)<\/p>\n<p class=\"p1\"># colour palette: Dark2 with its first colour dropped<\/p>\n<p class=\"p1\">pal &lt;- brewer.pal(6, &quot;Dark2&quot;)<\/p>\n<p class=\"p1\">pal &lt;- pal[-(1)]<\/p>\n<p class=\"p1\"># plot the word cloud and save as png<\/p>\n<p class=\"p1\">png(&quot;test.png&quot;, width = 3.25, height = 3.25, units = &quot;in&quot;, res = 1200)<\/p>\n<p class=\"p1\">wordcloud(d$word, d$freq, scale = c(4, .3), min.freq = 2, random.order = TRUE, random.color = TRUE, rot.per = .15, colors = pal)<\/p>\n<p class=\"p1\">dev.off()<\/p>\n<p class=\"p3\">###########################################<\/p>\n<p class=\"p1\">I also created a word cloud from tweets tagged #isibigdata (about 100 tweets in total). 
Unlike the text-based word cloud above, the Twitter extraction required a little customization (setting up credentials for a twitteR session), but otherwise the plotting code is almost the same.<\/p>\n<p class=\"p1\"><a href=\"https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isibigdata-tweet-cloud.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-large wp-image-377\" src=\"https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isibigdata-tweet-cloud-1024x1024.png\" alt=\"isibigdata tweet cloud\" width=\"625\" height=\"625\" srcset=\"https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isibigdata-tweet-cloud-1024x1024.png 1024w, https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isibigdata-tweet-cloud-150x150.png 150w, https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isibigdata-tweet-cloud-300x300.png 300w, https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isibigdata-tweet-cloud-624x624.png 624w, https:\/\/codata.org\/blog\/wp-content\/uploads\/2015\/03\/isibigdata-tweet-cloud.png 1280w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post was written by Shiva Khanal,\u00a0Research Officer with the\u00a0Department of Forest Research and Survey in Nepal. \u00a0Shiva was one of the international scholars sponsored by CODATA to attend the ISI-CODATA International Training Workshop on Big Data. 
This March two associated events were co-organized by the\u00a0Committee on Data for Science and Technology (CODATA) and the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-373","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/codata.org\/blog\/wp-json\/wp\/v2\/posts\/373","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/codata.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/codata.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/codata.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/codata.org\/blog\/wp-json\/wp\/v2\/comments?post=373"}],"version-history":[{"count":5,"href":"https:\/\/codata.org\/blog\/wp-json\/wp\/v2\/posts\/373\/revisions"}],"predecessor-version":[{"id":1916,"href":"https:\/\/codata.org\/blog\/wp-json\/wp\/v2\/posts\/373\/revisions\/1916"}],"wp:attachment":[{"href":"https:\/\/codata.org\/blog\/wp-json\/wp\/v2\/media?parent=373"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/codata.org\/blog\/wp-json\/wp\/v2\/categories?post=373"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/codata.org\/blog\/wp-json\/wp\/v2\/tags?post=373"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}