An introduction to Ona.R
========================================================
ona.R makes it easy to download and work with datasets on ona. After downloading, ona.R post-processes your dataset to convert each column to the correct type, which it derives from the type you specified when creating your XLSform. It is distributed as an R package called ona, which is not on CRAN yet and can be installed in the following way:
install.packages("devtools")
library(devtools)
# current versions of devtools expect the "username/repo" form
install_github("onaio/ona.R")
The install_github line will need to be re-run every time you need to update the package, which will be frequent for now, as the package is in early testing. After installation, it can be loaded like any other R package:
library(ona)
At this point, we should be ready to get started and use some of the ona functions. Probably the most useful, and the most basic, is called onaDownload. Try typing help(onaDownload) in your R terminal to see what it does. We’ll use it to download the good_eats form from mberg’s account on ona, which is a public dataset and doesn’t require a password. (To download data from an account with a password, simply pass the password as the fourth parameter.)
good_eats <- onaDownload("good_eats", "mberg", "mberg")
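For a private dataset, the call might look like the sketch below. The form, account, and user names are hypothetical placeholders, and reading the password from an environment variable called ONA_PASSWORD is an assumption made here so the password stays out of the script. Only the four-argument call itself comes from the description above.

```r
# Hypothetical names throughout: replace the form, account, and user with your own.
# ONA_PASSWORD is an assumed environment variable, set outside this script.
password <- Sys.getenv("ONA_PASSWORD")

# Only attempt the download when a password is set and the ona package is installed.
if (nzchar(password) && requireNamespace("ona", quietly = TRUE)) {
  private_data <- ona::onaDownload("my_form", "my_account", "my_user", password)
}
```

If the variable is unset, Sys.getenv returns an empty string and the download is simply skipped.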
Question: what kind of beast did we just download?
str(good_eats)
## 'data.frame': 1103 obs. of 24 variables:
## Formal class 'onaData' [package "ona"] with 5 slots
## ..@ .Data :List of 24
## .. ..$ : Date, format: "2011-12-30" ...
## .. ..$ : Factor w/ 14 levels "Ba_","baked_goods",..: 13 4 14 8 6 5 8 8 8 3 ...
## .. ..$ : chr "Turkish donuts" "Baklava" "Turkish burger " "Chicken doner" ...
## .. ..$ : num 2 2.75 2 9 20 14 12 18 8 2.5 ...
## .. ..$ : Factor w/ 4 levels "bad","delectible",..: 2 3 2 2 2 2 2 3 2 3 ...
## .. ..$ : Factor w/ 3 levels "high_risk","low_risk",..: 2 2 2 2 2 2 2 2 2 2 ...
## .. ..$ : chr "1325233641666.jpg" "1325233872157.jpg" NA "1325249519178.jpg" ...
## .. ..$ : chr "Tahiri Osmanli Lokmaci" "Karakoy Gulluoglu" NA "Durum Bufe" ...
## .. ..$ : chr "1325233694501.jpg" "1325233903109.jpg" NA "1325250480886.jpg" ...
## .. ..$ : chr "41.0182375414297 28.97094827145338 39.0 30.0" "41.02281326428056 28.977699307724833 39.9000244140625 40.0" "41.01452808827162 28.97566007450223 57.4000244140625 30.0" "41.017082473263144 28.969059996306896 39.5 55.0" ...
## .. ..$ : chr "41.0182375414297" "41.02281326428056" "41.01452808827162" "41.017082473263144" ...
## .. ..$ : chr "28.97094827145338" "28.977699307724833" "28.97566007450223" "28.969059996306896" ...
## .. ..$ : chr "39.0" "39.9000244140625" "57.4000244140625" "39.5" ...
## .. ..$ : chr "30.0" "40.0" "30.0" "55.0" ...
## .. ..$ : Factor w/ 350 levels "00:73:E0:28:19:1D",..: 146 146 146 146 146 146 146 146 146 146 ...
## .. ..$ : Date, format: "2011-12-30" ...
## .. ..$ : chr "uuid:69f40c21-96a1-45de-b955-a40f9beef49a" "uuid:7d3c75b0-c0a1-4134-b9f6-3c0d9dfbad67" "uuid:880e61f8-13cd-41ca-ad18-6a12cfe4da79" "uuid:04ac081c-40a2-424d-bf83-ff4ac409fcf1" ...
## .. ..$ : chr "69f40c21-96a1-45de-b955-a40f9beef49a" "7d3c75b0-c0a1-4134-b9f6-3c0d9dfbad67" "880e61f8-13cd-41ca-ad18-6a12cfe4da79" "04ac081c-40a2-424d-bf83-ff4ac409fcf1" ...
## .. ..$ : chr "2011-12-30T08:19:48" "2011-12-30T08:35:45" "2011-12-30T08:44:00" "2011-12-30T08:46:05" ...
## .. ..$ : logi NA NA NA NA NA NA ...
## .. ..$ : logi NA NA NA NA NA NA ...
## .. ..$ : num 2.02e+11 2.02e+11 2.02e+11 2.02e+11 2.02e+11 ...
## .. ..$ : logi NA NA NA NA NA NA ...
## .. ..$ : chr "None" "None" "None" "None" ...
## ..@ form :'data.frame': 13 obs. of 4 variables:
## .. ..$ name : chr "submit_data" "food_type" "description" "amount" ...
## .. ..$ type : chr "date" "select one" "text" "decimal" ...
## .. ..$ options: chr NA "[\n {\n \"name\": \"morning_food\",\n\"label\": \"Morning Food\" \n},\n{\n \"name\": \"lunch\",\n\"label\": \"Lunch Time\" \n},"| __truncated__ NA NA ...
## .. ..$ label : chr "submit_data" "Type of Eat" "Description" "Amount" ...
## ..@ names : chr "submit_data" "food_type" "description" "amount" ...
## ..@ row.names: int 1 2 3 4 5 6 7 8 9 10 ...
## ..@ .S3Class : chr "data.frame"
R tells us something like 'data.frame': 1103 obs. of 24 variables: as well as Formal class 'onaData' [package "ona"] with 5 slots. What this means is that onaData objects can be treated as data.frames (which makes them very convenient!) as well as “objects” with more properties (such as form, which is derived from your XLSform). The form gives ona.R information about the exact question that was asked and the type of that question (was it text or select one? or was it a date?), which lets the library convert the values to the right types. That conversion is basically the power of this package.
For simplicity, if you want just a data frame and not this complicated onaData object, you can always use the data.frame method.
good_eats_pure_data_frame <- data.frame(good_eats)
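To see how one object can be a data frame and still carry extra slots, here is a small sketch using a toy S4 class. The class name toyFormData and its contents are made up for illustration. This is not ona's definition of onaData, only an assumption that is consistent with the five slots in the str output above.

```r
library(methods)

# Toy class: a data.frame with one extra slot holding the form description.
setClass("toyFormData",
         contains = "data.frame",
         representation(form = "data.frame"))

toy <- new("toyFormData",
           data.frame(amount = c(2, 2.75),
                      food_type = c("lunch", "dinner")),
           form = data.frame(name = c("amount", "food_type"),
                             type = c("decimal", "select one"),
                             stringsAsFactors = FALSE))

is(toy, "data.frame")  # the object extends data.frame
slotNames(toy)         # includes the extra "form" slot
toy@form               # the form travels along with the data
```

The @ operator reads a slot, so something like good_eats@form should show the form for a downloaded dataset in the same way.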
So the part where R downloaded your data for you was pretty cool. But there is more to the onaDownload function than just downloading. In the background, the type of each column is converted according to how the data was collected.
# let's inspect the types of the first 10 columns of our downloaded data
str(data.frame(good_eats)[1:10])
## 'data.frame': 1103 obs. of 10 variables:
## $ submit_data : Date, format: "2011-12-30" "2011-12-30" ...
## $ food_type : Factor w/ 14 levels "Ba_","baked_goods",..: 13 4 14 8 6 5 8 8 8 3 ...
## $ description : Factor w/ 456 levels "!dfuf",";mlhkhiuguiiug",..: 415 30 413 66 357 391 268 377 74 414 ...
## $ amount : num 2 2.75 2 9 20 14 12 18 8 2.5 ...
## $ rating : Factor w/ 4 levels "bad","delectible",..: 2 3 2 2 2 2 2 3 2 3 ...
## $ risk_factor : Factor w/ 3 levels "high_risk","low_risk",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ food_photo : Factor w/ 281 levels "0","1","1325233641666.jpg",..: 3 4 NA 6 5 8 7 9 10 11 ...
## $ location_name : Factor w/ 367 levels " vasai east",..: 308 183 NA 86 199 177 97 337 324 NA ...
## $ location_photo: Factor w/ 198 levels "0","1325233694501.jpg",..: 2 3 NA 5 4 NA 6 7 8 9 ...
## $ gps : Factor w/ 324 levels "-0.001245 -0.001802 -0.2 0.4",..: 251 252 249 250 NA 219 220 218 215 214 ...
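The idea of form-driven conversion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ona.R's actual implementation: the toy form, the raw data, and the helper convert_column are all hypothetical, and only four question types are handled.

```r
# Toy form description: one row per question, with its XLSform type.
form <- data.frame(
  name = c("submit_data", "food_type", "amount"),
  type = c("date", "select one", "decimal"),
  stringsAsFactors = FALSE
)

# Raw values as they would arrive from a csv export: all text.
raw <- data.frame(
  submit_data = c("2011-12-30", "2011-12-31"),
  food_type = c("lunch", "dinner"),
  amount = c("2", "2.75"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick a conversion based on the question type.
convert_column <- function(values, type) {
  switch(type,
    "date" = as.Date(values),
    "select one" = factor(values),
    "integer" = as.integer(values),
    "decimal" = as.numeric(values),
    values  # unknown types are left untouched
  )
}

typed <- raw
for (i in seq_len(nrow(form))) {
  col <- form$name[i]
  typed[[col]] <- convert_column(raw[[col]], form$type[i])
}

str(typed)
```

Driving the conversion from the form rather than guessing from the values means a column of numeric-looking codes stays a factor when the question was a select one.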
Similarly, question types such as select one and imei are converted to factors, and integers and decimals are converted to numbers. Let’s see how this compares with simply reading the file as a csv without any type conversions:
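# stringsAsFactors = TRUE reproduces the output below on R 4.0.0 and later, where the default is FALSE
good_eats2 <- read.csv("~/Downloads/good_eats_2013_05_05.csv", stringsAsFactors = TRUE)
# let's inspect the types of the first 10 columns of the data read from the csv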
str(good_eats2[1:10])
## 'data.frame': 1103 obs. of 10 variables:
## $ submit_data : Factor w/ 364 levels "1970-02-04","1980-01-09",..: 4 4 4 4 4 4 6 5 7 8 ...
## $ food_type : Factor w/ 15 levels "Ba_","baked_goods",..: 14 4 15 8 14 6 5 8 8 8 ...
## $ description : Factor w/ 459 levels "\nTt\n","!dfuf",..: 418 31 416 67 147 360 394 270 380 75 ...
## $ amount : Factor w/ 215 levels "-0.627","-1",..: 59 63 59 209 59 66 45 28 54 201 ...
## $ rating : Factor w/ 5 levels "bad","delectible",..: 2 3 2 2 2 2 2 2 3 2 ...
## $ comments : Factor w/ 342 levels "!","!!!!!","0",..: 238 270 193 301 121 265 236 193 193 193 ...
## $ risk_factor : Factor w/ 4 levels "high_risk","low_risk",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ food_photo : Factor w/ 282 levels "0","1","1325233641666.jpg",..: 3 5 256 7 4 6 9 8 10 11 ...
## $ location_name : Factor w/ 368 levels " vasai east",..: 308 183 237 86 310 199 177 97 338 325 ...
## $ location_photo: Factor w/ 199 levels "0","1325233694501.jpg",..: 2 4 166 6 3 5 166 7 8 9 ...
Everything is a factor! Why is that bad? Well, see the plots below for yourself:
# install.packages("ggplot2") if you don't have ggplot2 installed yet
library(ggplot2)
qplot(data=good_eats2, x=amount) # from data read in without ona.R
qplot(data=good_eats, x=amount) # from data read in using ona.R
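Plots are not the only place where a factor column causes trouble. The short sketch below uses made-up amounts (not values from good_eats) to show that converting a factor straight to numeric returns the internal level codes rather than the numbers themselves.

```r
# Toy amounts as they might end up in a factor after reading a csv.
amount_factor <- factor(c("9", "10", "10"))

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(amount_factor)

# Going through character first recovers the real numbers.
values <- as.numeric(as.character(amount_factor))

mean(codes)   # mean of the level codes, meaningless as an amount
mean(values)  # mean of the actual amounts
```

Any summary computed on the codes is silently wrong, which is why having amount arrive as a number in the first place matters.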
Okay, hopefully by now, you are sold on the usefulness of ona.R, and see some value in it. Since this is a “basics of” document, I’ll end by describing a couple of other high-level functions in ona.R (lower-level functions will be documented over time).
onaDownload – download data directly from ona by passing the form name, the account (username or org_username), your username, and, for private data, your password.
onaRead – create an onaData object from pre-downloaded files. The first file argument is the csv file, the second is the form.json file (which you can download from the form settings page on ona). Note: unexpected things will happen if the files aren’t the right ones. See the full documentation by using help(onaRead).
replaceHeaderNamesWithLabels – get a version of the data where the header row is rewritten as the actual question asked.
And that’s really the gist of it!
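The idea behind replacing header names with labels can be illustrated with plain data frames. This is only a sketch of the concept with toy data. It does not call replaceHeaderNamesWithLabels, whose exact arguments are best checked in its help page, and the variable names below are hypothetical.

```r
# Toy form: question names and the labels shown to the respondent.
form <- data.frame(
  name = c("food_type", "description", "amount"),
  label = c("Type of Eat", "Description", "Amount"),
  stringsAsFactors = FALSE
)

# Toy answers, including a metadata column that has no entry in the form.
answers <- data.frame(
  food_type = c("lunch", "dinner"),
  description = c("Baklava", "Chicken doner"),
  amount = c(2.75, 9),
  uuid = c("a1", "b2")
)

# Look up each column name in the form; keep the original name when no label exists.
labels <- form$label[match(names(answers), form$name)]
labelled <- answers
names(labelled) <- ifelse(is.na(labels), names(answers), labels)

names(labelled)
```

Falling back to the original name keeps metadata columns usable even though they never appeared as questions in the form.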
This software has been tested on only a couple of use cases so far, and writing good code in R is pretty tricky, so there are probably bugs! If you encounter one, please go to your project page and, under “Sharing”, give the username “onasupport” “View and Download” privileges, then file an issue on GitHub.