An example of making ona data more readable
========================================================

ona.R makes it easy to download and work with datasets on ona. After downloading, ona.R post-processes your dataset to convert each column to the correct type, which it derives from the question type you specified when creating your XLSForm. If you haven’t read the basics document, I recommend reading that first.
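To make the idea of type conversion concrete, here is a minimal base-R sketch. It is not ona.R’s actual implementation: the toy data, the `schema` vector, the `converters` list and the `convert_columns` helper are all hypothetical, and only the XLSForm type names (integer, decimal, date, select_one) are real.

```r
# Toy data as it might arrive from a CSV export: every column is character.
# The values and the column types below are made up for illustration.
raw <- data.frame(
  amount      = c("3", "12", NA),
  rating      = c("4.5", "2", "3.25"),
  submit_date = c("2013-01-15", "2013-02-01", "2013-03-20"),
  risk_factor = c("low_risk", "high_risk", "low_risk"),
  stringsAsFactors = FALSE
)

# Hypothetical schema: column name -> XLSForm question type.
schema <- c(amount = "integer", rating = "decimal",
            submit_date = "date", risk_factor = "select_one")

# One converter per question type (an assumption, not ona.R's mapping).
converters <- list(
  integer    = as.integer,
  decimal    = as.numeric,
  date       = as.Date,
  select_one = as.factor
)

convert_columns <- function(df, schema, converters) {
  for (col in intersect(names(df), names(schema))) {
    convert <- converters[[schema[[col]]]]
    # Columns whose type has no converter are left as they are.
    if (!is.null(convert)) df[[col]] <- convert(df[[col]])
  }
  df
}

typed <- convert_columns(raw, schema, converters)
str(typed)
```

With columns typed this way, functions like summary() can count factor levels instead of treating every column as plain text.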

In this example, we will go through how to make data downloaded from ona prettier by replacing the slugs in your datasets with the text of the original question and answer that enumerators saw on ODK or on webforms.

Let’s begin with the public good_eats dataset, and look at (1) the column names of that dataset, and (2) the values of the risk_factor column for the various good eats.

library(ona)
# Download the dataset named good_eats in the account of mberg
good_eats <- onaDownload("good_eats", "mberg", "mberg")
names(good_eats)

##  [1] "submit_data"       "food_type"         "description"      
##  [4] "amount"            "rating"            "risk_factor"      
##  [7] "food_photo"        "location_name"     "location_photo"   
## [10] "gps"               "X_gps_latitude"    "X_gps_longitude"  
## [13] "X_gps_altitude"    "X_gps_precision"   "imei"             
## [16] "submit_date"       "meta.instanceID"   "X_uuid"           
## [19] "X_submission_time" "X_tags"            "X_notes"          
## [22] "X_version"         "X_duration"        "X_submitted_by"

summary(good_eats$risk_factor)

##   high_risk    low_risk medium_risk        NA's 
##         106         351         496         150

These are the “slugs” that Matt entered in the name column of his ona form. With ona.R’s replaceHeaderNamesWithLabels function, we can easily replace these slugs with the actual questions that he asked:

good_eats_readable_questions <- replaceHeaderNamesWithLabels(good_eats)
names(good_eats_readable_questions)

##  [1] "submit_data"       "Type of Eat"       "Description"      
##  [4] "Amount"            "Rating"            "Risk Factor"      
##  [7] "Food Pic"          "Location Name"     "Served At"        
## [10] "Location"          "X_gps_latitude"    "X_gps_longitude"  
## [13] "X_gps_altitude"    "X_gps_precision"   "imei"             
## [16] "submit_date"       "instanceID"        "X_uuid"           
## [19] "X_submission_time" "X_tags"            "X_notes"          
## [22] "X_version"         "X_duration"        "X_submitted_by"

You’ll see that every question that actually had a label has been replaced. The effect here is pretty subtle; mostly things are just being capitalized. With this function, the answers to the questions are left unreplaced:

summary(good_eats_readable_questions$`Risk Factor`) # Note: the column name, because it includes a space, is surrounded by backticks (` `)

##   high_risk    low_risk medium_risk        NA's 
##         106         351         496         150
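Conceptually, replacing headers is just a lookup from slug to label. The following base-R sketch illustrates that idea on a toy data frame. It is not how ona.R is implemented: `question_labels` and `rename_with_labels` are hypothetical names, and the lookup is typed in by hand rather than read from a form definition.

```r
# Toy data frame whose column names are slugs.
df <- data.frame(
  food_type   = c("lunch", "dinner"),
  risk_factor = c("low_risk", "high_risk"),
  imei        = c("111", "222"),
  stringsAsFactors = FALSE
)

# Hypothetical slug -> label lookup; imei deliberately has no label.
question_labels <- c(food_type = "Type of Eat", risk_factor = "Risk Factor")

rename_with_labels <- function(df, labels) {
  # Only rename columns that have a label; the rest keep their slug.
  has_label <- names(df) %in% names(labels)
  names(df)[has_label] <- unname(labels[names(df)[has_label]])
  df
}

readable <- rename_with_labels(df, question_labels)
names(readable)
```

Only the column names change here; the values in each column are untouched, which matches the summary above.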

To replace the answers as well, use the replaceAllNamesWithLabels function:

good_eats_readable <- replaceAllNamesWithLabels(good_eats)
summary(good_eats_readable$`Risk Factor`)

## High Risk (Hope it was worth it)                         Low Risk 
##                              106                              351 
##       Medium Risk (Questionable)                             NA's 
##                              496                              150
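Replacing the answers amounts to relabelling factor levels. Here is a minimal base-R sketch of that step, assuming the answers are stored as a factor. The label text is copied from the output above, but the `choice_labels` lookup and the `relabel_levels` helper are hypothetical and are not part of ona.R.

```r
# Toy answers stored as a factor of slugs, including one missing value.
risk <- factor(c("low_risk", "high_risk", "medium_risk", "low_risk", NA))

# Slug -> label lookup for the choices of this question.
choice_labels <- c(
  high_risk   = "High Risk (Hope it was worth it)",
  low_risk    = "Low Risk",
  medium_risk = "Medium Risk (Questionable)"
)

relabel_levels <- function(f, labels) {
  old <- levels(f)
  # Keep the slug for any level that has no label.
  levels(f) <- ifelse(old %in% names(labels), labels[old], old)
  f
}

risk_readable <- relabel_levels(risk, choice_labels)
summary(risk_readable)
```

Because only the levels are renamed, the counts per answer and the number of NA's stay the same.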

And of course, graphs come out looking slightly better too, because the default labels are more readable:

library(ggplot2)
qplot(data=good_eats_readable, x=submit_date, fill=`Risk Factor`)

For multilingual forms, the replaceAllNamesWithLabels function takes a language argument:

waterpoint <- onaDownload("waterpoint", "mberg","mberg")
waterpoint_swa <- replaceAllNamesWithLabels(waterpoint, language="Swahili")
qplot(data=waterpoint_swa,x=`Siku ya kukaguliwa`,y=`Hali ya kisima`)
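One way to picture the language argument is as a second lookup step: each slug has one label per language, and the argument picks which one to use. The base-R sketch below works under that assumption only. The slugs, the English labels and the `pick_language` helper are hypothetical; the two Swahili labels are the column names used above.

```r
# Hypothetical multilingual labels: slug -> (language -> label).
labels_by_language <- list(
  inspection_date = c(English = "Date of inspection",
                      Swahili = "Siku ya kukaguliwa"),
  well_status     = c(English = "Well status",
                      Swahili = "Hali ya kisima")
)

pick_language <- function(labels_by_language, language) {
  # Return one label per slug, or NA when that language is missing.
  vapply(labels_by_language, function(translations) {
    if (language %in% names(translations)) {
      translations[[language]]
    } else {
      NA_character_
    }
  }, character(1))
}

swahili <- pick_language(labels_by_language, "Swahili")
swahili
```

The result is a named character vector of slug to label, which could then be used for the same kind of renaming as in the single-language case.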