An example of making ona data more readable
========================================================
ona.R makes it easy to download and work with datasets on ona. After downloading, ona.R post-processes your dataset to convert each column to the correct type, which it derives from the type you specified when creating your XLSform. If you haven’t read the basics document, I recommend reading that first.
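To make the idea of type-driven post-processing concrete, here is a minimal base R sketch. It assumes a hypothetical lookup from XLSform question types to R conversion functions and a made-up dataset whose columns all arrive as text; it illustrates the general technique only and is not how ona.R itself is implemented.

```r
# Hypothetical mapping from XLSform question types to R converters.
# This only illustrates the idea; it is not ona.R's actual code.
converters <- list(
  integer    = as.integer,
  decimal    = as.numeric,
  date       = as.Date,
  select_one = factor,
  text       = as.character
)

# A made-up downloaded dataset: every column arrives as text.
raw <- data.frame(
  count      = c("3", "5"),
  choice     = c("yes", "no"),
  visit_date = c("2013-01-02", "2013-01-05"),
  stringsAsFactors = FALSE
)

# Hypothetical types, as they might be declared in the form's "type" column.
types <- c(count = "integer", choice = "select_one", visit_date = "date")

# Convert each column with the converter registered for its type.
for (col in names(types)) {
  raw[[col]] <- converters[[types[[col]]]](raw[[col]])
}
str(raw)
```

Keeping the converters in a list keyed by type means supporting another question type is a one-line addition.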
In this example, we will go through how to make data downloaded from ona more readable by replacing the slugs in your dataset with the text of the original questions and answers that enumerators saw in ODK or on webforms.
So let’s begin with the public good_eats dataset and look at (1) its column names and (2) the values in its “risk_factor” column.
library(ona)
# Download the dataset named good_eats in the account of mberg
good_eats <- onaDownload("good_eats", "mberg","mberg")
names(good_eats)
## [1] "submit_data" "food_type" "description"
## [4] "amount" "rating" "risk_factor"
## [7] "food_photo" "location_name" "location_photo"
## [10] "gps" "X_gps_latitude" "X_gps_longitude"
## [13] "X_gps_altitude" "X_gps_precision" "imei"
## [16] "submit_date" "meta.instanceID" "X_uuid"
## [19] "X_submission_time" "X_tags" "X_notes"
## [22] "X_version" "X_duration" "X_submitted_by"
summary(good_eats$risk_factor)
## high_risk low_risk medium_risk NA's
## 106 351 496 150
We see the “slugs” that Matt entered in the name column of his XLSform. With ona.R’s replaceHeaderNamesWithLabels function, we can replace these slugs with the text of the questions he actually asked:
good_eats_readable_questions <- replaceHeaderNamesWithLabels(good_eats)
names(good_eats_readable_questions)
## [1] "submit_data" "Type of Eat" "Description"
## [4] "Amount" "Rating" "Risk Factor"
## [7] "Food Pic" "Location Name" "Served At"
## [10] "Location" "X_gps_latitude" "X_gps_longitude"
## [13] "X_gps_altitude" "X_gps_precision" "imei"
## [16] "submit_date" "instanceID" "X_uuid"
## [19] "X_submission_time" "X_tags" "X_notes"
## [22] "X_version" "X_duration" "X_submitted_by"
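Every question that had a label has been replaced, while columns without one (such as imei and the X_ metadata columns) keep their original names. The effect is fairly subtle here: most labels are just capitalized slugs, though a few differ (food_type became “Type of Eat”, for example). This function only changes the column headers; the answers themselves are left as slugs: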
summary(good_eats_readable_questions$`Risk Factor`)  # backticks are needed because the column name contains a space
## high_risk low_risk medium_risk NA's
## 106 351 496 150
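Conceptually, replacing headers is a lookup from slug to label, applied only to the columns that have a label. The base R sketch below assumes a hypothetical named vector standing in for the name and label columns of the form; it mirrors the behaviour shown above (labelled columns renamed, others kept) but is not ona.R’s own implementation.

```r
# Hypothetical slug -> label lookup, standing in for the "name" and
# "label" columns of the form's survey sheet.
question_labels <- c(food_type = "Type of Eat", rating = "Rating",
                     risk_factor = "Risk Factor")

# A made-up one-row dataset with one column that has no label.
df <- data.frame(food_type = "lunch", rating = "good", X_uuid = "abc",
                 stringsAsFactors = FALSE)

# Rename only the columns that have a label; leave the rest untouched.
has_label <- names(df) %in% names(question_labels)
names(df)[has_label] <- unname(question_labels[names(df)[has_label]])
names(df)
# -> "Type of Eat" "Rating" "X_uuid"
```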
To replace the answers as well, use the replaceAllNamesWithLabels function, which relabels both the column headers and the answer choices:
good_eats_readable <- replaceAllNamesWithLabels(good_eats)
summary(good_eats_readable$`Risk Factor`)
## High Risk (Hope it was worth it) Low Risk
## 106 351
## Medium Risk (Questionable) NA's
## 496 150
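Relabelling answers works the same way on the levels of a factor column. This is a minimal sketch, assuming a hypothetical named vector standing in for the form’s choices sheet; the labels are the ones shown in the summary above, but the code is an illustration of the technique, not ona.R’s implementation. Slugs without a label are kept as-is, and missing values stay NA.

```r
# Hypothetical choice slug -> label lookup (cf. the form's choices sheet).
choice_labels <- c(low_risk    = "Low Risk",
                   medium_risk = "Medium Risk (Questionable)",
                   high_risk   = "High Risk (Hope it was worth it)")

# A made-up factor column with a missing answer.
risk <- factor(c("low_risk", "high_risk", NA, "low_risk"))

# Rename each level to its label, falling back to the slug if unlabelled.
old <- levels(risk)
new <- unname(choice_labels[old])
new[is.na(new)] <- old[is.na(new)]
levels(risk) <- new
table(risk, useNA = "ifany")
```

Renaming the levels, rather than the individual values, keeps the column a factor, so summary() and ggplot2 pick up the labels automatically.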
The readable labels also carry over into plots, where they become the default axis titles and legend entries:
library(ggplot2)
# Histogram of submissions over time, filled by risk factor
ggplot(good_eats_readable, aes(x = submit_date, fill = `Risk Factor`)) + geom_histogram()
For multilingual forms, replaceAllNamesWithLabels takes a language argument that selects which translation of the labels to use:
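# Download the multilingual waterpoint dataset from the mberg account
waterpoint <- onaDownload("waterpoint", "mberg","mberg")
waterpoint_swa <- replaceAllNamesWithLabels(waterpoint, language="Swahili")
# Plot well condition (Hali ya kisima) against inspection date (Siku ya kukaguliwa)
ggplot(waterpoint_swa, aes(x = `Siku ya kukaguliwa`, y = `Hali ya kisima`)) + geom_point()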
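In an XLSform, translations are conventionally stored in separate label columns such as `label::English` and `label::Swahili`. The sketch below shows how a language argument can select one of those columns to build the slug-to-label lookup. The survey sheet, its slugs (`inspection_date`, `well_condition`) and the helper `labels_for()` are all hypothetical; this illustrates the idea and is not ona.R’s code.

```r
# Hypothetical survey sheet of a multilingual XLSform; the slugs and
# English labels are made up for illustration.
survey <- data.frame(
  name             = c("inspection_date", "well_condition"),
  `label::English` = c("Date of inspection", "Condition of well"),
  `label::Swahili` = c("Siku ya kukaguliwa", "Hali ya kisima"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: return a slug -> label lookup for one language.
labels_for <- function(survey, language) {
  col <- paste0("label::", language)
  if (!col %in% names(survey)) stop("no labels for language: ", language)
  setNames(survey[[col]], survey$name)
}

labels_for(survey, "Swahili")
```

The resulting named vector can be used exactly like the single-language lookups above, so supporting another language only means reading a different column.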