In R, everything is an object. Even the list of objects in memory, returned by ls(), is itself an object, as the example below shows: it lists all objects in the current workspace and then wipes them from memory.
# List workspace objects.
ls()
# Erase entire workspace.
rm(list = ls())
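Because ls() returns an ordinary character vector, you can store it, inspect it and hand it to rm(). The sketch below shows this inside a throwaway environment created with new.env(), an assumption made only so that running it does not wipe your real workspace:

```r
# Create a separate environment so the demo leaves the global workspace alone.
sandbox <- new.env()
assign("a", 1, envir = sandbox)
assign("b", "some text", envir = sandbox)

# ls() returns an ordinary character vector of object names.
objs <- ls(envir = sandbox)
print(objs)         # [1] "a" "b"
print(class(objs))  # [1] "character"

# Passing that vector to rm() deletes every object it names.
rm(list = objs, envir = sandbox)
print(ls(envir = sandbox))  # character(0)
```

In the global workspace, rm(list = ls()) does exactly the same thing, just without the envir argument.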
Let's use some real, anonymized data from Autumn 2012: the grades from my three first-year mathematics classes. I have removed any student identification, so you have to trust me that these are the real grades (and yes, some grades go above 20/20!). The downloader package provides a handy command to download the data from the course repository, so we install and load it first.
# Install downloader package.
if (!"downloader" %in% installed.packages()[, 1]) install.packages("downloader")
# Load package.
require(downloader)
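# Target file.
file = "data/grades.2012.csv"
# Create the data folder if it does not exist yet (the download needs an existing folder).
dir.create("data", showWarnings = FALSE)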
# Download the data if needed.
if (!file.exists(file)) {
# Locate the data.
url = "https://raw.github.com/briatte/ida/master/data/grades.2012.csv"
# Download the data.
download(url, file, mode = "wb")
}
# Load the data.
grades <- read.table(file, header = TRUE)
# Check result.
head(grades)
calc proba stats
1 18 7 15
2 15 10 14
3 4 3 8
4 7 9 14
5 20 12 19
6 3 4 4
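If you cannot reach the repository, you can still try read.table() offline by passing it the first rows printed above through its text argument. This sketch shows what header = TRUE does: the first line becomes the column names, and the result is a data frame. Note that read.table() splits columns on white space by default; if your copy of the file turns out to be comma-separated, read.csv() is the usual alternative.

```r
# The first three rows printed above, typed in as plain text.
sample_text <- "calc proba stats
18 7 15
15 10 14
4 3 8"

# header = TRUE tells read.table() to use the first line as column names.
sample_grades <- read.table(text = sample_text, header = TRUE)
str(sample_grades)
```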
Let's now use a package to create fake names for the students. We again need to install and load the package first; in later sessions, we will use a standard code block to install and load packages.
# Install randomNames package. Remember that R is case-sensitive.
if (!"randomNames" %in% installed.packages()[, 1]) install.packages("randomNames")
# Load package.
require(randomNames)
Loading required package: randomNames
# How many rows of data do we have?
(count = nrow(grades))
[1] 86
# Let's generate that many random names.
names <- randomNames(count)
# Finally, bind them to the data as a new column.
grades <- cbind(grades, names)
# Check result.
head(grades)
calc proba stats names
1 18 7 15 Nguyen, Ashley
2 15 10 14 Maestas, Doris
3 4 3 8 Romero, Vanessa
4 7 9 14 Velasquez, Maria
5 20 12 19 Borjas, Quiana
6 3 4 4 Swanson, Austin
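When one of its arguments is a data frame, cbind() dispatches to its data frame method, so the result is still a data frame. A quick check on a tiny made-up stand-in for the grades (the student labels are hypothetical placeholders):

```r
# A tiny made-up stand-in for the grades data.
toy <- data.frame(calc = c(18, 15), proba = c(7, 10))
student <- c("Student A", "Student B")

# cbind() on a data frame dispatches to cbind.data.frame() ...
toy <- cbind(toy, student)
print(class(toy))  # [1] "data.frame"

# ... so calling as.data.frame() on the result leaves it unchanged.
print(identical(as.data.frame(toy), toy))  # [1] TRUE
```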
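Let's look at a final type of object: the data frame. read.table() already returned one, so the conversion below changes nothing here, but as.data.frame() is how you would turn, for instance, a matrix into a data frame.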
# Convert to data frame.
grades <- as.data.frame(grades)
# Check result.
head(grades)
calc proba stats names
1 18 7 15 Nguyen, Ashley
2 15 10 14 Maestas, Doris
3 4 3 8 Romero, Vanessa
4 7 9 14 Velasquez, Maria
5 20 12 19 Borjas, Quiana
6 3 4 4 Swanson, Austin
# Check structure of a data frame.
str(grades)
'data.frame': 86 obs. of 4 variables:
$ calc : int 18 15 4 7 20 3 6 9 18 13 ...
$ proba: int 7 10 3 9 12 4 8 6 15 14 ...
$ stats: int 15 14 8 14 19 4 7 9 19 18 ...
$ names: Factor w/ 86 levels "Antonio, Gina",..: 53 43 61 75 8 67 40 35 54 59 ...
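The names column appears as a Factor because, before R 4.0.0, data.frame() and read.table() converted text to factors by default (stringsAsFactors = TRUE). From R 4.0.0 on, the default is FALSE, so a current R session will list the column as chr instead. Setting stringsAsFactors explicitly, as in this sketch with placeholder names, gives the same result on any version:

```r
# The same text column, built with each possible setting.
as_chr <- data.frame(names = c("Doe, Jane", "Roe, Richard"), stringsAsFactors = FALSE)
as_fct <- data.frame(names = c("Doe, Jane", "Roe, Richard"), stringsAsFactors = TRUE)

str(as_chr)  # names listed as chr
str(as_fct)  # names listed as Factor w/ 2 levels

# A factor can be turned back into plain text with as.character().
back_to_text <- as.character(as_fct$names)
```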
Data frames are very malleable objects: we can easily rearrange the variables with commands like melt from the reshape package.
# Install reshape package.
if (!"reshape" %in% installed.packages()[, 1]) install.packages("reshape")
# Load package.
require(reshape)
# Reshape data from 'wide' (lots of columns) to 'long' (lots of rows).
grades <- melt(grades, id.vars = "names")
# Check result to show how each grade is now held on a separate row.
head(grades[order(grades$names), ])
names variable value
30 Antonio, Gina calc 18
116 Antonio, Gina proba 14
202 Antonio, Gina stats 14
60 Aquiningoc, Carlo calc 8
146 Aquiningoc, Carlo proba 10
232 Aquiningoc, Carlo stats 11
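melt() essentially stacks the measured columns under one another and repeats the identifier for each of them, which is why the row numbers above jump by 86 (the number of students) from one exam to the next. The rough base-R sketch below reproduces that layout on two made-up students; it illustrates the idea and is not how the reshape package implements it:

```r
# Two made-up students in 'wide' format: one column per exam.
wide <- data.frame(
  names = c("Student A", "Student B"),
  calc  = c(18, 15),
  proba = c(7, 10),
  stats = c(15, 14)
)
exams <- c("calc", "proba", "stats")

# Stack the exam columns on top of each other, repeating the id column.
long <- data.frame(
  names    = rep(wide$names, times = length(exams)),
  variable = rep(exams, each = nrow(wide)),
  value    = unlist(wide[exams], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```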
Let's finish with a few plots.
# Install ggplot2 package.
if(!"ggplot2" %in% installed.packages()[, 1])
install.packages("ggplot2")
# Load package.
require(ggplot2)
# Plot all three exams.
qplot(data = grades, x = value,
group = variable,
geom = "density")
# Add color and transparency.
qplot(data = grades, x = value,
color = variable,
fill = variable,
alpha = I(.3), geom = "density")
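With geom = "density", each curve is a kernel density estimate: a smoothed histogram whose total area is 1. Base R computes the same kind of curve with density(). This sketch applies it to the first six calc grades printed earlier; ggplot2's own bandwidth and kernel settings may make its curves look slightly different:

```r
# The first six calc grades printed earlier on this page.
calc_sample <- c(18, 15, 4, 7, 20, 3)

# density() returns a kernel density estimate evaluated on a grid of x values.
d <- density(calc_sample)

# Approximate the area under the curve with a simple Riemann sum.
area <- sum(d$y) * diff(d$x[1:2])
print(round(area, 2))

plot(d, main = "Kernel density of six calc grades")
```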
Now use the code on this page to:
1. Download this data extract from the U.S. National Health Interview Survey 2005, using the downloader package as shown above. Call the data nhis.
2. Create an object called bmi that corresponds to the Body Mass Index computed from the height and weight columns of the nhis object. Use the U.S. formula, since the data use inches and pounds. Bind the bmi object to the nhis object.
3. Plot the results using qplot(data = ..., x = ..., geom = "density").

Bonus question: explore how ggplot2 works and produce plots with the x and y variables. Guess what they stand for.
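For reference, the U.S. BMI formula multiplies weight in pounds by 703 and divides by the square of height in inches. A sketch on two made-up respondents, assuming the columns are named height and weight as in the exercise (check the actual column names of nhis before reusing it):

```r
# Two made-up respondents: height in inches, weight in pounds.
people <- data.frame(height = c(70, 64), weight = c(150, 130))

# U.S. formula: BMI = 703 * weight (lb) / height (in)^2
bmi <- 703 * people$weight / people$height^2

# Bind the new column to the data, as the exercise asks.
people <- cbind(people, bmi)
print(round(people$bmi, 1))
```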