Introduction to Data Analysis

2. Objects

One of the first things that you want to learn in R is to manipulate objects. In R, an object can be many things: a number, a piece of text, a function, a series of values, a dataset… Let's start with the very fundamentals.

Manipulating scalars

Let's start with numbers, as you certainly expect R to work like a scientific calculator or like Wolfram Alpha. The following examples show some mathematical expressions in R. The results they return are numeric in nature and are sometimes called scalars because they fit on a scale, the real number line \(\mathbb{R}\).

# Addition.
1 + 2
# Brackets.
(1 + 2)/3
# Powers.
7^2
# Infinity.
1 - Inf
# And so on. These objects are...
class(7^2)

These commands have created objects of a particular class: numeric objects. Train yourself by computing, for instance, \(3^3 - 4\) and \(e^2\). If you need help to find out how to code an exponential in R, just look for help with ??exponential. Note that positive infinity is a number, but try dividing infinity by infinity and the result will be NaN (Not a Number), even though its class is still numeric.
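As a minimal sketch of the exercise above, the lines below compute both expressions and inspect the result of dividing infinity by infinity. The object names (cubed_minus_four, e_squared, undefined) are illustrative and not part of the original session.

```r
# The two practice expressions from the text.
cubed_minus_four <- 3^3 - 4
e_squared <- exp(2)
cubed_minus_four
e_squared

# Infinity divided by infinity is undefined, so R returns NaN.
undefined <- Inf / Inf
# is.nan() checks the value itself.
is.nan(undefined)
# class() reports how R stores that value.
class(undefined)
```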

Evaluating logical statements

From mathematics, you also know about logical statements, which can only be true or false. You can ask R to evaluate a given statement by submitting it as a logical expression, and R will respond with a Boolean value, either TRUE or FALSE. The next examples will create objects of another class: logical objects.

# A simple logical test with the natural logarithm.
log(1) > 0
# Guess why this result differs from the one above.
log(1) >= 0
# Equality requires TWO equal signs.
log(1) == 0
# Negation requires the '!' symbol.
log(1) != 1
# A more complex example using scientific notation.
1 + log10(1e+07) == 8 * exp(1)^0
# And so on. These objects are...
class(1 > 1)
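The difference between one and two equal signs is easy to miss, so here is a short sketch of it. The names is_zero and a are illustrative placeholders, not objects from the course.

```r
# Store the result of a logical test (the name is_zero is illustrative).
is_zero <- log(1) == 0
is_zero
# Negate the stored result with '!'.
!is_zero
# A single equal sign assigns a value rather than testing it.
a = 3
# Two equal signs test whether the value equals 3.
a == 3
# The stored test result is a logical object.
class(is_zero)
```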

Joining text strings

You might also remember manipulating text, or “strings”, from our first session. There are several ways to have R “paste” and concatenate strings together, which is useful as soon as you are manipulating data like names or dates. Here are some examples:

# The typical 'hello world' text.
"Hello R World!"
[1] "Hello R World!"
# Now 'pasted' from its elements.
paste("Hello", "R", "World", "!")
[1] "Hello R World !"
# Now through raw concatenation.
cat("Hello", "R", "World", "!\n")
Hello R World !
# And so on. These objects are...
class("some text")
[1] "character"
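For data like names or dates, you will often want a separator other than a space. The sketch below uses the sep argument of paste() for that. The values and object names are made up for illustration.

```r
# Join date elements with a dash instead of the default space.
iso_date <- paste("2014", "01", "09", sep = "-")
iso_date
# Join a first and a last name (illustrative values) with the default space.
full_name <- paste("Ada", "Lovelace")
full_name
# The results are character objects, like any other text.
class(iso_date)
```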

Manipulating vectors

Finally, R can handle objects that hold more than one value. In fact, even single results are handled as such: in R, the scalar 2 is a numeric vector of length one. The example below defines a simple sequence of integer values \((1, 2, 3)\), computes its sum and product, and then compares them with each other:

# Create a sequence of integers.
1:3
# Compute the sum of the sequence.
sum(1:3)
# Compute the product of the sequence.
prod(1:3)
# Compare the sum and product.
sum(1:3) == prod(1:3)

The 1, 2, 3 sequence used above is called a numeric vector. Once you store it in an object, it shows up in your Workspace, along with an indication that it contains integer values and has a length of 3. The rest of this session will show you how to manipulate such objects.
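To check these properties yourself, you can store the sequence and ask R about it. This is a minimal sketch, and the object name x is an arbitrary choice.

```r
# Store the sequence in an object (the name x is illustrative).
x <- 1:3
# The sequence holds integer values.
class(x)
# It contains three elements.
length(x)
# A single number is also a vector, of length one.
is.vector(2)
length(2)
```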

Fixing R syntax

Before we go further into manipulating objects, here's a quick tour of R syntax. You will inevitably make mistakes when writing R commands, which is fine – nothing (bad) happens when you do. To correct a mistake, simply press UpArrow (↑) to go back into your last console input and fix it.

# This will break. Try it.
print(Hello World!)
# Press UpArrow, select text, add double quotes.
# This will now run fine.
print("Hello World")

R syntax can be a bit strange: for example, for some (probably good) reason, the install.packages command takes quotes, whereas the library command does not. Quotes can be single or double and are sometimes optional.

# This needs quotes.
install.packages("ggplot2")
# This does not.
library(ggplot2)
# But it can.
library("ggplot2")
# Here's another way to load packages, which warns instead of failing.
require(ggplot2)
# Thankfully, it also accepts quotes.
require("ggplot2")
# Oh, and single quotes will work too.
require('ggplot2')
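The practical difference between the two loading commands shows up when a package is missing. The sketch below assumes that no package called notARealPackage is installed. That name is made up for the example, as are the object names loaded and outcome.

```r
# 'notARealPackage' is a made-up name, assumed not to be installed.
# require() returns FALSE (with a warning, silenced here) when loading fails.
loaded <- suppressWarnings(
  require("notARealPackage", character.only = TRUE, quietly = TRUE)
)
loaded
# library() stops with an error instead, which is caught here for illustration.
outcome <- tryCatch(
  library("notARealPackage", character.only = TRUE),
  error = function(e) "error"
)
outcome
```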

Note that brackets are omnipresent in R syntax: every function (or command) will require them. The bits written inside the brackets and separated by commas are called arguments. The paste0() function, for instance, accepts any number of arguments. It concatenates (joins) elements into a single character vector without inserting anything between them, unlike the paste() function, which separates them with a space by default.

# Print some text, the full date and some text again, with no separators.
paste0("Hello World! Today is ", date(), ".")
[1] "Hello World! Today is Thu Jan  9 22:49:40 2014."
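To see how arguments change what a function does, compare paste() and paste0() on the same input. This is a small sketch with illustrative object names. It relies on the named sep argument of paste().

```r
# paste() separates its arguments with a space by default.
with_space <- paste("Hello", "World")
with_space
# paste0() joins them with nothing in between.
no_space <- paste0("Hello", "World")
no_space
# Naming the 'sep' argument makes paste() behave like paste0().
same_as_paste0 <- paste("Hello", "World", sep = "")
identical(no_space, same_as_paste0)
```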

The syntax of each command requires a bit of memorization, and lots of practice. Nobody types functional code in R without going through cycles of trial-and-error with it. Programming requires some discipline, some logic and a lot of patience.

A final thing: it is good practice to write code in a certain style. For example, R is insensitive to extra blank space, but it is good practice to use blank space only when it helps to align elements. The Google R Style Guide is a good starting point to learn a few common conventions.

Let's now turn to manipulating simple things in R.

Next: Vectors and matrices.