Introduction to Data Analysis

3. Functions

We saw a bit of calculus, logic and vector/matrix manipulation last week. This week, we tread the same ground with slightly more advanced operators. Here, for instance, is the integer division operator in R (the modulus operator, which returns the remainder, is written %%):

4%/%3
[1] 1
6%/%3
[1] 2
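
To make the link between integer division and the modulus explicit, here is a minimal sketch in base R. The variable names a and b are only illustrative; the point is that the quotient and the remainder together rebuild the original number.

```r
# Integer division (%/%) and modulus (%%) are complementary:
# a equals (a %/% b) * b + (a %% b).
a <- 7
b <- 3
a %/% b                          # integer quotient: 2
a %% b                           # remainder: 1
(a %/% b) * b + (a %% b) == a    # TRUE
```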

Many matrix operators are also available; you might want to go back to the cheat sheet mentioned previously. Here are some basic manipulations on matrices built out of random integers.

# Create a random 3 x 5 matrix.
A <- matrix(as.integer(10 * runif(15)), nrow = 3, ncol = 5)
# Check result.
A
     [,1] [,2] [,3] [,4] [,5]
[1,]    6    8    7    0    9
[2,]    9    4    4    7    8
[3,]    2    3    4    2    2
# Create a random 2 x 2 (square) matrix.
B <- matrix(as.integer(10 * runif(4)), nrow = 2, ncol = 2)
# Check result.
B
     [,1] [,2]
[1,]    6    6
[2,]    2    4
# Create another one.
C <- matrix(as.integer(10 * runif(4)), nrow = 2, ncol = 2)
# Check result.
C
     [,1] [,2]
[1,]    6    5
[2,]    4    5
# Now a basic manipulation: scalar multiplication.
2 * A
     [,1] [,2] [,3] [,4] [,5]
[1,]   12   16   14    0   18
[2,]   18    8    8   14   16
[3,]    4    6    8    4    4
# Another one: extract the diagonal.
diag(B)
[1] 6 4
# Last one: matrix transposition.
t(C)
     [,1] [,2]
[1,]    6    4
[2,]    5    5

As an exercise, explain the result of a square matrix product.

# Square matrix multiplication.
B %*% C
     [,1] [,2]
[1,]   60   60
[2,]   28   30
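
One way to check your answer to the exercise is to rebuild a few entries by hand. The sketch below hard-codes the values of B and C printed above (your own random draws will differ) and computes two entries of the product as row-by-column dot products.

```r
# Same values as in the printed output above; matrix() fills column by column.
B <- matrix(c(6, 2, 6, 4), nrow = 2, ncol = 2)
C <- matrix(c(6, 4, 5, 5), nrow = 2, ncol = 2)
# Entry [1, 1]: dot product of row 1 of B and column 1 of C.
sum(B[1, ] * C[, 1])   # 6 * 6 + 6 * 4 = 60
# Entry [2, 2]: dot product of row 2 of B and column 2 of C.
sum(B[2, ] * C[, 2])   # 2 * 5 + 4 * 5 = 30
# The element-wise product is a different operation.
B * C
```

Note that %*% and * are not interchangeable: the first is the matrix product, the second multiplies matching cells.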

Functional programming

Suppose that you find yourself doing the same computation over and over again. You do not want to type out the whole calculation every time you need it: you want to write a function that does the job for you.

Take a basic example: the sum command adds a vector of numbers together.

# Create a vector of 9 random values drawn uniformly from [0, 1].
x <- runif(9)
# Check result: show the first 6 values (the default for head).
head(x)
[1] 0.006769 0.937011 0.175692 0.223838 0.063343 0.044156
# Add them up.
sum(x)
[1] 2.744

Re-defining that function under a different name is easy. The version below, however, accepts only two arguments and returns their sum, as computed by the primitive function +. The example is deliberately trivial: functions are generally designed to capture more complex operations.

# Define function.
add <- function(x, y) {
    x + y
}
# Example.
add(2, 4)
[1] 6
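
The two-argument limit is a choice, not a constraint of the language. As a rough sketch, the hypothetical function add_all below (the name is ours, not part of R) uses the ... argument to accept any number of values and folds them together with +.

```r
# Hypothetical generalisation of add(): accept any number of arguments.
add_all <- function(...) {
    # Reduce() applies `+` pairwise, from left to right, over the arguments.
    Reduce(`+`, list(...))
}
add_all(2, 4)         # 6
add_all(1, 2, 3, 4)   # 10
```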

Statistical computing courses like the one taught by Cosma Shalizi contain many interesting examples of such functions. A simple function from that course is shown below to illustrate the principle of recursion in computer code:

# Calculate a factorial.
# Input: a number (n).
# Output: the factorial of n.
# Presumes: n is a single positive integer.
my.factorial <- function(n) {
    if (n == 1) {
        # Base case
        return(1)
    } else {
        # Recursion
        return(n * my.factorial(n - 1))
    }
}
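
The function above relies on its stated presumption: called with 0 or a negative number, the recursion never reaches the base case. The sketch below is a hypothetical variant (safe.factorial is our name, not from the course cited) that checks its input first and treats 0 as a base case, then compares the result with R's built-in factorial function.

```r
# Hypothetical variant of my.factorial() that validates its input.
safe.factorial <- function(n) {
    # Reject anything that is not a single non-negative whole number.
    stopifnot(is.numeric(n), length(n) == 1, n >= 0, n == round(n))
    if (n <= 1) {
        # Base case: 0! and 1! are both 1.
        return(1)
    }
    # Recursion: n! = n * (n - 1)!
    n * safe.factorial(n - 1)
}
safe.factorial(5)                   # 120
safe.factorial(5) == factorial(5)   # TRUE: agrees with the built-in function
```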

Plotting functions

This course might lead you to write simple functions, but for the moment, let's focus simply on plotting them with the ggplot2 package, which makes it fairly easy. The example below shows the basic function \(y = x\).

library(ggplot2)
qplot(c(0, 2), stat = "function", fun = identity, geom = "point")


Here's a function that will be more useful to us: the exponential.

qplot(-10:10, stat = "function", fun = exp, geom = c("line", "point"))

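
Recent releases of ggplot2 may warn that the stat argument of qplot is deprecated, so the two calls above might not run as written on an up-to-date installation. A possible equivalent, assuming a current ggplot2, builds the same plots with ggplot and stat_function; the object names p_identity and p_exp are illustrative.

```r
library(ggplot2)
# The data frame only supplies the range of x over which the function is evaluated.
p_identity <- ggplot(data.frame(x = c(0, 2)), aes(x = x)) +
    stat_function(fun = identity, geom = "point")
p_exp <- ggplot(data.frame(x = c(-10, 10)), aes(x = x)) +
    stat_function(fun = exp, geom = "line")
# Type p_identity or p_exp at the console to draw the corresponding plot.
```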

And finally an example that nests several functions: the call first draws a random sample of \(N = 1000\) observations from a standard normal distribution and then plots its empirical cumulative distribution function (ECDF), to which we will come back in due time.

qplot(rnorm(1000), stat = "ecdf", geom = "step")

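
To see what the ECDF measures without any plotting, here is a small sketch using the ecdf function from base R. The seed is arbitrary and only there to make the run reproducible; Fn is an illustrative name.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
x <- rnorm(1000)
Fn <- ecdf(x)    # ecdf() returns a function
Fn(0)            # share of observations less than or equal to 0
mean(x <= 0)     # the same quantity computed directly
```

The value of the ECDF at a point q is simply the proportion of the sample that falls at or below q, which is why the last two lines agree.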

Next: Control flow.