Introduction to Data Analysis

1.3. Practice

Now that you are fully set up to work in RStudio, you get to download your first exercise and run the entire code while reading the comments along the way. This exercise will teach you a few more things about keyboard shortcuts, R syntax, and trivia like this:

rep("See you next week!", 6)
[1] "See you next week!" "See you next week!" "See you next week!"
[4] "See you next week!" "See you next week!" "See you next week!"

Instructions: this week’s exercise is called 1_hello.R. Download or copy-paste that file into a new R script, then open it and run it with the working directory set as the IDA folder. If you download the script, make sure that your browser preserved its .R file extension.

Be careful if you decide to download the script directly from your browser, which is what most users do by simply clicking links instead of right-clicking the link to save the source. Your browser might try to save R scripts as HTML (.html) or plain text (.txt): in that case, make sure to rename the file properly with a .R file extension.

Once you are done with the exercise, please turn to the final setup instructions below.

Folder architecture

The exercises for this course require that you keep things tidy inside your working directory. Start by making sure that you understand, from reading the previous pages, what the working directory is, and how to set it in RStudio. We recommended that you simply call your working directory IDA. You will then need to create two folders inside it:

All scripts in this class assume that you have this folder architecture in place inside your working directory. You can ask R to create the folders for you. First, check that the working directory is the IDA folder by typing getwd(). If not, change it with the 'Session' menu of RStudio or the 'Misc' menu in R. Then run the following lines:

# Check the working directory.
if(!grepl("IDA$", getwd()))
  warning("Not sure whether the working directory is really ",
          "the IDA folder...", "\nCarrying on anyway...")
# Create folders if necessary.
if(!file.exists("code")) dir.create("code")
if(!file.exists("data")) dir.create("data")

Security notice

The code above will warn you if the working directory does not end with “IDA” and then create two folders. Notice that R can easily create, copy or move files and folders: it also has the same ability to destroy them, by removal or overwrite, just as if you were doing it by hand. If you are running Linux, you know that and you can prevent running into issues.

The code block below illustrates what is meant here: it will move any .R script that might be lying around in your main IDA folder to the code subfolder, as to insist on keeping your files tidy. You can run this code safely because the scripts object, which is a list of files ending in .R matched by a regular expression, is identical at the copying and removal stages:

# Match filenames ending in .R in the working directory.
regex <- list.files(".", ".R$")
# This variable will be 0 (FALSE) if nothing is matched.
clean <- length(regex)
# Move .R scripts to code/ subfolder.
if(clean) {
  message("Moving files to code folder:\n", paste(scripts, collapse = "\n"))
  file.copy(scripts, "code")
  file.remove(scripts)
} else {
  message("No R script was found lying around.")
}
No R script was found lying around.

If you ever plan to run R code without full understanding of what the code might accomplish, first consider whether you really want to do that, and give a second look at the code and its source. Consider using a “sandboxed” environment like the one at Rapporter.net to check the code without any possible effect on your system.

Additional help

Last, let's be entirely honest here: even trained people can find R difficult, if not infernal, to learn. R is powerful and flexible, but it has a steep learning curve. If you find yourself at a loss with R, remember that you are not alone (and somebody might hear you scream). To learn R without bleeding through your ears, this course suggests that you follow these three steps:

  1. Read the course pages from this week onwards. Every session is made of four pages that end on a practical replication assignment. We have done our best to offer simple code and lots of examples.
  2. Turn to your textbooks. Chapters from Teetor and Kabacoff have been assigned weekly: check the course index for the reading list, and make sure to try out a few more examples.
  3. Turn to online tutorials, like the two-minute screencasts by Anthony Damico, which are both fun and instructive. His 'twotorials' use R and you can easily run them in RStudio.

The course itself is documented in more detail on its wiki. You might also turn to the README files that were written for the course and its datasets if you need additional details on the contents and examples of each session.

Next week: Objects.