Introduction to Data Analysis

10. Visualization in space: Maps

Let's stick some data on a map, using a simple plotting method that can easily match country-level data to world map coordinates. We will come to using raw geodata in a minute, but for now we will just concentrate on showing how to map a simple cross-sectional data structure like the ones that we have been using in previous sessions.

# Load packages.
packages <- c("countrycode", "downloader", "foreign", "ggplot2", "maps", "RColorBrewer")
packages <- lapply(packages, FUN = function(x) {
    if (!require(x, character.only = TRUE)) {
        install.packages(x)
        library(x, character.only = TRUE)
    }
})

We will use Quality of Government data, for which the country codes are standardized, and a rather old map of the world where a few countries need to be renamed to be matched to QOG country names. The most ambiguous case is that we will be using USSR borders as approximate frontiers for Russia, thereby losing the opportunity to observe post-Soviet states on the final map.

# Get world map coordinates from the maps package.
w <- map_data("world", zoom = 4)
# Fix a few country names.
w$region[which(w$region == "USA")] = "United States"
w$region[which(w$region == "UK")] = "United Kingdom"
w$region[which(w$region == "USSR")] = "Russia"
w$region[which(w$region == "Zaire")] = "Congo, Democratic Republic"
w$region[which(w$region == "North Korea")] = "Korea, North"
w$region[which(w$region == "South Korea")] = "Korea, South"
# Remove Antarctica.
w <- subset(w, !region %in% c("Antarctica", "Greenland"))
# Download Quality of Government Standard dataset.
zip = "data/qog.cs.zip"
qog = "data/qog.cs.csv"
if (!file.exists(zip)) {
    dta = "data/qog.cs.dta"
    download("http://www.qogdata.pol.gu.se/data/qog_std_cs.dta", dta, mode = "wb")
    write.csv(read.dta(dta, warn.missing.labels = FALSE), qog)
    zip(zip, file = c(dta, qog))
    file.remove(dta, qog)
}
qog = read.csv(unz(zip, qog), stringsAsFactors = FALSE)

Here's a quick function to plot Quality of Government data. The first part extracts a variable based on the argument passed to the function, along with (cleaned up) country names. The second part matches this data to the map, based on the region identifier of the world map data. The ast part plots the data by longitude and latitude, using a 'whitened' ggplot2 theme.

qog.map <- function(x) {
  # Extract QOG variables.
  data <- with(qog, data.frame(
    country = gsub(" (\\(.*)", "", cname),
    variable = qog[, which(names(qog) == x)]))
  # Match QOG data to map.
  w$fill <- with(data, by(variable, country, mean))[w$region]
  set1 = brewer.pal(9, "Set1")
  # Plot over blank theme.
  p = qplot(data = w, x = long, y = lat,
            group = group, fill = fill, geom = "polygon")
  p = p + scale_fill_gradient2("",
                               low = set1[1], mid = set1[6], high = set1[3], 
                               midpoint = mean(w$fill, na.rm = TRUE),
                               na.value = set1[9])
  p = p + theme(axis.text = element_blank(),
                axis.ticks = element_blank(),
                panel.grid = element_blank())
  p = p + labs(y = NULL, x = NULL)
  return(p)  
}

You can pass any QOG variable name to the function, although its matching method will only support continuous variables, as shown below with an additive index of human rights by Cingranelli and Richards. The default color gradient uses ColorBrewer's Set1 scheme and goes from green (low) to yellow (at midpoint) to red (high) by default.

# Plot a simple QOG map.
qog.map("ciri_empinx_new") + 
  labs(title = "CIRI Empowerment Rights Index (0-14) in 2009")

plot of chunk qog-map-example-1-auto

You can bypass the function's colors by overriding its ggplot2 fill scale. The example below uses a color gradient of Crayola colors that were coded for R by Matt Blackwell (check out Harvard's Social Science Statistics blog for other goodies, like this elementary Google Maps function by Andy Eggers). The example variable will show a clear outlier in its worldwide distribution.

# Get Crayola colors from a personal copy.
url = "https://gist.github.com/briatte/5813759/raw/53edb4d59e1c54a2abc6c4c034ef4925625bc8cb/crayola.R"
source_url(url, prompt = FALSE)
# Get a variable's mean.
the_mean = mean(qog$wdi_the, na.rm = TRUE)
# Get a variable's quantiles.
the_quantiles = as.vector(round(quantile(qog$wdi_the, na.rm = TRUE)))
# Plot with Crayola color gradient.
qog.map("wdi_the") + 
  labs(title = "Total health expenditure (% of GDP) in 2009") +
  scale_fill_gradient2("", 
                       low = crayola["Blue Green"], 
                       high = crayola["Sunset Orange"], 
                       mid = crayola["Yellow"], 
                       midpoint = the_mean, 
                       limits = range(the_quantiles),
                       breaks = the_quantiles,
                       na.value = crayola["Gray"])

plot of chunk qog-map-example-2-auto

Next: Geocoding.