Introduction to Data Analysis

11. Networks

Networks are part of daily life, and because many people log their friends and colleagues on services like Facebook or LinkedIn, a great deal of network data is available. There is also a large body of research on networks that tackles tough questions, such as the analytical difference between homophily (connecting to those like us) and contagion (becoming like our connections). We will stick to description and simple measures of influence.

There are several software options for network analysis, like Gephi, Pajek or VOSON (for hyperlink networks). We will stay in R and use the sna and igraph libraries, which use different but compatible formats to store network data. Some examples will be taken from Baptiste Coulmont's graphs of small cliques.

A random network

We'll start by simulating a random network of \(n = 30\) individuals, among whom we simulate a bidirectional friendship relationship: if individual 'Ego' is a friend of individual 'Alter', then the reverse is also true. Each individual can associate with any other individual in the network, so the network is stored as a data frame of \(30^2 = 900\) rows, one per (ego, alter) combination. These 900 rows include one row per individual that connects it to itself (\(n-n\)), which is ignored when generating relationships. The result is the rnet dataset.

# Set network size.
n = 30
# Repeat each individual n times in a row: 1, 1, ..., 2, 2, ...
ego = rep(1:n, each = n)
# Repeat the full sequence 1, 2, ..., n a total of n times.
alter = rep(1:n, times = n)
# Default to no friendship between ego and alter.
friendship = 0
# Assemble dataset.
rnet = data.frame(ego, alter, friendship)
# First rows.
head(rnet)
  ego alter friendship
1   1     1          0
2   1     2          0
3   1     3          0
4   1     4          0
5   1     5          0
6   1     6          0
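
The row counts described above can be checked directly. The following is a minimal sketch in base R that rebuilds the same ego-alter grid; the names self and pairs are introduced here for illustration only and are not part of the original code.

```r
# Rebuild the ego-alter grid for n = 30 individuals.
n = 30
ego = rep(1:n, each = n)
alter = rep(1:n, times = n)
rnet = data.frame(ego, alter, friendship = 0)

# All ordered (ego, alter) combinations: n^2 = 900 rows.
nrow(rnet)
# Rows that pair an individual with itself: 30.
self = sum(rnet$ego == rnet$alter)
# Distinct unordered pairs that can hold a friendship: 435.
pairs = sum(rnet$ego < rnet$alter)
# Same count from the binomial coefficient n(n - 1) / 2.
choose(n, 2)
```

Of the 900 rows, only 435 pairs of distinct individuals can be tied, and each tie then occupies two rows (one per direction).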

To generate random relationships, we draw from a binomial distribution where the probability of a friendship is artificially set to \(Pr(friendship) = .15\). Each pair of distinct individuals is tied with that probability, so the resulting network displays approximately 15% of all possible friendship ties between distinct individuals.

# Probability of friendship tie.
conDen <- 0.15
# Assign ties to random nodes.
for (i in 1:(n - 1)) {
    # Visit each distinct pair (i, ii) once, with i < ii.
    for (ii in (i + 1):n) {
        if (rbinom(1, 1, conDen) == 1) {
            # Record the tie in both directions.
            rnet$friendship[rnet$ego == i & rnet$alter == ii] = 1
            rnet$friendship[rnet$ego == ii & rnet$alter == i] = 1
        }
    }
}
# Inspect random network ties.
summary(rnet)
      ego           alter        friendship   
 Min.   : 1.0   Min.   : 1.0   Min.   :0.000  
 1st Qu.: 8.0   1st Qu.: 8.0   1st Qu.:0.000  
 Median :15.5   Median :15.5   Median :0.000  
 Mean   :15.5   Mean   :15.5   Mean   :0.124  
 3rd Qu.:23.0   3rd Qu.:23.0   3rd Qu.:0.000  
 Max.   :30.0   Max.   :30.0   Max.   :1.000  
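
The mean of the friendship variable sits a little below .15 by construction, because the 30 self-pairs are never tied. The sketch below works through that arithmetic and shows a vectorized way to draw the same kind of network as an adjacency matrix. It is an alternative illustration rather than the method used above: the names adj and upper and the seed are assumptions introduced here.

```r
n = 30
conDen = 0.15

# Expected values under the simulation design.
possible = choose(n, 2)                  # 435 distinct pairs
expected_ties = possible * conDen        # 65.25 friendships on average
expected_mean = 2 * expected_ties / n^2  # 0.145, expected share of the 900 rows set to 1

# Vectorized draw: one Bernoulli trial per distinct pair.
set.seed(42)  # arbitrary seed, for reproducibility only
adj = matrix(0, n, n)
upper = upper.tri(adj)
adj[upper] = rbinom(sum(upper), 1, conDen)
# Mirror the upper triangle so that every tie is reciprocal.
adj = adj + t(adj)

isSymmetric(adj)  # TRUE: ties go both ways
sum(diag(adj))    # 0: nobody is tied to themselves
mean(adj)         # observed share of tied cells, to compare with expected_mean
```

Each cell of adj corresponds to one row of rnet, so mean(adj) plays the same role as the mean of friendship in the summary above.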

The network is drawn with the ggnet function. Before plotting, the network function builds a network object from the subset of the rnet data frame for which the friendship variable indicates that there is a relationship to draw. The ties are undirected: there are no arrows between the nodes because the friendship ties are strictly reciprocal.

# Form network object.
net = network(rnet[rnet$friendship == 1, ], directed = FALSE)
net
 Network attributes:
  vertices = 30 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 112 
    missing edges= 0 
    non-missing edges= 112 

 Vertex attribute names: 
    vertex.names 

No edge attributes
# Plot random network.
ggnet(net,
      label = TRUE,
      color = "white")

[Figure: plot of the random network.]
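
The 112 edges reported for the network match the number of rows where friendship equals 1 (a mean of 0.124 over 900 rows), which suggests that each reciprocal tie is counted once per direction, for 56 distinct friendships. The sketch below shows how such counts, along with the number of friends per individual, can be read from the long format in base R. It uses a hypothetical four-person network (toy) so that the results are easy to verify by hand; the names toy, ties, deg and dens are illustrative.

```r
# Toy network in the same long format as rnet (hypothetical data).
n = 4
toy = data.frame(ego = rep(1:n, each = n),
                 alter = rep(1:n, times = n),
                 friendship = 0)
# Three reciprocal ties: 1-2, 1-3 and 3-4, each stored in both directions.
ties = rbind(c(1, 2), c(1, 3), c(3, 4))
for (k in 1:nrow(ties)) {
    i = ties[k, 1]
    ii = ties[k, 2]
    toy$friendship[toy$ego == i & toy$alter == ii] = 1
    toy$friendship[toy$ego == ii & toy$alter == i] = 1
}

# Rows flagged as ties: 6, twice the number of distinct friendships.
tied_rows = sum(toy$friendship)
# Distinct friendships, counting each pair once: 3.
distinct_ties = sum(toy$friendship[toy$ego < toy$alter])
# Number of friends per individual: 2, 1, 2, 1.
deg = tapply(toy$friendship, toy$ego, sum)
# Share of possible pairs that are tied: 3 / 6 = 0.5.
dens = distinct_ties / choose(n, 2)
```

Applied to rnet, the same expressions would give the number of friends of each of the 30 individuals and the share of the 435 possible pairs that are tied.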

This function is used in the next pages to plot a few social networks. You can practice by plotting fictional networks, like the Grey's Anatomy network by Gary Weissman plotted below, or turn to Solomon Messing's analysis of U.S. student affiliations for a real-world example of network data.

# Locate data.
link = "http://www.babelgraph.org/data/ga_edgelist.csv"
file = "data/ga.network.csv"
# Download data.
if(!file.exists(file)) download(link, file, mode = "wb")
# Create network.
net = network(read.csv(file), directed = FALSE)
# Plot network.
ggnet(net, 
      label = TRUE, 
      color = "white", 
      top8 = TRUE, 
      size = 18,
      legend.position = "none")

[Figure: plot of the Grey's Anatomy network.]

The next pages make more use of the ggnet function with Twitter data and word associations plotted as network ties.
