Introduction to Data Analysis

# 11. Networks

Networks are a common aspect of your daily life, and since you have been logging your network of friends and colleagues on services like Facebook or LinkedIn, there are tons of data available. There is also a large amount of research on networks with tough questions, such as the analytical difference between homophily (connecting to those like us) and contagion (becoming like our connections). We will stick to description and simple measures of influence.

There are several software options for network analysis, like Gephi, Pajek or VOSON (for hyperlink networks). We will stay in R and use the sna and igraph libraries, which use different but compatible formats to store network data. Some examples will be taken from Baptiste Coulmont's graphs of small cliques.

## A random network

We'll start by simulating a random network of $$n = 30$$ individuals, for which we simulate a bidirectional friendship relationship: if individual 'Ego' is a friend of individual 'Alter', then the reverse is also true. Each individual can associate with any other individual in the network, so the network is stored as a data frame of $$30^2 = 900$$ rows, one per ordered (ego, alter) pair. This count includes one row per individual that connects it to itself, which will be ignored when generating relationships. The result is the rnet dataset.

# Set network size.
n = 30
# Create n series of n.
ego = rep(1:n, each = n)
# Create n sequences of n.
alter = rep(1:n, times = n)
# Default to no friendship between ego and alter.
friendship = 0
# Assemble dataset.
rnet = data.frame(ego, alter, friendship)
# First rows.
head(rnet)

  ego alter friendship
1   1     1          0
2   1     2          0
3   1     3          0
4   1     4          0
5   1     5          0
6   1     6          0
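The row counts described above can be checked with a short base R sketch. It rebuilds the same ego and alter columns with expand.grid instead of rep; the name pairs is illustrative and is not used elsewhere on this page.

```r
# Sketch: rebuild the (ego, alter) grid and check the counts discussed above.
n <- 30
# expand.grid varies its first argument fastest, so alter cycles within each ego.
pairs <- expand.grid(alter = 1:n, ego = 1:n)[, c("ego", "alter")]
# Number of rows: one per ordered pair, self-pairs included.
nrow(pairs)
# Number of self-pairs (ego == alter), which never receive a tie.
sum(pairs$ego == pairs$alter)
# Number of distinct unordered pairs that can hold a reciprocal tie.
choose(n, 2)
```

With $$n = 30$$, these three values are 900, 30 and 435 respectively.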


To generate random relationships, we draw from a binomial distribution where the probability of a friendship is artificially set to $$Pr(friendship) = .15$$. The result is a network that displays approximately 15% of all possible friendship ties in the rnet dataset.

# Probability of friendship tie.
conDen <- 0.15
# Assign ties to random pairs of nodes (i < ii, so self-ties never occur).
for (i in 1:(n - 1)) {
  for (ii in (i + 1):n) {
    if (rbinom(1, 1, conDen) == 1) {
      rnet$friendship[rnet$ego == i & rnet$alter == ii] = 1
      rnet$friendship[rnet$ego == ii & rnet$alter == i] = 1
    }
  }
}
# Inspect random network ties.
summary(rnet)

      ego           alter        friendship
Min.   : 1.0   Min.   : 1.0   Min.   :0.000
1st Qu.: 8.0   1st Qu.: 8.0   1st Qu.:0.000
Median :15.5   Median :15.5   Median :0.000
Mean   :15.5   Mean   :15.5   Mean   :0.124
3rd Qu.:23.0   3rd Qu.:23.0   3rd Qu.:0.000
Max.   :30.0   Max.   :30.0   Max.   :1.000
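Note that the mean of friendship in the summary is taken over all 900 rows, including the 30 self-pairs that never receive a tie, so its expected value is $$.15 \times 870 / 900 = .145$$ rather than $$.15$$. As an alternative to the double loop, the same kind of random network can be sketched with a symmetric adjacency matrix. This is only an illustration: the names adj, ties and p and the seed are assumptions, and the draws will not reproduce the rnet values shown above.

```r
# Sketch: one Bernoulli draw per unordered pair, stored in an adjacency matrix.
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
n <- 30
p <- 0.15
adj <- matrix(0, n, n)
# Draw choose(n, 2) ties and place them in the upper triangle (i < j).
ties <- rbinom(choose(n, 2), 1, p)
adj[upper.tri(adj)] <- ties
# Mirror the upper triangle so that every tie is reciprocal.
adj <- adj + t(adj)
# Share of realized ties among all possible unordered pairs.
sum(ties) / choose(n, 2)
# Same quantity as the mean of friendship in rnet: averaged over all n^2 cells.
mean(adj)
```

The diagonal of adj stays at zero, which plays the same role as ignoring the self-pairs in rnet.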


The network object is built with the network function from the subset of the rnet data frame for which the friendship variable indicates that there is a relationship to draw, and it is then drawn with the ggnet function. The ties are undirected: there are no arrows between the nodes because the friendship ties are strictly reciprocal.

# Form network object.
net = network(rnet[rnet$friendship == 1, ], directed = FALSE)
net

 Network attributes:
vertices = 30
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 112
missing edges= 0
non-missing edges= 112

Vertex attribute names:
vertex.names

No edge attributes

# Plot random network.
ggnet(net,
label = TRUE,
color = "white")
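Because rnet stores each reciprocal friendship twice, once as (ego, alter) and once as (alter, ego), the 112 edges reported above appear to correspond to the number of rows with friendship equal to 1, that is, to 56 distinct friendships. The base R sketch below shows how to keep one row per pair and count friends per individual. It uses a small hypothetical data frame named toy in the same layout as rnet; the values are made up for illustration.

```r
# Toy edge list in the same layout as rnet (hypothetical values).
toy <- data.frame(
  ego        = c(1, 2, 1, 3, 2, 3),
  alter      = c(2, 1, 3, 1, 3, 2),
  friendship = c(1, 1, 0, 0, 1, 1)
)
# Rows flagged as ties: each reciprocal friendship appears twice.
ties <- toy[toy$friendship == 1, ]
nrow(ties)
# Keep one row per unordered pair by requiring ego < alter.
unique_ties <- ties[ties$ego < ties$alter, c("ego", "alter")]
nrow(unique_ties)
# Number of friends per individual, counted from either end of each tie.
degree <- table(c(unique_ties$ego, unique_ties$alter))
degree
```

In this toy example, the four flagged rows reduce to two distinct friendships, and individual 2 has two friends while individuals 1 and 3 have one each.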


This function is used in the next pages to plot a few social networks. You can train yourself by plotting fictional networks, like the one below using the Grey's Anatomy network by Gary Weissman, or turn to Solomon Messing's analysis of U.S. student affiliations for a real-world example of network data.

# Locate data.
file = "data/ga.network.csv"
# Create network.
net = network(read.csv(file), directed = FALSE)
# Plot network.
ggnet(net,
label = TRUE,
color = "white",
top8 = TRUE,
size = 18,
legend.position = "none")


The next pages make more use of the ggnet function with Twitter data and word associations plotted as network ties.
