Networks are a common aspect of daily life, and because so many people log their networks of friends and colleagues on services like Facebook or LinkedIn, a great deal of network data is available. There is also a large body of research on networks that tackles tough questions, such as the analytical difference between homophily (connecting to those like us) and contagion (becoming like our connections). We will stick to description and simple measures of influence.
There are several software options for network analysis, like Gephi, Pajek or VOSON (for hyperlink networks). We will stay in R and use the sna and igraph libraries, which use different but compatible formats to store network data. Some examples will be taken from Baptiste Coulmont's graphs of small cliques.
We'll start by simulating a random network of \(n = 30\) individuals (ego), for which we simulate a bidirectional friendship relationship: if individual 'Ego' is a friend of individual 'Alter', then the reverse is also true. Each individual can associate with any other individual in the network, so the network is stored as a data frame of \(30^2 = 900\) rows. These include one row per individual that connects it to itself (\(n\) rows in total), which is ignored when generating relationships. The result is the rnet dataset.
# Set network size.
n = 30
# Create n series of n.
ego = rep(1:n, each = n)
# Create n sequences of n.
alter = rep(1:n, times = n)
# Default to no friendship between ego and alter.
friendship = 0
# Assemble dataset.
rnet = data.frame(ego, alter, friendship)
# First rows.
head(rnet)
ego alter friendship
1 1 1 0
2 1 2 0
3 1 3 0
4 1 4 0
5 1 5 0
6 1 6 0
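As a quick check on the structure of this data frame, the sketch below rebuilds the same ego-alter pairs with base R's expand.grid and counts the self-pairs. The name pairs is introduced here for illustration only and is not part of the original code.

```r
# Set network size, as in the text.
n <- 30
# All ordered (ego, alter) pairs. expand.grid varies its first argument
# fastest, so alter is listed first to match the row order of rnet.
pairs <- expand.grid(alter = 1:n, ego = 1:n)[, c("ego", "alter")]
# Total number of rows: n^2.
nrow(pairs)
# Rows that pair an individual with itself: n.
sum(pairs$ego == pairs$alter)
# Rows that can carry a friendship tie: n^2 - n.
sum(pairs$ego != pairs$alter)
```

Only the rows where ego and alter differ can ever hold a friendship, which matters when reading the share of ties in the full data frame.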
To generate random relationships, we draw from a binomial distribution where the probability of a friendship is artificially set to \(\Pr(\text{friendship}) = 0.15\). The result is a network that displays approximately 15% of all possible friendship ties in the rnet dataset.
# Probability of friendship tie.
conDen <- 0.15
# Assign ties to random nodes.
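# Loop over each unordered pair of distinct nodes (i < ii) exactly once.
for (i in 1:(n - 1)) for (ii in (i + 1):n) if (rbinom(1, 1, conDen) == 1) {
  # Record the tie in both directions to keep it reciprocal.
  rnet$friendship[rnet$ego == i & rnet$alter == ii] = 1
  rnet$friendship[rnet$ego == ii & rnet$alter == i] = 1
}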
# Inspect random network ties.
summary(rnet)
ego alter friendship
Min. : 1.0 Min. : 1.0 Min. :0.000
1st Qu.: 8.0 1st Qu.: 8.0 1st Qu.:0.000
Median :15.5 Median :15.5 Median :0.000
Mean :15.5 Mean :15.5 Mean :0.124
3rd Qu.:23.0 3rd Qu.:23.0 3rd Qu.:0.000
Max. :30.0 Max. :30.0 Max. :1.000
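The mean of friendship reported by summary() is computed over all 900 rows, including the 30 self-pairs that can never hold a tie, so its expected value sits slightly below 0.15. The sketch below works out that expected value and shows a vectorized way to draw the same kind of reciprocal ties into an adjacency matrix. The matrix approach and the names nPairs, expectedMean and adj are assumptions made for illustration, not the method used in the text, and set.seed is added only to make the sketch reproducible.

```r
n <- 30
conDen <- 0.15
# Unordered pairs of distinct individuals: n * (n - 1) / 2.
nPairs <- choose(n, 2)
# Each tie fills two rows (ego-alter and alter-ego) out of n^2.
expectedMean <- 2 * nPairs * conDen / n^2
expectedMean

# Vectorized alternative (assumption: not the approach used in the text).
set.seed(1)
adj <- matrix(0, n, n)
# One Bernoulli draw per unordered pair, stored in the upper triangle.
adj[upper.tri(adj)] <- rbinom(nPairs, 1, conDen)
# Mirror the upper triangle so that every tie is reciprocal.
adj <- adj + t(adj)
# Share of all n^2 cells that hold a tie, comparable to mean(rnet$friendship).
mean(adj)
```

Any single simulated network will deviate from the expected value by chance, which is why the observed mean varies from one run to the next.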
The network is drawn with the ggnet function, which plots a network object built from the subset of the rnet data frame for which the friendship variable indicates that there is a relationship to draw. The ties are undirected: there are no arrows between the nodes because the friendship ties are strictly reciprocal.
# Form network object.
net = network(rnet[rnet$friendship == 1, ], directed = FALSE)
net
Network attributes:
vertices = 30
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 112
missing edges= 0
non-missing edges= 112
Vertex attribute names:
vertex.names
No edge attributes
# Plot random network.
ggnet(net,
      label = TRUE,
      color = "white")
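The printed object reports 112 edges, which equals the number of rows where friendship is 1 (112 / 900 is about 0.124, the mean shown by summary). Because every friendship is stored once per direction, this suggests that each reciprocal tie is counted twice. The base R sketch below, written under that assumption, keeps one row per pair and counts each individual's friends (its degree), a simple descriptive measure. The names ties, uniqueTies and degree are illustrative, and the seed is added only for reproducibility.

```r
# Rebuild a random network as in the text (seed added for reproducibility).
set.seed(1)
n <- 30
rnet <- data.frame(ego = rep(1:n, each = n),
                   alter = rep(1:n, times = n),
                   friendship = 0)
for (i in 1:(n - 1)) for (ii in (i + 1):n) if (rbinom(1, 1, 0.15) == 1) {
  rnet$friendship[rnet$ego == i & rnet$alter == ii] <- 1
  rnet$friendship[rnet$ego == ii & rnet$alter == i] <- 1
}
# Rows with a tie: each friendship appears once per direction.
ties <- rnet[rnet$friendship == 1, ]
# Keep one row per unordered pair of friends.
uniqueTies <- ties[ties$ego < ties$alter, c("ego", "alter")]
# The first count is twice the second.
nrow(ties)
nrow(uniqueTies)
# Number of friends per individual (degree).
degree <- tapply(rnet$friendship, rnet$ego, sum)
# Most connected individuals first.
head(sort(degree, decreasing = TRUE))
```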
This function is used in the next pages to plot a few social networks. You can practice by plotting fictional networks, like the Grey's Anatomy network by Gary Weissman used below, or turn to Solomon Messing's analysis of U.S. student affiliations for a real-world example of network data.
# Locate data.
link = "http://www.babelgraph.org/data/ga_edgelist.csv"
file = "data/ga.network.csv"
# Download data.
if(!file.exists(file)) download(link, file, mode = "wb")
# Create network.
net = network(read.csv(file), directed = FALSE)
# Plot network.
ggnet(net,
      label = TRUE,
      color = "white",
      top8 = TRUE,
      size = 18,
      legend.position = "none")
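The file read above is an edge list: a table with one row per tie and one column for each endpoint. The sketch below illustrates that format with made-up names and assumed column names from and to; the columns and contents of the actual file may differ.

```r
# Hypothetical edge list; names and column headers are invented for illustration.
edges <- read.csv(text = "from,to
anna,ben
anna,carl
ben,carl
carl,dana", stringsAsFactors = FALSE)
# Number of ties attached to each node (its degree).
degree <- table(c(edges$from, edges$to))
# Nodes ordered from most to least connected.
sort(degree, decreasing = TRUE)
```

Counting how often each name appears across both columns is a quick way to spot the most connected nodes before plotting.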
The next pages make more use of the ggnet function with Twitter data and word associations plotted as network ties.