Serrano et al.'s disparity filter algorithm for directed networks

The disparity filter algorithm by Serrano et al. is a network reduction technique to identify the ‘backbone’ of a weighted network. This note explains how to implement the algorithm in full, based on existing implementations and on Serrano et al.'s paper.

Overview

The disparity filter algorithm operates like a statistical significance test for the weighted edges of a network: given a distribution of edge weights, the algorithm singles out those forming the ‘backbone’ of the network, using the equivalent of a statistical significance threshold to identify them.

The algorithm has its own Wikipedia entry, which lists a few alternative techniques to reduce a graph, including the most obvious one: setting an edge weight threshold, and clipping all edges weighted below it. What the algorithm adds to that operation is a null model to accept or reject edges based on the distribution of their weights, as opposed to a static value.

Implementation

The disparity filter algorithm has been implemented in Python as well as in R. Both implementations are coded in quasi-identical fashion (although the Python one runs much quicker), but as far as I can tell, both implementations ~~are~~ were for undirected networks only (see update).

Let's recode the algorithm using as little R code as possible, while adding support for directed networks at the same time. Taking directedness into account means that there are two scenarios to cover:

  1. If the input graph is undirected, a single run of the algorithm on a symmetrized version of the graph will be enough to model all edges.
  2. If the input graph is directed, two runs of the algorithm are necessary to model the ingoing and outgoing edges. In that scenario, the end result is the combination of both runs.

We handle both scenarios through the backbone function below, which takes a network object of class igraph as input, and determines how the disparity filter algorithm should be run on it:

The disparity_filter function itself needs to perform three operations:

  1. Get the degree, in-degree or out-degree distribution of the graph, and get the edge list of the graph. For undirected networks, symmetrize the list so that the algorithm runs twice on each edge.
  2. For every graph node of degree, in-degree or out-degree superior to 1, compute the sum of the edge weights of its ego network.
  3. For every edge featured in that ego network, test its weight against a null model, and store it if it survives that test.

Note: the reason for running the filter twice on each edge of an undirected network is because we need one run to test the edge between nodes i and j using the sum of edge weights in the ego network of node i, and another run to test the same edge using the sum of weights in the ego network of node j.

Since the igraph package makes all network operations mentioned above trivial, all we really need is a nested loop structure to cover steps (2) and (3), along with the code to test the edges, i.e. to generate their alpha-value:

The part of the code that is commented out is how the p-value of the null model should be computed according to the paper. The quicker way to reach that value, which comes from this email, is a very close approximation of it.


Here's the code from this note as a single Gist, which Alessandro Bessi – who wrote the disparityfilter package – and myself have submitted as a possible addition to the igraph package.

Update (December 8, 2015): the code did not make it to igraph and has been published as version 2.1.0 of the disparityfilter package instead. The package can be installed from GitHub and should become available from CRAN in the near future.

Update (April 19, 2016): version 2.2.3 of the package is now available from CRAN.

  • First published on October 2nd, 2015