Assignment 3

Due on 3/5/2017

Dynamics on and of biological networks
 

Thus far in our assignments we had been exploring networks whose structural organization did not change over time. The aim of this assignment is to explore both dynamics of networks, where the structure (i.e., the nodes and the links connecting them) is no longer time-invariant, as well as, dynamics on networks.

For dynamics of networks, we consider the model of network evolution by duplication and divergence proposed by Vazquez et al in the context of protein interaction networks (see, e.g., A. Vazquez, A Flammini, A Maritan and A Vespignani, Complexus, 2003, 1:38-44). The model is described as follows. At each time step, a node i in the network is chosen randomly and is duplicated, i.e., a new node is added having exactly the same set of neighbors as node i. With probability p, a link is inserted between the new node and the node i to indicate that there is a possibility that the two (identical) nodes may interact with each other. Next, to model the divergence between the (originally) identical nodes that result from mutation, consider each of the k neighbors in turn of node i and its duplicate, choose either of the links at random and remove it with probability q.
Continue adding nodes until the network reaches a pre-specified size of N nodes. Thus the set of networks created by the model is parametrized by 3 quantities: N (the final size), p (probability of interaction between a node and its duplicate) and q (probability of divergence between the two originally identical nodes).
You can take the initial network to be just a pair of connected nodes.

A possible variant is where you check at every step whether a node has become completely disconnected and if so, remove such a node.

Once you have constructed the network, investigate its degree distribution as a function of its parameters N, p and q. In particular, consider p=0.1 and q=0.7 (with N ranging from 100 to 1000 to 10000) that was used by Vazquez et al to generate a network that resembles the Yeast protein interaction network in terms of degree distribution. Consider hundreds of realizations for each set of parameters to generate the average degree distribution and comment on the occurrence of power law (or otherwise) by looking at the complementary cumulative probability density function (CCDF) [a nice way of obtaining the CCDF without having to first create a histogram by binning is to sort the data points in descending order and assign ranks 1... N, with rank 1 being given to the node with the highest degree. Then plot the data such that the degree values are on x-axis and the rank divided by N is on the y-axis. This is actually the CCDF for the data].

For dynamics on networks, we shall consider the SIR model of epidemic propagation on networks. Let each node j of the network correspond to a population having a fraction s_j in susceptible state, fraction i_j in infected state and fraction r_j (=1-s_j-i_j) in recovered/removed state. The evolution equations for each node are
d s_j/dt = -beta_j s_j i_j
d i_j/dt = beta_j s_j i_j - gamma_j i_j
dr_j/dt = gamma_j i_j

For each node j, choose a rate of infection (beta_j = probability that a susceptible contracts a disease after contact with a infected individual in unit time) and rate of recovery (gamma_j) from a distribution (you can choose a Gaussian with a specific mean and variance). Thus, each node will have a different basic reproduction number (R0 = beta/gamma) so that in the absence of any coupling between the nodes, the disease will progress at different rates in the different nodes. We now introduce coupling between the nodes by assuming an adjacency matrix A (symmetric) which is taken to be Erdos-Renyi for simplicity. The susceptible population in a node now not only is in contact with infected population of the same node but also with infecteds in neighboring nodes (although with a lower intensity). Let us model this as
d s_j/dt = -beta_j s_j ((1-alpha) i_j + (alpha/k_j) \Sum_q A(j,q) i_q)
d i_j/dt = beta_j s_j ((1-alpha) i_j + (alpha/k_j) \Sum_q A(j,q) i_q) - gamma_j i_j
dr_j/dt = gamma_j i_j

Note that k_j is the degree of the j-th node, the element of the adjacency matrix A(j,q) = 1 if nodes j and q are connected (and 0 otherwise) and the \Sum_q is a summation over all nodes. The parameter alpha measures how strongly the different nodes are coupled. Thus, if alpha = 0, we get a system of disconnected nodes while if the network is fully connected with alpha=N-1/N then we have a mean-field model with complete homogeneous mixing of all N populations.

Choose mean value of beta as 0.8 and the mean value of gamma as 0.4 (with standard deviation for the two quantities set to 0.1 and 0.05 respectively) such that on average the basic reproductive number is well above the value required for an epidemic to occur. Now start the epidemic by making i_j non-zero in a randomly chosen node and look at how the average fraction of infecteds in the total population change with time. Do this several times by choosing different random nodes for seeding the epidemic and measure the average basic reproductive number for the entire network (this will be related to the exponential rate of growth of infecteds in the total population). How will this be related to the distribution of R0 for the individual nodes. In particular, will R0 of the entire network be the average of the individual R0s of the nodes or the maximum value thereof ?
Will this answer change as we increase alpha from 0 to 0.5 ?

You are free to research the web to find out more about the properties of such networks (but not allowed to copy)...