Assignment 2

Second Part Due on 24/4/17

Kuaffman automata

The aim of this assignment is to show the value of simple models in exploring aspects of complex problems in biology and also provide an example of how connection properties of a network connecting various nodes that constitute a system can influence the collective dynamics of the system.

As a concrete example, let us consider the transcription regulation network in cells that control the expression of genes resulting in a specific phenotype. The discovery that genes can switch each other on or off (by a gene expressing a protein that bind to the regulatory site of another gene, promoting or repressing its expression) and are thus interacting with each other via a network of mutual dependence, can be traced to the work of Francois Jacob and Jacques Monod on the lac operon in the bacterium Escherichia coli (E. coli) in the 1960s.

One of the simplest models of this system was proposed in 1969 by Stuart Kauffman. In this setting, we simplify the bipartite network of gene-protein associations (genes connect to proteins that they express and proteins connect to genes that they regulate) and reduce it to a simple network of genes - with one gene having a directed connection to another gene if the former expresses a protein that regulates the latter. Each gene is assumed to be in any one of two states, expressing (ON=1) or non-expressing (OFF=0) at any given time - hence the state of a gene is a Boolean variable (i.e., 0 or 1). As each of the gene-regulatory interactions for a species may not be known, and moreover, what exactly is the role of a regulatory protein on the expression of a gene may often be unclear, we can further simplify by looking (instead of the transcription network of a particular species) at a generic instance of such a network referred to as the NK Random Boolean network model (or Kauffman Automata).

Here we consider that there are N nodes (genes) each of which receives incoming connections from K nodes in the network (i.e., the gene is controlled in principle by K proteins expressed by the corresponding K genes). Once N and K is specified, we need to therefore construct a directed network of N nodes where the links are chosen to be placed between randomly chosen pairs of nodes such that the in-degree of each node is K. This defines the (directed) adjacency matrix of the system.

Once the adjacency matrix is constructed, we assign a ``logic function'' to each node which specifies what the output of the node will be for each of the 2^K possible input strings of 0s and 1s. Thus, out of the 2^(2^K) possible logic functions we randomly pick one and assign to a node, and do this in turn for all the N nodes. Finally, we pick an initial state for the network (i.e., an assignment of 0 or 1 for each of the N nodes) and start the time-evolution of the system. The system evolves in discrete time-steps by looking at the current state of the nodes and using the logic functions associated with each node, to determine the next state of the nodes.

Eventually, the system may settle down to an attractor that could be time-invariant ("fixed-point") or having a simple period (e.g., switches back and forth between two network states) when K is small. When K is large (the extreme case being K=N-1, such that every node regulates every other) we can get attractors with periods so long that for all practical purposes they appear to be aperiodic.

The first part of the assignment was done a few weeks back. In the second part:

First, we would like to determine what is average value of in-degree is for a real transcription regulation network. For this choose your own favorite database for transcription network information - e.g., you can look at RegNetwork: Regulatory Network Repository for human and mouse data (http://www.regnetworkweb.org/source.jsp). Other possibilities are TRRUST (a manually curated database of human transcriptional regulatory network, http://www.grnpedia.org/trrust/) or RegulonDB (database on transcriptional regulation in Escherichia coli, http://regulondb.ccg.unam.mx/). From the network information, obtain the number of incoming connections for each node and thus estimate the average value K. This would tell us what will be a reasonable value for K we should take if we are using a NK Boolean Network model to reproduce features of the transcription regulation network.

Next, we ask whether the behavior we see in a real transcription network is only a result of the values of N and K or could there be other factors at work here.
To see how network connection structure may play a role in the dynamics, we shall look into how introducing modular organization. We shall divide a network of N=20 nodes into M=4 modules or parts, each comprising n=5 nodes, with connections being more likely within nodes belonging to the same part. The parameter that we shall vary for this is the ratio r of the density of connections between modules (rho_o) to the density of connections within modules (rho_i). To see how rho_o and rho_i are related to N,K and n (number of nodes in a module = M/N), note that the average degree K of a node is equal to (n-1)*rho_i + (N-n)*rho_o, and also that rho_o = r*rho_i. Now by varying r between 0 and 1 one can obtain different levels of modularity. Try r =0, 0.1, 0.5 and 1 and produce the corresponding adjacency matrices.

Once the networks are generated, observe the dynamics of NK Boolean network for r=0, 0.1 and 1 and identify the principal differences for the dynamics in terms of (i) number of distinct attractors for a given K (say K=8) observed at different values of r and (ii) the periods of the attractors for different values of r.
See also if the basins of the attractors show qualitative difference as r is changed from 0 to 0.1 to 1.

You are free to research the web to find out more about the properties of this automata (but not allowed to copy)...