Characterising transcription factor binding sites as Bayesian networks

Rahul Siddharthan, IMSc Chennai

Transcription factor binding sites are widely described using "position weight matrices" (PWMs), in which each position in a binding sequence is characterised by an independent probability weight vector. The inadequacies of this description have long been known, and several attempts have been made to go beyond it, but none has achieved much popularity. The task is, given a prior collection of binding sites, to calculate the joint probability of a sequence of nucleotides that need not be independent. A factorisation of this joint probability can be represented by a Bayesian network (BN). We describe the methodology of BNs in general and their application to this problem in particular.
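
To make the contrast concrete, the following is a minimal sketch in Python, not the method presented in the talk. It scores a sequence first with independent positions (a position weight matrix) and then with a factorisation in which one position is conditioned on another. The toy sites, the pseudocount, the function names and the choice of dependency (position 4 depending on position 1) are all assumptions made for illustration; in practice the network structure would have to be learned from the data.

```python
ALPHABET = "ACGT"
PSEUDO = 0.5  # assumed pseudocount, chosen only for illustration

# Hypothetical aligned binding sites (not real data): the first and last
# positions co-vary (A...T or T...A), which independent columns cannot express.
sites = ["ACGT", "ACGT", "ACGT", "TCGA", "TCGA", "TCGA"]

def position_prob(base, i, pool):
    """Smoothed frequency of `base` at position i among the sites in `pool`."""
    count = sum(1 for s in pool if s[i] == base)
    return (count + PSEUDO) / (len(pool) + PSEUDO * len(ALPHABET))

def pwm_prob(seq, sites):
    """Independent positions: P(x) = prod_i P(x_i)."""
    p = 1.0
    for i, base in enumerate(seq):
        p *= position_prob(base, i, sites)
    return p

def bn_prob(seq, sites, parent):
    """Factorised joint with at most one parent per position:
    P(x) = prod_i P(x_i | x_parent(i)), where `parent` maps a position
    index to its parent index and must describe an acyclic graph."""
    p = 1.0
    for i, base in enumerate(seq):
        j = parent.get(i)
        if j is None:
            pool = sites  # no parent: same estimate as the PWM column
        else:
            # condition on the parent's value by restricting the site pool
            pool = [s for s in sites if s[j] == seq[j]]
        p *= position_prob(base, i, pool)
    return p

parent = {3: 0}  # assumed structure: last position depends on the first
for seq in ("ACGT", "ACGA"):
    print(seq, pwm_prob(seq, sites), bn_prob(seq, sites, parent))
```

With these toy sites the independent model assigns ACGT and ACGA the same probability, since T and A are equally frequent in the last column, whereas the factorised model favours ACGT because it conditions the last position on the first.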