Bayes By The Bay, 2013

Schedule

Friday, January 4
12:00	Arrival of bus / checkin to hotel
13:00	Lunch
14:00	Introduction/welcome
14:15	Nozer D. Singpurwalla	Introductory lecture: The essence of the Bayesian argument.
15:00	Devinder S. Sivia	An introduction to Bayesian data analysis
15:45	Coffee
16:15	Rajeeva Karandikar	Introductory talk
17:15 - 17:45	Satya Prakash Singh	Comparison of Three Level Unbalanced Cluster Designs

Saturday, January 5
09:00	Devinder S. Sivia	Assigning probabilities
09:45	Erik van Nimwegen	Bayesian network model for predicting protein contacts
10:30	Coffee
10:45	Nozer Singpurwalla	The axioms of subjective probability; (maybe) an application to lattice QCD / particle physics
11:30	Rajesh Rao	Bayesian models of brain function - I
12:15	Somdatta Sinha	Population dynamics of Drosophila - fitting a model to complex data
13:00	Lunch
14:00	Michael Lässig	Information theory in physics and biology - I
14:45	Uri Keich	Estimating the statistical significance of sequence motifs
15:30	Richard Kirubakaran	The application of Multiple-Treatment Meta-Analysis in evaluating the efficacy and acceptability of pharmacological interventions in people with schizophrenia and related psychoses
16:00	Coffee
16:15	Balaji Rajagopalan	Climatology and hydrology - I
17:00	Vinod Gaur	Inverse Problem Theory with examples
17:45	Discussion / close

Sunday, January 6
09:00	Vinod Gaur	Bayes, Kalman and tomorrow's weather
09:45	Rajesh Rao	Bayesian models of brain function - II
10:30	Coffee
10:45	Erik van Nimwegen	Bayesian network model for predicting protein contacts
11:30	Michael Lässig	Information theory in physics and biology - II
12:15	Uri Keich	Confidently estimating the number of DNA replication origins
13:00	Lunch
14:00	Kevin Korb	Bayesian network modeling
14:45	P S Thiagarajan	Modeling and analysis of bio-pathways using dynamic Bayesian networks - I
15:30	Ninan Sajeeth Philip	Imposing Conditional Independence on Bayesian Computations
16:00	Posters + Coffee

Monday, January 7
09:00	P S Thiagarajan	Modeling and analysis of bio-pathways using dynamic Bayesian networks - II
09:45	Balaji Rajagopalan	Climatology and hydrology - II
10:30	Coffee
10:45	Anita Mehta	Searching and fixating: when timescales stand out, and when they do not
11:30	Rajeeva Karandikar	TBA
12:15	Kevin Korb	Causal discovery
13:00	Lunch
14:00	Free / outing

Tuesday, January 8
09:00	P S Thiagarajan	Modeling and analysis of bio-pathways using dynamic Bayesian networks - III
09:45	Michael Lässig	An application to a nonequilibrium scenario: the evolution of influenza.
10:30	Coffee
10:45	Erik van Nimwegen	Using generalized linear models for modeling expression and chromatin state data in terms of predicted binding sites.
11:30	Uri Keich	On designing seeds for similarity search in genomic DNA
12:15	Rajeeva Karandikar	Election analysis and forecasting
13:00	Lunch
14:00	Kevin Korb	Discretization methods for classification
14:45	Balaji Rajagopalan	Climatology and hydrology - III
15:30	Jagdish Krishnaswamy	Dynamic response of Indian Monsoon and high-intensity rain events to ENSO and Indian Ocean Dipole
16:00	Coffee
16:30	Vinod Gaur	Earthquake Hazard in India: Scientific approaches to quantification
17:15	Devinder Sivia	A dialogue with the data

Abstracts

Devinder S. Sivia

An introduction to Bayesian data analysis
We outline the basic principles of Bayesian probability theory and illustrate its use with reflectivity measurements. This approach provides a unified rationale for data analysis, which both justifies many of the commonly used procedures and indicates some natural extensions that enhance their potency.

Assigning probabilities
Although the sum and product rules tell us how probabilities are related, they do not tell us how to assign them. We now discuss the principle of "insufficient reason", invariance under appropriate transformations, and maximum entropy as basic ideas that enable us to address this important issue.
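To make the maximum-entropy idea concrete, here is a minimal Python sketch (not taken from the talk) of Jaynes's well-known dice problem: assign probabilities to the six faces of a die when all we know is the average score. Maximizing entropy subject to normalization and the mean constraint gives probabilities of exponential form, and the single Lagrange multiplier can be found numerically. The function name maxent_die, the bisection solver and its bracket are illustrative choices.

```python
import math


def maxent_die(target_mean, faces=(1, 2, 3, 4, 5, 6), tol=1e-12):
    """Maximum-entropy probabilities for die faces, given only a prescribed mean.

    Maximizing -sum(p log p) subject to sum(p) = 1 and sum(f * p) = target_mean
    gives p_f proportional to exp(lam * f); lam is found by bisection because
    the mean increases monotonically with lam.
    """
    if not min(faces) < target_mean < max(faces):
        raise ValueError("target mean must lie strictly between the extreme faces")

    def dist(lam):
        weights = [math.exp(lam * f) for f in faces]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(f * p for f, p in zip(faces, dist(lam)))

    lo, hi = -50.0, 50.0  # bracket for lam; ample for faces 1..6 and moderate means
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))


uniform = maxent_die(3.5)  # symmetric mean: every face gets 1/6
biased = maxent_die(4.5)   # Jaynes's "Brandeis dice" constraint
```

With the symmetric mean 3.5 the result is the uniform assignment that the principle of insufficient reason would give; with mean 4.5 the probabilities rise smoothly towards the high faces, with no structure beyond what the constraint demands.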

A dialogue with the data
A modern Bayesian physicist, Steve Gull from Cambridge, described data analysis as being ‘a dialogue with the data’. This talk aims to illustrate this viewpoint with the aid of a simple example: Peelle’s pertinent puzzle.

Uri Keich

Estimating the statistical significance of sequence motifs
The identification of transcription factor binding sites is an important step in understanding the regulation of gene expression. To address this need, many motif-finding tools have been described that can find short sequence motifs given only an input set of sequences. The development of methods for assessing the significance of the motifs reported by these motif finders has lagged considerably behind the extensive development of the finders themselves, even though this analysis is often crucial in helping scientists decide whether to carry the predicted motifs forward to the next stage of their analysis. We will discuss the problem of evaluating the statistical significance of sequence motifs in the general context of evaluating the statistical significance of an observed result.

Confidently estimating the number of DNA replication origins
We present a method for estimating, and providing a confidence interval for, the number of "functional" binding sites, with a particular application to estimating the number of DNA replication origins in the genome of the yeast Kluyveromyces lactis. The method requires an initial set of verified sites from which a position-specific frequency matrix (PSFM) can be constructed. We further assume access to an experimental procedure, used sparingly, that can verify the functionality of a few, but not all, computationally predicted sites. The reliability of our method is demonstrated by correctly predicting the known number of Saccharomyces cerevisiae origins.
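The first ingredient, a PSFM built from verified sites, can be sketched in a few lines of Python. This is a generic construction under stated assumptions, not the authors' code: the pseudocount of 0.5, the uniform background used for log-odds scoring, the function names and the toy sites are all illustrative, and the abstract does not say how the method scores candidates.

```python
import math
from collections import Counter


def build_psfm(sites, alphabet="ACGT", pseudocount=0.5):
    """Position-specific frequency matrix from aligned, equal-length sites.

    Returns one dict per position mapping each base to its smoothed frequency.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    total = len(sites) + pseudocount * len(alphabet)
    psfm = []
    for j in range(width):
        counts = Counter(s[j] for s in sites)
        psfm.append({b: (counts[b] + pseudocount) / total for b in alphabet})
    return psfm


def log_odds_score(candidate, psfm, background=0.25):
    """Log-likelihood ratio of a candidate site under the PSFM vs. a uniform background."""
    if len(candidate) != len(psfm):
        raise ValueError("candidate must have the same width as the PSFM")
    return sum(math.log(col[b] / background) for b, col in zip(candidate, psfm))


verified = ["ACGTTA", "ACGTCA", "ACGATA", "TCGTTA"]  # toy sites, not real origins
psfm = build_psfm(verified)
```

The pseudocount keeps unseen bases from receiving zero probability, which matters when the verified set is small.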

On designing seeds for similarity search in genomic DNA
Large-scale similarity searches of genomic DNA are of fundamental importance in annotating functional elements in genomes. To perform large comparisons efficiently, BLAST and other widely used tools use seeded alignment, which compares only sequences that can be shown to share a common pattern or "seed" of matching bases. The choice of seed substantially affects the sensitivity of seeded alignment, but designing and evaluating optimal seeds is computationally challenging. In this talk I will address some of the computational problems arising in seed design.
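The notion of a seed and its sensitivity can be illustrated with a small Python sketch. It assumes the common convention of writing a spaced seed as a string of '1' (position must match) and '0' (don't care), and a simple model in which each aligned position matches independently with probability p; the function names are hypothetical. The sensitivity is computed exactly by brute-force enumeration, which is only feasible for short regions and is precisely why seed design becomes computationally challenging at realistic lengths.

```python
from itertools import product


def seed_hits(a, b, seed):
    """Offsets at which a spaced seed hits two gaplessly aligned sequences.

    seed uses '1' for positions that must match and '0' for "don't care".
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    care = [i for i, c in enumerate(seed) if c == "1"]
    return [s for s in range(len(a) - len(seed) + 1)
            if all(a[s + i] == b[s + i] for i in care)]


def seed_sensitivity(seed, length, p):
    """Exact probability that a gapless alignment region of the given length
    contains at least one seed hit, assuming each position matches
    independently with probability p. Enumerates all 2**length
    match/mismatch patterns, so it is only practical for short regions."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    prob = 0.0
    for pattern in product((True, False), repeat=length):
        if any(all(pattern[s + i] for i in care)
               for s in range(length - len(seed) + 1)):
            k = sum(pattern)
            prob += p ** k * (1 - p) ** (length - k)
    return prob


spaced = seed_hits("ACGTACGT", "ACGAACGT", "1101")      # [1, 4]
contiguous = seed_hits("ACGTACGT", "ACGAACGT", "1111")  # [4]
```

In the usage lines the single mismatch at position 3 is skipped by the spaced seed at offset 1, so it finds a hit that the contiguous seed misses.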

Balaji Rajagopalan

Talk 1: Sets out the framework and the main problems in climatology and hydrology, including forecasting (from climate models and statistical models) and stochastic simulation; reviews existing methods and points out potential Bayesian applications.
Talk 2: Application of Bayesian methods to the above problems: a quick survey.
Talk 3: Results from recent research by my group and colleagues, especially on Bayesian model combination.

Kevin Korb

Bayesian network modeling
What Bayesian networks are; how to model with them; decision and dynamic extensions; knowledge engineering with Bayesian networks; applications and examples.

Causal discovery
Causation and correlation; causal discovery algorithms; Bayesian causal discovery; naive Bayes classifiers; evaluation issues.

Discretization methods for classification
Standard discretization methods; evaluation issues; new metrics for discretization and evaluation.

Contributed talks

Satya Prakash Singh (IIT Bombay, contributed talk)
Comparison of Three Level Unbalanced Cluster Designs
In this article, three-level unbalanced cluster designs are evaluated and compared using the quantile dispersion graphs approach. A closed-form expression for the power function for studying the treatment effect is determined and used as a design comparison criterion. The power function depends on the two ratios of the variance components (corresponding to each level of nesting), which are generally unknown at the planning stage. Thus, comparing these designs requires prior knowledge of the intracluster correlations. Two interval estimation methods, the first based on Harville and Fenech’s (1985) confidence intervals for variance ratios and the other a Bayesian approach, are proposed for assigning joint confidence intervals to the intracluster correlations. A detailed simulation study comparing the confidence intervals obtained by the different techniques is given. The technique of quantile dispersion graphs is used for comparing the three-level cluster designs.

Ninan Sajeeth Philip (St Thomas College, Kozhencheri)
Imposing Conditional Independence on Bayesian Computations
Conditional independence of the input features makes it possible to estimate the Bayesian probability as the product of the individual probabilities. However, real-life examples, including the simple XOR gate, involve conditionally dependent features, making it necessary to compute the cumulative probability with the conditional dependence taken into consideration. An alternative approach is to redefine the input features so that the conditional dependence is encapsulated in the features themselves. Conditional independence can then be imposed on the redefined features used in the subsequent computations. An implementation of the scheme and its application to large, high-dimensional data volumes with missing values are discussed. The method was recently used to publish a catalogue of about 6 million point sources from the Sloan Digital Sky Survey, the largest catalogue of its kind to give the probable nature of objects based on astronomical colour parameters as input. The catalogue was also compared with about 30 different surveys, which confirmed the accuracy of the predictions to be well over 98% in most cases, thereby establishing the reliability of the proposed method.
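The XOR argument can be checked with a short naive-Bayes sketch in Python. It is an illustration under assumptions, not the author's implementation: "redefining the features" is modelled here by merging the two dependent inputs into a single compound feature, add-one smoothing is used, and the function and variable names are hypothetical.

```python
from collections import Counter, defaultdict

# XOR gate: inputs (x1, x2) and output class
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def naive_bayes(data, x, featurize, alpha=1.0):
    """Posterior P(class | x) assuming the features returned by featurize are
    conditionally independent given the class (add-alpha smoothing)."""
    class_counts = Counter(c for _, c in data)
    counts = defaultdict(Counter)  # counts[(c, j)][v]: class-c examples with feature j == v
    values = defaultdict(set)      # values[j]: distinct values seen for feature j
    for xi, c in data:
        for j, v in enumerate(featurize(xi)):
            counts[(c, j)][v] += 1
            values[j].add(v)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / len(data)                   # prior P(c)
        for j, v in enumerate(featurize(x)):      # product of per-feature likelihoods
            score *= (counts[(c, j)][v] + alpha) / (n_c + alpha * len(values[j]))
        scores[c] = score
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


def raw_features(x):
    return x       # two separate features x1 and x2


def joint_feature(x):
    return (x,)    # one redefined feature that encapsulates the dependence


p_raw = naive_bayes(XOR, (0, 1), raw_features)     # {0: 0.5, 1: 0.5}
p_joint = naive_bayes(XOR, (0, 1), joint_feature)  # class 1 gets 2/3
```

With the raw inputs, each feature on its own is uninformative about the XOR output, so the product of individual probabilities cannot separate the classes. Once the dependence is folded into a single feature, the same product rule favours the correct class; with more training data the smoothing matters less and the posterior moves towards 1.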

Anita Mehta (SN Bose National Centre for Basic Sciences, Kolkata)
Searching and fixating: when timescales stand out, and when they do not.
Using eye-tracking data, we analysed the visual movements of sample populations subjected to simultaneous visual and aural inputs in a taskless paradigm. We looked for correlations between these two forms of sensory stimuli via the analysis of the probability distributions of saccades and fixations. As our sample populations involved literate as well as illiterate people, we were able to investigate the effect of literacy on cognitive processing. This is particularly manifest in the case of fixations, where it appears that literacy leads to the presence of a characteristic (attentional) time scale in the appropriate probability distribution. On the other hand, scale invariance is observed in the saccadic distributions, independent of the literacy level of the subjects. We suggest that these are characterised by Lévy-like dynamics.

Jagdish Krishnaswamy (ATREE, Bangalore)
Dynamic response of Indian Monsoon and high-intensity rain events to ENSO and Indian Ocean Dipole
The Indian Monsoon (IM) has a major influence on the welfare of over 1.2 billion people. Two of the major determinants of the magnitude and intensity of the IM are the El Niño-Southern Oscillation (ENSO) (Walker and Bliss, 1937; Shukla and Paolino, 1983) and the recently described Indian Ocean Dipole (IOD) (Saji and Goswami, 1999). The complex and changing influence of ENSO and IOD on the IM has generated major interest (Krishnakumar et al 1999; Ashok et al 2001; Ashok et al 2004; Ihara et al 2007; Ummenhofer 2011), but their role in the IM is still inadequately understood (Gadgil et al 2004). Furthermore, there is evidence of an increase in the frequency of extreme rain events (ERE) regionally in India (Goswami et al 2006; Pattanaik and Rajeevan, 2009), which has been attributed to global warming (Goswami et al 2006). Here we use 127 years of data on IM, ENSO and IOD, in combination with Bayesian dynamic regression models (West and Harrison, 1997; Petris et al 2009), to show how the relative influence of ENSO and IOD on the IM has evolved, with an increasing influence of the IOD. Furthermore, we show that the numbers of intense rain events over a 95-year period, defined by thresholds of ≥50, 100, 200 and 300 mm day⁻¹, have increased and are well explained by Poisson generalized linear models with ENSO and IOD indices as predictors, with ENSO having a consistent dampening influence and IOD a stronger, enhancing one. As the IOD has shown an increasing trend over the past several decades (Ihara et al 2008), there is concern that ERE could play a major role in monsoon dynamics in the future, with major implications for human well-being and ecological processes.
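A Bayesian dynamic regression lets the regression coefficients drift over time, and the filtered coefficient estimates are what show a changing "relative influence" of predictors. The NumPy sketch below implements the standard Kalman-filter recursions for a dynamic linear model in the West and Harrison form. It is not the model used in the study: the random-walk evolution of the coefficients, the known variances V and W, the function name dlm_filter and the synthetic data are all assumptions, since the abstract does not give the model specification.

```python
import numpy as np


def dlm_filter(y, X, V, W, m0, C0):
    """Kalman filter for a Bayesian dynamic regression (dynamic linear model):

        y_t     = X_t . theta_t + v_t,   v_t ~ N(0, V)
        theta_t = theta_{t-1} + w_t,     w_t ~ N(0, W)

    Returns the filtered posterior means of theta_t, one row per time step.
    V (scalar) and W (matrix) are treated as known.
    """
    m = np.asarray(m0, dtype=float)
    C = np.asarray(C0, dtype=float)
    W = np.asarray(W, dtype=float)
    means = []
    for y_t, x_t in zip(y, X):
        x_t = np.asarray(x_t, dtype=float)
        R = C + W                   # prior covariance of theta_t (random-walk evolution)
        f = x_t @ m                 # one-step forecast mean
        Q = x_t @ R @ x_t + V       # one-step forecast variance
        K = R @ x_t / Q             # adaptive coefficient (Kalman gain)
        m = m + K * (y_t - f)       # posterior mean
        C = R - np.outer(K, K) * Q  # posterior covariance
        means.append(m.copy())
    return np.array(means)


# Synthetic check: intercept plus two made-up index columns, constant coefficients
rng = np.random.default_rng(0)
T = 127
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
beta = np.array([1.0, -0.5, 0.8])
y = X @ beta  # noiseless, so the correct answer is known
means = dlm_filter(y, X, V=1.0, W=np.zeros((3, 3)), m0=np.zeros(3), C0=1e6 * np.eye(3))
```

With W = 0 the filter reduces to recursive Bayesian least squares and recovers the constant coefficients; a positive W lets the coefficients evolve from year to year, which is what allows a model of this kind to track a shifting balance between two climate indices.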

Posters

Kathrin Plankensteiner, KAI GmbH, Austria
Application of Bayesian Causal Models and Time Dependent Models for Semiconductor Lifetime Data
In the semiconductor industry, end-of-life tests are necessary to verify that products operate reliably. To save resources, stress tests under accelerated conditions are combined with statistical models to predict semiconductor lifetime. Due to limited test time, not all devices can be tested until end-of-life, so the data may be right-censored. Previous investigations show that the data are a mixture of two log-normally distributed components representing two different physical failure mechanisms. Therefore, commonly applied physical models, e.g. Coffin-Manson, are insufficient to describe the behavior of the data and thus lead to inaccurate results. To model and predict the complex lifetime data, Bayesian Mixtures-of-Experts models that depend on stress test parameters are currently applied. This approach yields reliable models for predicting the lifetime of one semiconductor design, but extrapolations to other designs show large inaccuracies, since no physical information about the device structure is included. In this PhD project, modeling approaches that include physical properties of the semiconductor devices are investigated and developed. The basic idea is to model the interconnections, dependencies and effects of physical parameters, material properties and geometry with a causal network. Further, a time-dependent model and simulations are used to describe device degradation. Based on this, the underlying lifetime distribution for different designs of a given semiconductor technology can be derived and predicted. We expect that this modeling approach will allow degradation under both constant and varying operating conditions to be modeled, resulting in reliable lifetime predictions.

Somya Mani, NCBS, Bangalore
Developing a scalable cell model to study the evolution of the vesicle transport system
Eukaryotic cells consist of compartments, which are membrane-bound collections of molecules; the identity of any compartment is its composition. These compartments exchange matter through vesicle transport, which can broadly be broken down into three decision-making steps: selection of cargo from the donor compartment, budding of the vesicle, and fusion of the vesicle with the correct target compartment. The molecular players in the system form a noisy interaction network through which they pass on information about their activation states, which in turn determine how they perform their functions. Here we present a general Boolean logical model of the vesicle transport system that captures the overall dynamics of the system.

Nikhil KL, Koustub Vaze, Vijay Kumar Sharma, JNCASR, Bangalore
Evolution of circadian clocks in Drosophila melanogaster populations selected for early and late adult emergence
While circadian clocks in organisms are believed to have evolved in response to periodic selection pressures imposed by the external geophysical environment, there have been very few attempts to study how clocks evolve in response to selection pressures. To study this aspect in detail, we derived populations of Drosophila melanogaster selected for early and late timing of adult emergence through rigorous laboratory selection; these are henceforth referred to as the early and late lines. As a result of selection, apart from the direct response (the evolution of characteristic adult emergence waveforms), the early and late lines also evolved divergent circadian clock properties, including clock period (τ) and photic phase response curves for both adult emergence and activity-rest rhythms, resulting in diverged entrainment properties under light-dark (LD) cycles. Further studies on the entrainment properties of the circadian clocks in our lines indicated that they have evolved mechanisms to differentially sense light at different parts of the day when assayed under asymmetric LD cycles. The phenotypes selected for under LD cycles persist robustly in novel environments, such as natural conditions and laboratory temperature cycles, thereby revealing some interesting characteristics of the circadian clocks. Also, preliminary genetic studies suggest that the genetic basis of the early and late phenotypes is mostly autosomal and does not involve significant effects of the allosomes or maternal factors.

Vikas Yadav, L. Kozubowski, G. Chatterjee, I. Bose, J. Heitman, K. Sanyal, JNCASR, Bangalore
Kinetochore assembly is related to the cell cycle stage in basidiomycetous budding yeast Cryptococcus neoformans - a human fungal pathogen
Closed mitosis and clustered centromeres are considered two salient features of budding yeasts that are not found in metazoans. Currently, it is not clear how these mechanisms evolved. Here, we report an analysis of key mitotic events in the basidiomycetous budding yeast Cryptococcus neoformans. Analysis of the dynamics of microtubules, the kinetochore and the nuclear envelope by microscopy using fluorescently tagged proteins showed contrasting features in C. neoformans when compared to Saccharomyces cerevisiae. C. neoformans showed unclustered centromeres that gradually cluster as mitosis progresses. Analysis of the budding index at which individual kinetochore components are loaded indicated that kinetochores assemble in a step-wise manner in C. neoformans. Several lines of evidence suggested that C. neoformans undergoes a semi-open mitosis. The nuclear envelope in C. neoformans is not intact and is structurally deformed. Our data demonstrate that key mitotic events in C. neoformans are similar to those of metazoan cells.