Machine learning to understand diversity in transcriptional regulation

Leelavati Narlikar, NCL Pune

An important question in biology is how non-coding DNA regions contribute to the diversity in regulation of transcription. A major step forward has been the development of high-throughput technologies that can detect various regulatory events, such as protein-DNA binding, transcription initiation, and splicing, at single-nucleotide resolution and on a genome-wide scale. However, the subsequent step of understanding the functional significance of each event is still far from complete. We believe this is primarily because most studies either use previously known biology to analyze these events, which makes it difficult to learn new mechanisms, or perform an average, overrepresentation-based analysis, which precludes the discovery of smaller, distinct classes of regulatory mechanisms. I will talk about some of our attempts to reach this higher-hanging fruit.