Wednesday, May 10, 2023
15:30 - 16:30

Alladi Ramakrishnan Hall

Graph-theoretic models for de novo genome assembly using third-generation sequencing technologies

Chirag Jain

Computational and Data Sciences, IISc Bangalore

Recent breakthroughs in DNA sequencing technologies have made it possible to compute high-quality human genome assemblies at scale, providing a more complete picture of human genetic diversity. Moving towards a fully automated and robust computational pipeline for deployment in healthcare requires genome assembly algorithms that are both efficient in practice and provably good. Graph-theoretic models play a central role in genome assembly. Graph sparsification is commonly used during assembly to simplify the graph by removing redundant or spurious edges. However, a graph model must be 'coverage-preserving', i.e., it must ensure that the target genome can be spelled as a walk in the graph, given sufficient sequencing depth. Our work shows that the commonly used string graph model violates this property, both in theory and in practice. We then introduce a novel sparse read-overlap-based graph model that is motivated by theory. Finally, we demonstrate the empirical advantage of this model using human sequencing data.

Meeting ID: 973 6616 3632
Passcode: 250038
This talk will be based on the following publication:
