De-Novo Assembly of Short Reads in Minimal Overlap Model


Shashank Sharma, IIT Delhi

Next Generation Sequencing (NGS) technologies produce millions of short reads that provide high coverage of the genome at a much lower cost than Sanger sequencing-based technologies. The advent of NGS technologies has led to various developments in assembly techniques. Our focus is on adapting overlap graph-based algorithms to work with millions of NGS reads. Because NGS reads cover the genome deeply, we show that it is feasible to perform assembly while working with small overlaps. This strategy gives us a significant advantage in computation and space over existing approaches. Our method finds alternate paths in an overlap graph to construct an assembly. We compare the performance of our tool, MOBS, with some widely used assemblers on ideal datasets (error-free reads distributed uniformly over the genome) for which finished genomes are available. We show that MOBS results are, in most cases, better than those of other assemblers with respect to assembly quality, running time, and genome coverage.
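To make the idea of assembling with a small minimum overlap concrete, here is a minimal sketch of a generic overlap graph. It is not the MOBS algorithm: the abstract does not describe how MOBS builds its graph or finds alternate paths. The function names and the `min_overlap` parameter are hypothetical. The pairwise comparison is naive and quadratic, chosen for clarity rather than for the scale of millions of reads.

```python
from typing import Dict, List, Tuple

# Hypothetical graph type: read index -> list of (successor read index, overlap length).
Graph = Dict[int, List[Tuple[int, int]]]


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` equal to a prefix of `b`.

    Only lengths >= min_len (assumed >= 1) are considered. Full-length matches
    are excluded. Returns 0 when no qualifying overlap exists.
    """
    max_len = min(len(a), len(b)) - 1
    for length in range(max_len, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def build_overlap_graph(reads: List[str], min_overlap: int) -> Graph:
    """Add an edge i -> j whenever read i's suffix overlaps read j's prefix
    by at least `min_overlap` bases (naive all-pairs comparison)."""
    graph: Graph = {i: [] for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                olen = suffix_prefix_overlap(a, b, min_overlap)
                if olen:
                    graph[i].append((j, olen))
    return graph


def spell_path(reads: List[str], path: List[int], graph: Graph) -> str:
    """Concatenate the reads along `path`, dropping each overlapping prefix."""
    seq = reads[path[0]]
    for u, v in zip(path, path[1:]):
        olen = dict(graph[u])[v]  # overlap length recorded on edge u -> v
        seq += reads[v][olen:]
    return seq


if __name__ == "__main__":
    # Toy error-free reads tiled over the hypothetical genome "ATGCGTACCTGA".
    reads = ["ATGCGT", "CGTACC", "ACCTGA"]
    g = build_overlap_graph(reads, min_overlap=3)
    print(g)                               # {0: [(1, 3)], 1: [(2, 3)], 2: []}
    print(spell_path(reads, [0, 1, 2], g))  # ATGCGTACCTGA
```

With error-free reads that tile the genome densely, even a short threshold such as `min_overlap=3` links consecutive reads here. Raising the threshold to 4 leaves this toy graph with no edges. This illustrates, in miniature, why high coverage is what makes a small minimum overlap workable.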