Table of Contents

Team 1 report: assembly with Meraculous

Basic features

Meraculous is published by the Joint Genome Institute, part of the US Department of Energy. It was initially designed for haploid assembly but now supports diploid assembly as well. Its advantages include multi-threaded and parallelized computation; the absence of an error-correction step, which makes processing faster; compatibility with paired-end short reads (e.g., Illumina); efficient and conservative traversal of subgraphs of the de Bruijn graph; selection of the kmer set; production of a set of maximal linear sub-paths of the de Bruijn graph; and alignment of reads to the assembly to identify useful read-pair information and close gaps. Meraculous has been used to assemble the 15.4 Mb Pichia stipitis genome from 75 bp paired reads at 425x coverage. The resulting assembly covered 95% of the genome and had an N50 of 101 kb.
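For readers unfamiliar with the N50 statistic quoted above, here is a minimal Python sketch of how it is usually computed: sort the contig lengths from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly length. The function name `n50` and the contig lengths are made up for illustration; this is not code from Meraculous.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
contig_lengths = [100, 80, 60, 40, 20]
print(n50(contig_lengths))  # prints 80
```

With these toy values the total length is 300, and the two longest contigs (100 + 80) already cover at least half of it, so the N50 is 80.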

Meraculous algorithm

Meraculous limitations

User experience

Installation

Running Meraculous

Overall impression

Error correction

KmerGenie

Meraculous runs require a well-chosen kmer size. KmerGenie is a program that suggests an optimal assembly kmer size by generating abundance histograms for many values of k. Here is a link that helped me understand KmerGenie: http://kmergenie.bx.psu.edu/.
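To make "abundance histogram" concrete, here is a small Python sketch that counts kmers exactly and tabulates how many distinct kmers occur once, twice, and so on, for a few values of k. It only illustrates the idea. It is not how KmerGenie itself is implemented, it ignores reverse complements, and the function names and toy reads are assumptions made for this example.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every kmer of length k across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(reads, k):
    """Map each abundance to the number of distinct kmers seen that many times."""
    return Counter(kmer_counts(reads, k).values())


# Toy reads, for illustration only.
reads = ["ACGTACGT", "CGTACGTA"]
for k in (3, 5):
    hist = abundance_histogram(reads, k)
    print(k, sorted(hist.items()))
```

On real data, each histogram typically shows a spike of low-abundance kmers (mostly sequencing errors) followed by a peak near the kmer coverage, and comparing these histograms across values of k is what guides the choice of kmer size.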

Musket

Previous analysis

Future directions