
Team 1: Meraculous

Team composition

Name Email
Charles Cole chkcole@ucsc.edu
Jake Houser jdhouser@ucsc.edu
Kyle McGovern kmcgover@ucsc.edu
Jennie Richardson jemricha@ucsc.edu

Meraculous overview

Meraculous is a de novo assembler first published by the US Department of Energy Joint Genome Institute and managed by the Lawrence Berkeley National Laboratory. Meraculous was designed for deep paired-end short reads (e.g., Illumina). Stated advantages include:

After selecting a k-mer set, Meraculous produces a set of maximal linear sub-paths of the de Bruijn graph; these become the initial contigs. This process avoids the explicit error-correction step used in other assemblers and relies on base quality scores instead. Meraculous then aligns reads to the contigs to identify useful read-pair information. Next, it uses paired reads and splinting singletons to build scaffolds by “ordering and orienting” the contigs. Finally, gaps are closed using paired-end placements.
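To illustrate what a "maximal linear sub-path" means, here is a minimal Python sketch under simplifying assumptions: k-mers are graph nodes, two k-mers are connected when they overlap by k-1 bases, and a path is extended only while the extension is unique in both directions. It ignores reverse complements, base quality scores, and k-mer count thresholds, so it is not Meraculous's actual implementation. The function names (`kmers_of`, `successors`, `predecessors`, `maximal_linear_paths`) are hypothetical.

```python
def kmers_of(sequences, k):
    """Collect the set of all k-mers occurring in the given sequences."""
    return {s[i:i + k] for s in sequences for i in range(len(s) - k + 1)}


def successors(kmer, kmers):
    """K-mers in the set that overlap the last k-1 bases of `kmer`."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmers]


def predecessors(kmer, kmers):
    """K-mers in the set that overlap the first k-1 bases of `kmer`."""
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in kmers]


def maximal_linear_paths(kmers):
    """Return the sequences of all maximal unbranched paths, sorted."""
    kmers = set(kmers)
    visited = set()
    contigs = []
    for start in sorted(kmers):
        if start in visited:
            continue
        # Walk left while the step is unique in both directions.
        first = start
        seen = {start}
        while True:
            preds = predecessors(first, kmers)
            if len(preds) != 1:
                break
            p = preds[0]
            if len(successors(p, kmers)) != 1 or p in seen:
                break
            first = p
            seen.add(p)
        # Walk right from the leftmost k-mer, appending one base per step.
        contig = first
        current = first
        visited.add(current)
        while True:
            succs = successors(current, kmers)
            if len(succs) != 1:
                break
            n = succs[0]
            if len(predecessors(n, kmers)) != 1 or n in visited:
                break
            contig += n[-1]
            visited.add(n)
            current = n
        contigs.append(contig)
    return sorted(contigs)


if __name__ == "__main__":
    # Two toy sequences that share a prefix and then diverge.
    toy = kmers_of(["AAGCTTC", "AAGCTGG"], 3)
    print(maximal_linear_paths(toy))
```

In the toy input, the two sequences share the prefix AAGCT and then diverge, so the graph forks after the k-mer GCT. The sketch stops the shared path at the fork and reports each branch as its own path.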

Other programs used

GCC

GCC is a compiler for the GNU operating system (/campusdata/BME235/bin/gcc-4.9.2). Webpage: https://gcc.gnu.org/.

KmerGenie

KmerGenie estimates the best k-mer length for genome de novo assembly (/campusdata/BME235/bin/kmergenie-1.6972). Webpage: http://kmergenie.bx.psu.edu/.

Musket

Musket is a multistage k-mer spectrum based error corrector for Illumina short read data (/campusdata/BME235/bin/musket). Webpage: http://musket.sourceforge.net/.

Skewer

Skewer is an adapter trimmer for Illumina paired-end sequences (/campusdata/BME235/bin/skewer-0.1.123-linux-x86_64). Webpage: http://sourceforge.net/projects/skewer/.

Results

We were unable to get Meraculous to complete the bubble-popping step. When the assembly was killed, it had produced 28,610,138 contigs in total, of which 542,137 (1.89%) were longer than 1,000 bp and 15 (5.2e-5%) were longer than 10,000 bp.
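The percentages above can be reproduced from the raw contig counts. The short Python check below uses only the three counts reported in this section; the variable and function names are our own.

```python
# Contig counts reported at the point the assembly was killed.
total_contigs = 28_610_138
over_1kb = 542_137
over_10kb = 15


def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    print(f"contigs > 1,000 bp:  {percent(over_1kb, total_contigs):.2f}%")
    print(f"contigs > 10,000 bp: {percent(over_10kb, total_contigs):.1e}%")
```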

Our team was dissolved to support other tasks.

Lecture slides

First report, Monday April 20th, 2015

Second report, Monday May 11th, 2015