Name | Email |
Charles Cole | chkcole@ucsc.edu |
Jake Houser | jdhouser@ucsc.edu |
Kyle McGovern | kmcgover@ucsc.edu |
Jennie Richardson | jemricha@ucsc.edu |
Meraculous is a de novo assembler first published by the US Department of Energy Joint Genome Institute and managed by the Lawrence Berkeley National Laboratory. Meraculous was designed for deep paired-end short reads (e.g., Illumina). Stated advantages include:
After selecting a k-mer set, Meraculous produces a set of maximal linear sub-paths of the de Bruijn graph. This process avoids the explicit error correction step used in other assemblers, relying instead on base quality scores. It then aligns reads to the assembly to identify useful read-pair information. Next, it uses paired reads and splinting singletons to produce scaffolds by “ordering and orienting” a set of contigs. Finally, gaps are closed using paired-end placements.
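To make "maximal linear sub-paths of the de Bruijn graph" concrete, here is a minimal Python sketch. It is not Meraculous's implementation: it works on the forward strand only, ignores base quality scores and k-mer depth cutoffs, and all function names are hypothetical. It only illustrates how contigs end wherever the graph branches.

```python
def build_kmers(reads, k):
    """Collect the distinct k-mers in the reads (forward strand only; a simplification)."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def successors(kmer, kmers):
    """k-mers that overlap this one by k-1 bases on the right."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmers]


def predecessors(kmer, kmers):
    """k-mers that overlap this one by k-1 bases on the left."""
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in kmers]


def maximal_linear_paths(kmers):
    """Spell out each maximal non-branching path of the graph as a contig."""
    contigs, visited = [], set()

    def walk(start):
        # Extend rightwards while the next step is unambiguous in both directions.
        contig, cur = start, start
        visited.add(start)
        while True:
            succ = successors(cur, kmers)
            if len(succ) != 1:
                break  # dead end or fork
            nxt = succ[0]
            if nxt in visited or len(predecessors(nxt, kmers)) != 1:
                break  # cycle closed, or paths merge at nxt
            contig += nxt[-1]
            visited.add(nxt)
            cur = nxt
        return contig

    # Start walks only at k-mers that are not the interior of a linear path.
    for kmer in sorted(kmers):
        preds = predecessors(kmer, kmers)
        interior = len(preds) == 1 and len(successors(preds[0], kmers)) == 1
        if not interior:
            contigs.append(walk(kmer))

    # Anything left unvisited lies on an isolated cycle; break it arbitrarily.
    for kmer in sorted(kmers):
        if kmer not in visited:
            contigs.append(walk(kmer))
    return contigs


if __name__ == "__main__":
    # Two toy reads that differ only in their last base, creating a fork.
    reads = ["ATGGCGT", "ATGGCGA"]
    print(maximal_linear_paths(build_kmers(reads, 3)))
```

With the two toy reads above, the shared prefix is emitted as one contig and each branch after the fork becomes its own short contig. In a real assembly such forks come from sequencing errors, heterozygosity, and repeats.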
GCC is the GNU Compiler Collection (/campusdata/BME235/bin/gcc-4.9.2). Webpage: https://gcc.gnu.org/.
KmerGenie estimates the best k-mer length for genome de novo assembly (/campusdata/BME235/bin/kmergenie-1.6972). Webpage: http://kmergenie.bx.psu.edu/.
Musket is a multistage k-mer spectrum based error corrector for Illumina short read data (/campusdata/BME235/bin/musket). Webpage: http://musket.sourceforge.net/.
Skewer is an adapter trimmer for Illumina paired-end sequences (/campusdata/BME235/bin/skewer-0.1.123-linux-x86_64). Webpage: http://sourceforge.net/projects/skewer/.
We were unable to get Meraculous to complete the bubble popping step. Before the assembly job was killed, there were 28,610,138 total contigs, of which 542,137 (1.89%) were over 1,000 bp and 15 (5.2e-5%) were over 10,000 bp.
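The percentages quoted above follow directly from the contig counts. A short Python check, using only the numbers reported in this section (the variable and function names are ours):

```python
# Contig counts reported before the assembly was killed.
total_contigs = 28_610_138
over_1kb = 542_137
over_10kb = 15


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print(f"contigs > 1,000 bp:  {percent(over_1kb, total_contigs):.2f}%")
print(f"contigs > 10,000 bp: {percent(over_10kb, total_contigs):.1e}%")
```

Both values agree with the figures in the text (about 1.89% and about 5.2e-5%), which shows how fragmented the assembly still was at this stage.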
Our team was dissolved to support other tasks.