lecture_notes:04-20-2015

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-20-2015 [2015/04/24 19:58]
calef [Meraculous algorithm]
lecture_notes:04-20-2015 [2015/04/24 20:07] (current)
calef [Error correction]
Line 16: Line 16:
   * Searches unaligned reads as potential gap-closers using mate-pair data
 =====Meraculous limitations=====
-  * The assembler relies on data with high quality in order to avoid error correction
+  * The assembler relies on data with high quality in order to avoid error correction, also requires high coverage
   * Initial release did not support polyploid genome assembly due to allowing for linear subgraphs of the de Bruijn graph only
-  * Low memory footprint
+  * High disk space usage
 =====User experience=====
   * Requires an array of other scripts in other languages
   * Most of high level scripts are written in perl
-  * Tested the program in small dataset and obtained contigs
+  * Tested the program with the packaged test data and obtained contigs
 =====Installation=====
-  * Main issue was get all dependencies together
+  * Main issue was new version of GCC and getting all the dependencies together ~16 hrs
-  * There was one non-standard perl mode needed
+  * There was one non-standard perl module needed
+  * Files with carriage returns
   * Some scripts contain errors but they aren't hard to fix.
 =====Running Meraculous=====
-  * Execute run_meraculous.sh scripts along with the configuration file
+  * Execute run_meraculous.sh scripts along with user-provided configuration file
   * Configuration file contains info on where data is and what format it comes in
-  * It creates a timestamped folder that includes directories containing results of each step and executables to modify the run
+  * Creates a timestamped folder that includes directories containing results of each step and executables to suspend, resume, or restart the run from that step
-  * Then you can check the errors that made a run fail and resume the run
+  * Thorough error-logging at each step, allowing you to check the errors that made a run fail and then resume the run after fixing the errors
-  * Logs are informative
+  * SGE-aware, handles qsub and monitoring jobs
 =====Overall impression=====
   * Straightforward to figure out what went wrong just requiring a basic understanding of Perl
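
The Installation notes in this hunk mention files with carriage returns. A minimal sketch of one way to normalize such files before running them, assuming they are plain-text scripts with Windows-style line endings. The helper names `normalize_line_endings` and `fix_file` are hypothetical and are not part of Meraculous.

```python
from pathlib import Path


def normalize_line_endings(data: bytes) -> bytes:
    """Turn CRLF and lone CR line endings into LF."""
    # Replace CRLF first so the second pass only sees lone CRs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_file(path: str) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    p = Path(path)
    original = p.read_bytes()
    fixed = normalize_line_endings(original)
    if fixed != original:
        p.write_bytes(fixed)
        return True
    return False
```

Working on bytes avoids guessing the file encoding. Calling `fix_file` a second time on the same file returns `False`, which makes it safe to run over a whole script directory more than once.
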
Line 38: Line 39:
   * Logs are very useful
 =====Error correction=====
-  * Meraculous requires error correction and adapter removal. Trimming is unnecessary.
+  * Meraculous requires error correction and adapter removal. Trimming is unnecessary, as low quality reads are ignored during contig formation
-  * High error rates stop the assembler. Need to be removed.
+  * High error rates bog down the assembler. Need to be removed.
   * Kmer size chosen directly affects assembly quality
 =====KmerGenie=====
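
The Error correction notes say that high error rates bog down the assembler and that the k-mer size affects assembly quality. A minimal sketch of one reason the two interact, assuming reads are plain strings over A/C/G/T: a single wrong base corrupts every k-mer that overlaps it, so a larger k produces more erroneous k-mers per error. The functions `count_kmers` and `singleton_count` and the toy reads are hypothetical illustrations, not Meraculous or KmerGenie code.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def singleton_count(reads, k):
    """Number of distinct k-mers seen exactly once (likely errors at high coverage)."""
    return sum(1 for c in count_kmers(reads, k).values() if c == 1)


# Three identical reads plus one read with a single substitution (A -> T at index 4).
reads = ["ACGTACGGT", "ACGTACGGT", "ACGTACGGT", "ACGTTCGGT"]

for k in (3, 5):
    print(k, singleton_count(reads, k))
```

In this toy data the one substitution yields 3 singleton k-mers at k=3 and 5 at k=5, which is why low-frequency k-mers are a common signal for sequencing errors when coverage is high.
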
lecture_notes/04-20-2015.1429930733.txt.gz · Last modified: 2015/04/24 19:58 by calef