Differences

This shows you the differences between two versions of the page.

--- lecture_notes:04-20-2015 [2015/04/25 02:55]
calef [Basic features]
+++ lecture_notes:04-20-2015 [2015/04/25 03:03]
calef [User experience]
@@ Line 5: / Line 5: @@
   * Counts occurrences of each kmer in the data set.
-  * Removes kmers whose frequency are below the user threshold.
+  * Removes kmers whose frequency is below a threshold provided by the user.
   * For each kmer, counts the number of high-quality single-base extensions
-  * Classifies the 5' and 3' ends of each kmer
+  * Classifies the 5' and 3' ends of each kmer as U, F, or X, corresponding to having exactly one (unique), multiple (fork), or zero high-quality single-base extensions
   * Stores the extensions of kmers with a classification in a hash
-  * Removes non-reciprocal linkages between kmers
+  * Removes non-reciprocal U-U extensions between kmers (i.e. an extension where the end of one kmer is marked as U but the end of the other is marked as F).
+  * Stores the linear subgraph of U-U extensions
   * Selects kmers at random and extends outwards to produce contigs
   * Aligns all reads to contigs via BLAST
   * Assembles contigs into scaffolds using paired-end data
   * Searches unaligned reads as potential gap-closers using mate-pair data
 =====Meraculous limitations=====
-  * The assembler relies on data with high quality in order to avoid error correction
+  * The assembler relies on high-quality data in order to avoid error correction, and also requires high coverage
   * Initial release did not support polyploid genome assembly due to allowing for linear subgraphs of the de Bruijn graph only
-  * Low memory footprint
+  * High disk space usage
 =====User experience=====
   * Requires an array of other scripts in other languages
   * Most of the high-level scripts are written in Perl
-  * Tested the program in small dataset and obtained contigs
+  * Runs from a shell script and a user-provided config file
+  * SGE-aware, handles qsub and monitoring jobs
+  * Pipeline is well subdivided: running the program produces intermediate files and executables, allowing the user to suspend, resume, or restart the run from any step in the pipeline
+  * Thorough error logging for each step in the algorithm
+  * Tested the program with the packaged test data and obtained contigs
 =====Installation=====
-  * Main issue was get all dependencies together
+  * Main issue was a new version of GCC and getting all the dependencies together (~16 hrs)
-  * There was one non-standard perl mode needed
+  * There was one non-standard perl module needed
+  * Some files contained carriage returns (DOS-style line endings)
   * Some scripts contain errors, but they aren't hard to fix.
 =====Running Meraculous=====
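
The first few steps in the basic-features list of the diff above (counting kmers, dropping low-frequency kmers, and classifying each end by its single-base extensions) can be illustrated with a rough Python sketch. This is not Meraculous code. The function names and the `min_count` and `min_ext` parameters are hypothetical, "high-quality" is approximated by how often an extension base is observed (no base-quality scores are used), and reverse complements are ignored. The sketch assumes the labeling in which U means exactly one supported extension, F means more than one (a fork), and X means none.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count occurrences of every kmer in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def classify_ends(reads, k, min_count, min_ext):
    """Return {kmer: (left_code, right_code)} for kmers seen >= min_count times.

    Assumption: an extension is treated as "high-quality" when the base is
    observed at least min_ext times; real data would also use base qualities.
    """
    counts = count_kmers(reads, k)
    kept = {kmer for kmer, n in counts.items() if n >= min_count}

    left = defaultdict(Counter)   # bases seen immediately before each kmer
    right = defaultdict(Counter)  # bases seen immediately after each kmer
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in kept:
                continue
            if i > 0:
                left[kmer][read[i - 1]] += 1
            if i + k < len(read):
                right[kmer][read[i + k]] += 1

    def code(extensions):
        supported = [b for b, n in extensions.items() if n >= min_ext]
        if not supported:
            return "X"  # no supported extension
        if len(supported) == 1:
            return "U"  # exactly one supported extension
        return "F"      # fork: several supported extensions

    return {kmer: (code(left[kmer]), code(right[kmer])) for kmer in kept}


if __name__ == "__main__":
    reads = ["ACGTA", "ACGTA", "ACGTC"]
    for kmer, codes in sorted(classify_ends(reads, 3, 2, 1).items()):
        print(kmer, codes)
```

Raising `min_ext` to 2 in this toy input turns the right end of `CGT` from a fork into a unique extension, because the `C` extension is then seen too rarely to count. That is the kind of effect the user-provided thresholds have on the resulting graph.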

Banana Slug Genomics
