=====ALLPATHS=====
===Presented by Thomas===
  * ALLPATHS was created to improve reference genomes.
  * The version described here is optimized for 100-base Illumina reads.
  * Works with paired-end reads.
  * Requires high coverage: at least 40x raw read coverage for each library.
  * Requires a minimum of two paired-end libraries: one short and one long.
    * The short separation size must be less than twice the read size.
    * The distribution of sizes should be as narrow as possible, with a std dev of < 20%.
    * The long library insert size should be approximately 4000 bases and can have a wider size distribution.
  * Installation
    * Requires Boost libraries and an up-to-date C++ compiler
    * Very long installation, over 2 hours of compilation time.
    * Download and extract the tarball
    * autoconf
    * ./configure
    * make -j8 (parallel compilation)
    * make install scripts
  * Pipeline/Modules
    * All binaries are located in /bin
    * RunAllpaths3g controls the entire pipeline.
    * Directories are created for each new job so different assemblies can be compared.
      * Reference
        * Contains the reference genome
      * Data
        * Read FASTA, qual, and pairs files.
        * May contain many run directories, each representing a particular attempt to assemble the original data using a different set of parameters.
      * Run
        * Intermediate files.
      * Assemblies
        * Finished assemblies are stored in this directory.
      * SubDir
      * OptionsFile
        * There are many options. 
  * Preparing read data
    * Ploidy file: 1 for haploid, 2 for diploid
    * Fragment library reads are expected to be oriented towards each other.
    * Jumping library reads are expected to be oriented away from each other.
  * Differences between ALLPATHS v1 and v2
    * v1: high quality assemblies from simulated short reads
    * v2: high quality assemblies can be obtained from real data
      * Beats Velvet and Euler-SR
  * Input: 
    * Three different bacteria
      * S. aureus
  * Output
    * A graph of continuous paths.
      * Shows paths between contigs. 
      * Each component is its own scaffold
  * Some other things
    * Reads that are >90% A are removed. These are claimed to be an artifact of the Illumina sequencing platform.
  * Runtime
    * Scales almost linearly with genome size. E. coli takes ~8.2 h.
    * May be too slow for large genomes (> 1 Gb)
  * Error rate:
    * Beats Velvet and Euler-SR in all categories (measured in 10 kb windows).

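The library requirements listed above (40x+ raw coverage per library, short-library separation under twice the read length, std dev under 20%) can be turned into a quick sanity check before starting a run. The following is only a sketch and not part of ALLPATHS: the ''Library'' class, its field names, and ''check_short_library'' are hypothetical, and it assumes that the 20% limit is relative to the mean separation and that coverage is computed as (number of reads x read length) / genome size.

```python
from dataclasses import dataclass


@dataclass
class Library:
    """Hypothetical description of one paired-end library (not an ALLPATHS type)."""
    name: str
    read_length: int        # bases per read
    mean_separation: float  # mean separation size, in bases
    sd_separation: float    # standard deviation of the separation, in bases
    num_reads: int          # total number of reads (both mates counted)


def raw_coverage(lib: Library, genome_size: int) -> float:
    """Raw read coverage, assuming coverage = reads * read length / genome size."""
    return lib.num_reads * lib.read_length / genome_size


def check_short_library(lib: Library, genome_size: int,
                        min_coverage: float = 40.0,
                        max_rel_sd: float = 0.20) -> list:
    """Return a list of human-readable problems; an empty list means no problems found."""
    problems = []
    cov = raw_coverage(lib, genome_size)
    if cov < min_coverage:
        problems.append(f"{lib.name}: coverage {cov:.1f}x is below {min_coverage:.0f}x")
    if lib.mean_separation >= 2 * lib.read_length:
        problems.append(f"{lib.name}: separation is not less than twice the read length")
    # Assumption: the 20% limit is the std dev relative to the mean separation.
    if lib.sd_separation / lib.mean_separation >= max_rel_sd:
        problems.append(f"{lib.name}: std dev is not below {max_rel_sd:.0%} of the mean")
    return problems


if __name__ == "__main__":
    frag = Library("frag", read_length=100, mean_separation=180,
                   sd_separation=18, num_reads=2_000_000)
    for problem in check_short_library(frag, genome_size=4_600_000):
        print(problem)
```

Only the short (fragment) library rules are checked here; the notes give the long library a looser size distribution, so it would need its own limits.
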
=====Mira=====
===Presented by Michael Cusack===
  * Designed to work with difficult genomes (lots of repeats or other sequence aberrations)
  * Hybrid Assembly
    * Can combine several data types
      * Does not work with SOLiD
      * Can take trace data from sanger in addition to base calls
      * Position-specific confidence values
      * A stretch in each sequence marked as a high-confidence region
      * General properties such as directionality
  * Mira is an Iterative Process
    * Read scanning with a fast, error-tolerant pairwise comparison. Two algorithms are used, both less sensitive than Smith-Waterman:
      * DNA-Shift-AND
        * Align small words within a read
        * O(c*n), c=# allowed errors
        * Must find 2 of 3 words to establish a relationship
      * Zebra
        * Transcribe, Divide, Reorganize, Concentrate and Conquer strategy
        * Hashes each octet of bases into a 16-bit int and creates a hash-index table.
    * More thorough comparison to establish the type of relationship
      * Uses a modified Smith-Waterman alignment.
      * Uses banding
      * uses information generated from DNA-SAND/ZEBRA
    * Building graph 
      * Overlap alignment + complementary data (orientation, overlap region etc.)
    * Iterative Processing
      * Start with highest quality.
        * Split each read into high confidence and low confidence regions by quality clipping.
        * Only high confidence regions are used to build initial contigs.
        * Low confidence regions are used "cautiously"
    * Creating Contigs
      * Pathfinder
        * Finds best nodes and uses them as anchors
        * Extends while minimizing uncertainties of consensus bases
        * Uses an n, m-step recursive look-ahead algorithm to detect repeats.
      * Contig Builder
        * Once a path is decided, each contig must be compiled and approved
        * If a read is too different from the existing consensus, despite a high-scoring overlap, it is rejected and the pathfinder is run again from that point.
    * Independent Observations
      * "One central pillar of the quality calculation in MIRA is the rule that independent observations of a base confirm this base better than non-independent observations. When a base was read from both directions, one can assume independence of observations: it's not the whole truth, but close enough. As a side note: observing a..." something...
    * Handling repeats
      * Can take in known repetitive elements.
        * When these reads are detected, much stricter control mechanisms can be applied.
      * When a read disagrees with a repeated element it matches, signal processing of the trace is used to determine whether the error is explainable.
      * If the percentage of unexplainable errors is greater than a threshold (default: 1%), the reads are rejected from the consensus and returned to the assembly graph.
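
The Zebra step above hashes each octet of bases into a 16-bit int and builds a hash-index table. This works out because 8 bases at 2 bits per base is exactly 16 bits. The sketch below illustrates that idea only; it is not MIRA's code. The function names, the A/C/G/T to 0..3 mapping, the use of overlapping octets at every read position, and skipping octets that contain other characters (such as N) are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed 2-bit code per base; the real encoding used by MIRA is not given in the notes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_octet(octet):
    """Pack 8 bases into a 16-bit integer, or return None if a base is not A/C/G/T."""
    if len(octet) != 8:
        raise ValueError("an octet must contain exactly 8 bases")
    value = 0
    for base in octet.upper():
        code = BASE_CODE.get(base)
        if code is None:
            return None
        value = (value << 2) | code  # shift in 2 bits per base
    return value


def build_index(reads):
    """Map each 16-bit octet code to the (read index, position) pairs where it occurs."""
    index = defaultdict(list)
    for read_id, seq in enumerate(reads):
        for pos in range(len(seq) - 7):  # every overlapping octet (an assumption)
            key = encode_octet(seq[pos:pos + 8])
            if key is not None:
                index[key].append((read_id, pos))
    return index


if __name__ == "__main__":
    index = build_index(["ACGTACGTAA", "TTACGTACGT"])
    key = encode_octet("ACGTACGT")
    print(hex(key), index[key])
```

Reads that share an entry in the table are candidates for the more thorough alignment stage. How many shared octets are needed before two reads count as related is not stated for Zebra in the notes, so no threshold is applied here.
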