lecture_notes:05-19-2010

lecture_notes:05-19-2010 [2010/05/19 21:59]
hyjkim
lecture_notes:05-19-2010 [2010/05/19 22:14]
hyjkim P
Line 1: Line 1:
 =====ALLPATHS===== =====ALLPATHS=====
 +===Presented by Thomas===
   * ALLPATHS was created to improve reference genomes.   * ALLPATHS was created to improve reference genomes.
   * The version described here is optimized for 100 bases (illumina reads).   * The version described here is optimized for 100 bases (illumina reads).
Line 55: Line 56:
     * Beats velvet and euler-sr in all categories. (Measured in 10kb windows).     * Beats velvet and euler-sr in all categories. (Measured in 10kb windows).
  
-====Mira====+=====Mira=====
 +===Presented by Michael Cusack=== 
 +  * Designed to work with difficult genomes (lots of repeats or other sequence aberrations) 
 +  * Hybrid Assembly 
 +    * Can combine several data types
 +      * Does not work with SOLiD
 +      * Can take trace data from Sanger in addition to base calls
 +      * Position-specific confidence values
 +      * A stretch in each sequence marked as a high-confidence region
 +      * General properties such as directionality 
 +  * Mira is an Iterative Process 
 +    * Read scanning with a fast, error-tolerant pairwise comparison. (Both methods below are less sensitive than Smith-Waterman.)
 +      * DNA-Shift-AND 
 +        * Align small words within a read 
 +        * O(c*n), c=# allowed errors 
 +        * Must find 2 of 3 words to establish a relationship 
 +      * Zebra 
 +        * Transcribe, Divide, Reorganize, Concentrate and Conquer strategy 
 +        * Hashes each octet of bases into a 16-bit int and creates a hash-index table. 
 +    * More thorough comparison to establish the type of relationship
 +      * Uses a modified Smith-Waterman alignment.
 +      * Uses banding
 +      * Uses information generated from DNA-Shift-AND/Zebra
 +    * Building graph  
 +      * Overlap alignment + complementary data (orientation, overlap region etc.)
 +    * Iterative Processing 
 +      * Start with highest quality. 
 +        * Split each read into high confidence and low confidence regions by quality clipping. 
 +        * Only high confidence regions are used to build initial contigs. 
 +        * Low confidence regions are used "cautiously"
 +    * Creating Contigs 
 +      * Pathfinder 
 +        * Finds best nodes and uses them as anchors 
 +        * Extends while minimizing uncertainties of consensus bases
 +        * Uses an n, m-step recursive look-ahead algorithm to detect repeats.
 +      * Contig Builder 
 +        * Once a path is decided, each contig must be compiled and approved 
 +        * If a read is too different from the existing consensus, despite a high-scoring overlap, it is rejected and the pathfinder is run again from that point.
 +    * Independent Observations 
 +      * "One central pillar of the quality calculation in MIRA is the rule that independent observations of a base confirm this base better than non-independent observations. When a base was read from both directions, one can assume independence of observations: it's not the whole truth, but close enough. As a side note: observing a..." something...
 +    * Handling repeats
 +      * Can take in known repetitive elements. 
 +        * When these reads are detected, much stricter control mechanisms can be applied. 
 +      * When there is a discrepancy in a read matching a repeated element, signal processing of the trace is used to determine whether the error is explainable.
 +      * If the percentage of unexplainable errors is greater than a threshold (default: 1%), reads are rejected from the consensus and returned to the assembly graph.
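
The DNA-Shift-AND bullets above describe aligning small words within a read, with cost O(c*n) for c allowed errors. A minimal sketch of that idea is the textbook bit-parallel Shift-AND search extended to allow errors (the Wu-Manber scheme). This is not MIRA's actual code: the function names, the way words are chosen, and the `needed=2` default (standing in for the "2 of 3 words" rule) are assumptions made for illustration.

```python
def shift_and_approx(pattern, text, max_errors):
    """Return end positions in `text` where `pattern` matches with at most
    `max_errors` substitutions, insertions, or deletions.
    Assumes max_errors < len(pattern)."""
    m = len(pattern)
    # One bitmask per character: bit i is set if pattern[i] == character.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    # state[d]: bit i set means pattern[:i+1] matches a suffix of the text
    # read so far with at most d errors.
    state = [((1 << d) - 1) & full for d in range(max_errors + 1)]
    hits = []
    for pos, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = state[0]
        state[0] = ((state[0] << 1) | 1) & b
        for d in range(1, max_errors + 1):
            old = state[d]
            state[d] = (
                (((old << 1) | 1) & b)   # exact match of this character
                | prev_old               # extra character in the text
                | (prev_old << 1)        # substituted character
                | (state[d - 1] << 1)    # pattern character missing in text
                | 1
            ) & full
            prev_old = old
        if state[max_errors] & accept:
            hits.append(pos)
    return hits


def looks_related(words, read, max_errors, needed=2):
    """Hypothetical helper: call a read 'related' if at least `needed`
    of the given words are found in it within the error budget."""
    found = sum(1 for w in words if shift_and_approx(w, read, max_errors))
    return found >= needed
```

Each text character costs one constant-time update per error level (for words that fit in a machine word), which is where the O(c*n) bound comes from. For example, `shift_and_approx("ACGT", "TTACCTTT", 1)` reports a match ending at position 5, while the same search with zero errors finds nothing.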
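
The Zebra bullet above says each octet of bases is hashed into a 16-bit int and stored in a hash-index table. With 2 bits per base, 8 bases fill exactly 16 bits. The sketch below shows one way this could look. The 2-bit encoding (A=0, C=1, G=2, T=3), the use of every overlapping window, and all function names are assumptions for illustration, not details taken from the lecture or from MIRA.

```python
from collections import defaultdict

# Assumed 2-bit encoding; the notes do not specify one.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_octet(octet):
    """Pack 8 bases into a 16-bit integer, 2 bits per base."""
    if len(octet) != 8:
        raise ValueError("expected exactly 8 bases")
    code = 0
    for base in octet:
        code = (code << 2) | BASE_BITS[base]
    return code


def build_index(reads):
    """Map each 16-bit octet code to the (read id, offset) pairs where it occurs.
    Windows containing characters other than A, C, G, T are skipped."""
    index = defaultdict(list)
    for read_id, seq in enumerate(reads):
        for offset in range(len(seq) - 7):
            window = seq[offset:offset + 8]
            if all(base in BASE_BITS for base in window):
                index[encode_octet(window)].append((read_id, offset))
    return index


def candidate_pairs(index):
    """Return pairs of read ids that share at least one octet."""
    pairs = set()
    for hits in index.values():
        ids = sorted({read_id for read_id, _ in hits})
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add((ids[i], ids[j]))
    return pairs
```

Read pairs that share an octet are only candidates. In the notes, the more thorough banded Smith-Waterman step then decides what kind of relationship, if any, they actually have.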
lecture_notes/05-19-2010.txt · Last modified: 2010/05/19 22:14 by hyjkim