This shows you the differences between two versions of the page.
| Both sides previous revision Previous revision | |||
|
lecture_notes:05-19-2010 [2010/05/19 21:59] hyjkim |
lecture_notes:05-19-2010 [2010/05/19 22:14] (current) hyjkim P |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| =====ALLPATHS===== | =====ALLPATHS===== | ||
| + | ===Presented by Thomas=== | ||
| * ALLPATHS was created to improve reference genomes. | * ALLPATHS was created to improve reference genomes. | ||
| * The version described here is optimized for 100-base Illumina reads. | * The version described here is optimized for 100-base Illumina reads. | ||
| Line 55: | Line 56: | ||
| * Beats Velvet and EULER-SR in all categories (measured in 10 kb windows). | * Beats Velvet and EULER-SR in all categories (measured in 10 kb windows). | ||
| - | ====Mira==== | + | =====Mira===== |
| + | ===Presented by Michael Cusack=== | ||
| + | * Designed to work with difficult genomes (lots of repeats or other sequence aberrations) | ||
| + | * Hybrid Assembly | ||
| + | * Can combine several data types | ||
| + | * Does not work with SOLiD | ||
| + | * Can take trace data from sanger in addition to base calls | ||
| + | * Position-specific confidence values | ||
| + | * A stretch in each sequence marked as a high-confidence region | ||
| + | * General properties such as directionality | ||
| + | * Mira is an Iterative Process | ||
| + | * Read scanning with a fast, error-tolerant pairwise comparison (both algorithms are less sensitive than Smith-Waterman) | ||
| + | * DNA-Shift-AND | ||
| + | * Align small words within a read | ||
| + | * O(c*n), where c = number of allowed errors | ||
| + | * Must find 2 of 3 words to establish a relationship | ||
| + | * Zebra | ||
| + | * Transcribe, Divide, Reorganize, Concentrate and Conquer strategy | ||
| + | * Hashes each octet of bases into a 16-bit int and creates a hash-index table. | ||
| + | * More thorough comparison to establish the type of relationship | ||
| + | * Uses a modified Smith-Waterman alignment. | ||
| + | * Uses banding | ||
| + | * Uses information generated from DNA-SAND/ZEBRA | ||
| + | * Building graph | ||
| + | * Overlap alignment + complementary data (orientation, overlap region etc.) | ||
| + | * Iterative Processing | ||
| + | * Start with highest quality. | ||
| + | * Split each read into high confidence and low confidence regions by quality clipping. | ||
| + | * Only high confidence regions are used to build initial contigs. | ||
| + | * Low confidence regions are used "cautiously" | ||
| + | * Creating Contigs | ||
| + | * Pathfinder | ||
| + | * Finds best nodes and uses them as anchors | ||
| + | * Extends while minimizing uncertainties of consensus bases | ||
| + | * Uses an n, m-step recursive look-ahead algorithm to detect repeats. | ||
| + | * Contig Builder | ||
| + | * Once a path is decided, each contig must be compiled and approved | ||
| + | * If a read is too different from the existing consensus, despite a high-scoring overlap, it is rejected and the pathfinder is run again from that point. | ||
| + | * Independent Observations | ||
| + | * "One central pillar of the quality calculation in MIRA is the rule that independent observations of base confirm this base better than non-independent observations. When a base was read from both directions, one can assume independence of observations: it's not the whole truth, but close enough. As a side note: observing a..." something... | ||
| + | * Handling repeats | ||
| + | * Can take in known repetitive elements. | ||
| + | * When these reads are detected, much stricter control mechanisms can be applied. | ||
| + | * When there is a discrepancy in a read matching a repeated element, signal processing of the trace is used to determine whether the error is explainable. | ||
| + | * If the percentage of unexplainable errors is greater than a threshold (default: 1%), the reads are rejected from the consensus and returned to the assembly graph. | ||
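
The Zebra bullet above says each octet of bases is hashed into a 16-bit int and stored in a hash-index table. A minimal Python sketch of that idea follows, assuming a 2-bit-per-base encoding (8 bases x 2 bits = 16 bits). The function names `pack_octet`, `build_index` and `candidate_pairs` are hypothetical and chosen for illustration; this is not MIRA's actual implementation, and it ignores reverse complements and error tolerance.

```python
from collections import defaultdict

# Assumed encoding: 2 bits per base, so an octet of bases fits in 16 bits.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
WORD_LEN = 8


def pack_octet(word):
    """Pack 8 bases into a 16-bit integer.

    Returns None if the word contains anything other than A, C, G, T
    (for example an ambiguous base such as N).
    """
    if len(word) != WORD_LEN:
        raise ValueError("expected exactly 8 bases")
    value = 0
    for base in word.upper():
        bits = BASE_BITS.get(base)
        if bits is None:
            return None
        value = (value << 2) | bits
    return value


def build_index(reads):
    """Map each 16-bit octet code to the (read_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for read_id, seq in enumerate(reads):
        for offset in range(len(seq) - WORD_LEN + 1):
            code = pack_octet(seq[offset:offset + WORD_LEN])
            if code is not None:
                index[code].append((read_id, offset))
    return index


def candidate_pairs(index):
    """Count how many distinct octets each pair of reads has in common."""
    shared = defaultdict(int)
    for hits in index.values():
        read_ids = sorted({read_id for read_id, _ in hits})
        for i, a in enumerate(read_ids):
            for b in read_ids[i + 1:]:
                shared[(a, b)] += 1
    return dict(shared)


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "GTACGTACGG", "TTTTTTTTTT"]
    index = build_index(reads)
    print(candidate_pairs(index))
```

Read pairs that share octets are only candidates for an overlap; in the workflow described in the notes, a more thorough banded Smith-Waterman alignment would then decide what kind of relationship, if any, the two reads have.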