This shows you the differences between two versions of the page.
Both sides previous revision Previous revision | |||
lecture_notes:05-19-2010 [2010/05/19 21:59] hyjkim |
lecture_notes:05-19-2010 [2010/05/19 22:14] (current) hyjkim P |
||
---|---|---|---|
Line 1: | Line 1: | ||
=====ALLPATHS===== | =====ALLPATHS===== | ||
+ | ===Presented by Thomas=== | ||
* ALLPATHS was created to improve reference genomes. | * ALLPATHS was created to improve reference genomes. | ||
* The version described here is optimized for 100-base Illumina reads. | * The version described here is optimized for 100-base Illumina reads. | ||
Line 55: | Line 56: | ||
* Beats Velvet and EULER-SR in all categories (measured in 10 kb windows). | * Beats Velvet and EULER-SR in all categories (measured in 10 kb windows). | ||
- | ====Mira==== | + | =====Mira===== |
+ | ===Presented by Michael Cusack=== | ||
+ | * Designed to work with difficult genomes (lots of repeats or other sequence aberrations) | ||
+ | * Hybrid Assembly | ||
+ | * Can combine several data types | ||
+ | * Does not work with SOLiD | ||
+ | * Can take trace data from Sanger sequencing in addition to base calls | ||
+ | * Position-specific confidence values | ||
+ | * A stretch in each sequence marked as a high-confidence region | ||
+ | * General properties such as directionality | ||
+ | * Mira is an Iterative Process | ||
+ | * Read scanning with a fast, error-tolerant pairwise comparison. (Both methods below are less sensitive than Smith-Waterman.) | ||
+ | * DNA-Shift-AND | ||
+ | * Align small words within a read | ||
+ | * O(c*n), c=# allowed errors | ||
+ | * Must find 2 of 3 words to establish a relationship | ||
+ | * Zebra | ||
+ | * Transcribe, Divide, Reorganize, Concentrate and Conquer strategy | ||
+ | * Hashes each octet of bases into a 16-bit int and creates a hash-index table. | ||
+ | * More thorough comparison to establish the type of relationship | ||
+ | * Uses a modified Smith-Waterman alignment. | ||
+ | * Uses banding | ||
+ | * Uses information generated by DNA-Shift-AND/Zebra | ||
+ | * Building graph | ||
+ | * Overlap alignment + complementary data (orientation, overlap region etc.) | ||
+ | * Iterative Processing | ||
+ | * Start with the highest-quality data. | ||
+ | * Split each read into high confidence and low confidence regions by quality clipping. | ||
+ | * Only high confidence regions are used to build initial contigs. | ||
+ | * Low confidence regions are used "cautiously" | ||
+ | * Creating Contigs | ||
+ | * Pathfinder | ||
+ | * Finds best nodes and uses them as anchors | ||
+ | * Extends while minimizing uncertainties of consensus bases | ||
+ | * Uses an n, m-step recursive look-ahead algorithm to detect repeats. | ||
+ | * Contig Builder | ||
+ | * Once a path is decided, each contig must be compiled and approved | ||
+ | * If a read is too different from the existing consensus, despite a high-scoring overlap, it is rejected and the pathfinder is run again from that point. | ||
+ | * Independent Observations | ||
+ | * "One central pillar of the quality calculation in MIRA is the rule that independent observations of a base confirm this base better than non-independent observations. When a base was read from both directions, one can assume independence of observations: it's not the whole truth, but close enough. As a side note: observing a..." something... | ||
+ | * Handling repeats | ||
+ | * Can take in known repetitive elements. | ||
+ | * When these reads are detected, much stricter control mechanisms can be applied. | ||
+ | * When there is a discrepancy in a read matching a repeated element, signal processing of the trace is used to determine whether the error is explainable | ||
+ | * If the percentage of unexplainable errors is greater than a threshold (default: 1%), reads are rejected from the consensus and returned to the assembly graph. | ||
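
The read-scanning step in the notes (DNA-Shift-AND, cost O(c*n) for c allowed errors, and the "2 of 3 words" rule) can be illustrated with a small sketch. This is not MIRA's code: it is a generic Shift-AND (bitap) search that tolerates substitutions only, and the names `shift_and_search` and `words_support_overlap`, the way the three words are supplied, and the restriction to substitutions are all assumptions made for illustration.

```python
def shift_and_search(text, word, max_errors=0):
    """Return start positions where `word` matches `text` with at most
    `max_errors` substitutions (assumption: no insertions or deletions)."""
    m = len(word)
    if m == 0 or m > len(text):
        return []

    # masks[ch] has bit i set when word[i] == ch.
    masks = {}
    for i, ch in enumerate(word):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    full = (1 << m) - 1      # keeps every state within m bits
    accept = 1 << (m - 1)    # bit set when the whole word has matched

    # state[d] has bit j set when word[:j+1] matches the text ending at
    # the current position with at most d substitutions.
    state = [0] * (max_errors + 1)
    hits = []
    for pos, ch in enumerate(text):
        mask = masks.get(ch, 0)
        prev = state[:]
        state[0] = ((prev[0] << 1) | 1) & mask
        for d in range(1, max_errors + 1):
            match = ((prev[d] << 1) | 1) & mask   # current base agrees
            substitute = (prev[d - 1] << 1) | 1   # spend one error here
            state[d] = (match | substitute) & full
        if state[max_errors] & accept:
            hits.append(pos - m + 1)
    return hits


def words_support_overlap(read, words, max_errors=0, required=2):
    """Hypothetical helper: report a candidate relationship when at least
    `required` of the given words are found in `read`."""
    found = sum(1 for w in words if shift_and_search(read, w, max_errors))
    return found >= required
```

Each text position does a constant amount of bit work per error level, which is where a cost proportional to c*n comes from when the word fits in a machine word. Pairs that pass a cheap filter like this would then go on to the banded Smith-Waterman comparison described in the notes.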