Differences

This shows you the differences between two versions of the page.

--- lecture_notes:05-19-2010 [2010/05/19 21:59]
hyjkim
+++ lecture_notes:05-19-2010 [2010/05/19 22:14] (current)
hyjkim P
@@ Line 1: / Line 1: @@
 =====ALLPATHS=====
+===Presented by Thomas===
   * ALLPATHS was created to improve reference genomes.
   * The version described here is optimized for 100-base Illumina reads.
@@ Line 55: / Line 56: @@
     * Beats Velvet and EULER-SR in all categories (measured in 10 kb windows).
-====Mira====
+=====Mira=====
+===Presented by Michael Cusack===
+  * Designed to work with difficult genomes (lots of repeats or other sequence aberrations)
+  * Hybrid Assembly
+    * Can combine several data types
+      * Does not work with SOLiD
+      * Can take trace data from Sanger sequencing in addition to base calls
+      * Position-specific confidence values
+      * A stretch in each sequence marked as a high-confidence region
+      * General properties such as directionality
+  * Mira is an Iterative Process
+    * Read scanning with a fast, error-tolerant pairwise comparison (both methods below are less sensitive than Smith-Waterman)
+      * DNA-Shift-AND
+        * Align small words within a read
+        * O(c*n), c=# allowed errors
+        * Must find 2 of 3 words to establish a relationship
+      * Zebra
+        * Transcribe, Divide, Reorganize, Concentrate and Conquer strategy
+        * Hashes each octet of bases into a 16-bit int and creates a hash-index table.
+    * More thorough comparison to establish the type of relationship
+      * Uses a modified Smith-Waterman alignment
+      * Uses banding
+      * Uses information generated by DNA-Shift-AND (DNA-SAND) and Zebra
+    * Building graph
+      * Overlap alignment + complementary data (orientation, overlap region etc.)
+    * Iterative Processing
+      * Start with highest quality.
+        * Split each read into high confidence and low confidence regions by quality clipping.
+        * Only high confidence regions are used to build initial contigs.
+        * Low confidence regions are used "cautiously"
+    * Creating Contigs
+      * Pathfinder
+        * Finds best nodes and uses them as anchors
+        * Extends while minimizing uncertainties of consensus bases
+        * Uses an n, m-step recursive look-ahead algorithm to detect repeats.
+      * Contig Builder
+        * Once a path is decided, each contig must be compiled and approved
+        * If a read differs too much from the existing consensus, despite a high-scoring overlap, it is rejected and the pathfinder is run again from that point.
+    * Independent Observations
+      * "One central pillar of the quality calculation in MIRA is the rule that independent observations of a base confirm this base better than non-independent observations. When a base was read from both directions, one can assume independence of observations: it's not the whole truth, but close enough. As a side note: observing a..." something...
+    * Handling repeats
+      * Can take in known repetitive elements.
+        * When these reads are detected, much stricter control mechanisms can be applied.
+      * When there is a discrepancy in a read matching a repeated element, signal processing of the trace is used to determine whether the error is explainable.
+      * If the percentage of unexplainable errors exceeds a threshold (default: 1%), the reads are rejected from the consensus and returned to the assembly graph.
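
The octet-hashing step listed under Zebra can be illustrated with a minimal Python sketch. This is not MIRA's implementation: the function names (`pack_octet`, `build_index`, `candidate_pairs`) and the 2-bit encoding A=0, C=1, G=2, T=3 are assumptions made for illustration. The only points taken from the notes are that 8 bases fit into a 16-bit integer and that the hashes go into an index table used to find reads that may overlap.

```python
from collections import defaultdict

# Assumed 2-bit encoding (not taken from MIRA): 8 bases x 2 bits = 16 bits.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
WORD = 8  # an "octet" of bases


def pack_octet(word):
    """Pack 8 bases into a 16-bit int; return None if a base is not A/C/G/T."""
    if len(word) != WORD:
        raise ValueError("expected exactly 8 bases")
    value = 0
    for base in word.upper():
        code = BASE_CODE.get(base)
        if code is None:
            return None  # skip words containing N or other ambiguity codes
        value = (value << 2) | code
    return value


def build_index(reads):
    """Map each 16-bit octet hash to the (read_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for read_id, seq in enumerate(reads):
        for offset in range(len(seq) - WORD + 1):
            key = pack_octet(seq[offset:offset + WORD])
            if key is not None:
                index[key].append((read_id, offset))
    return index


def candidate_pairs(index):
    """Count, for each pair of reads, how many distinct octets they share."""
    pairs = defaultdict(int)
    for hits in index.values():
        read_ids = sorted({read_id for read_id, _ in hits})
        for i, first in enumerate(read_ids):
            for second in read_ids[i + 1:]:
                pairs[(first, second)] += 1
    return dict(pairs)


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "TTACGTACGT", "GGGGGGGGGG"]
    print(candidate_pairs(build_index(reads)))
```

Read pairs that share octets are only candidates for overlap. In the workflow described above they would still go through the more thorough banded alignment before being added to the graph.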
