User Tools

Site Tools


lecture_notes:04-22-2011

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
lecture_notes:04-22-2011 [2011/06/08 17:14]
eyliaw [Suffix array]
lecture_notes:04-22-2011 [2015/08/09 23:06] (current)
212.129.31.47 ↷ Links adapted because of a move operation
Line 1: Line 1:
 ====== Burrows Wheeler Aligner ====== ====== Burrows Wheeler Aligner ======
-We discussed the [[bioinformatic_tools:​bwa]]. ​ It uses the Burrows Wheeler Transform to represent a prefix trie, allowing for short read alignment with mismatches and gaps.+We discussed the [[archive:bioinformatic_tools:​bwa]]. ​ It uses the Burrows Wheeler Transform to represent a prefix trie, allowing for short read alignment with mismatches and gaps.
  
 ===== The prefix trie ====== ===== The prefix trie ======
Line 11: Line 11:
    ​GOOGOL    ​GOOGOL
 Append end character: Append end character:
-   ​GOOGOL$+   X = GOOGOL$
 Cyclic transformation:​ Cyclic transformation:​
    0 GOOGOL$    0 GOOGOL$
Line 21: Line 21:
    6 $GOOGOL    6 $GOOGOL
 Sort: Sort:
-   6 $GOOGO**L** +   6 $GOOGOL 
-   3 GOL$GO**O** +   3 GOL$GOO 
-   0 GOOGOL**$** +   0 GOOGOL$ 
-   5 L$GOOG**O** +   5 L$GOOGO 
-   2 OGOL$G**O** +   2 OGOL$GO 
-   4 OL$GOO**G** +   4 OL$GOOG 
-   1 OOGOL$**G**+   1 OOGOL$G
  
    S(i) = [6,​3,​0,​5,​2,​4,​1]    S(i) = [6,​3,​0,​5,​2,​4,​1]
 The Burrows Wheeler transform takes the last character of the sorted cyclic strings: The Burrows Wheeler transform takes the last character of the sorted cyclic strings:
    B(i) = LO$OOGG    B(i) = LO$OOGG
 +===== Searching =====
 +We can use the FM-index to find the upper and lower bounds of a substring.
 +
 +C_X(a) is the number of characters lexicographically before a in X.
 +   G 0
 +   L 2
 +   O 3
 +O_X(a,i) is the number of occurrences of a in B_X[0,i].
 +^ i ^ B_X[0,i] ^ O_X(G,i) ^ O_X(L,i) ^ O_X(O,i) ^
 +| 1 | L        | 0        | 1        | 0        | 
 +| 2 | LO       | 0        | 1        | 1        | 
 +| 3 | LO$      | 0        | 1        | 1        | 
 +| 4 | LO$O     | 0        | 1        | 2        | 
 +| 5 | LO$OO    | 0        | 1        | 3        | 
 +| 6 | LO$OOG ​  | 1        | 1        | 3        | 
 +| 7 | LO$OOGG ​ | 2        | 1        | 3        | 
  
 +There are then two recursive formulas to find the start and end positions of a substring.
 +   End R_(aW) = C(a) + O(a,R_(W) - 1) + 1 or 1 if W is the empty string
 +   Start R-(aW) = C(a) + O(a,R-(W)) or (n - 1) if W is the empty string
 +Where a is the first character and W is the word.
lecture_notes/04-22-2011.1307553277.txt.gz · Last modified: 2011/06/08 17:14 by eyliaw