This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
lecture_notes:04-22-2011 [2011/06/08 17:14] eyliaw [Suffix array] |
lecture_notes:04-22-2011 [2011/06/09 23:39] eyliaw [Searching] |
||
---|---|---|---|
Line 11: | Line 11: | ||
GOOGOL | GOOGOL | ||
Append end character: | Append end character: | ||
- | GOOGOL$ | + | X = GOOGOL$ |
Cyclic transformation: | Cyclic transformation: | ||
0 GOOGOL$ | 0 GOOGOL$ | ||
Line 21: | Line 21: | ||
6 $GOOGOL | 6 $GOOGOL | ||
Sort: | Sort: | ||
- | 6 $GOOGO**L** | + | 6 $GOOGOL |
- | 3 GOL$GO**O** | + | 3 GOL$GOO |
- | 0 GOOGOL**$** | + | 0 GOOGOL$ |
- | 5 L$GOOG**O** | + | 5 L$GOOGO |
- | 2 OGOL$G**O** | + | 2 OGOL$GO |
- | 4 OL$GOO**G** | + | 4 OL$GOOG |
- | 1 OOGOL$**G** | + | 1 OOGOL$G |
S(i) = [6,3,0,5,2,4,1] | S(i) = [6,3,0,5,2,4,1] | ||
The Burrows Wheeler transform takes the last character of the sorted cyclic strings: | The Burrows Wheeler transform takes the last character of the sorted cyclic strings: | ||
B(i) = LO$OOGG | B(i) = LO$OOGG | ||
+ | ===== Searching ===== | ||
+ | We can use the FM-index to find the upper and lower bounds of a substring. | ||
+ | C_X(a) is the number of characters lexicographically before a in X. | ||
+ | $ 0 | ||
+ | G 1 | ||
+ | L 3 | ||
+ | O 4 | ||
+ | O_X(a,i) is the number of occurrences of a in B_X[0,i]. | ||
+ | ^ B_X[0,i] ^ O_X(G,i) ^ O_X(L,i) ^ O_X(O,i) ^ | ||
+ | | 1 L | 0 | 1 | 0 | | ||
+ | | 2 LO | 0 | 1 | 1 | | ||
+ | | 3 LO$ | 0 | 1 | 1 | | ||
+ | | 4 LO$O | 0 | 1 | 2 | | ||
+ | | 5 LO$OO | 0 | 1 | 3 | | ||
+ | | 6 LO$OOG | 1 | 1 | 3 | | ||
+ | | 7 LO$OOGG | 2 | 1 | 3 | | ||
+ | |||
+ | |