This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
lecture_notes:04-22-2011 [2011/06/08 17:01] eyliaw |
lecture_notes:04-22-2011 [2011/06/08 17:29] eyliaw [Suffix array] |
||
---|---|---|---|
Line 6: | Line 6: | ||
===== Suffix array ====== | ===== Suffix array ====== | ||
- | The suffix array is found by sorting the cyclic transformation lexicographically and taking their original indices. | + | The suffix array is found by appending an end character and sorting the cyclic transformation of the resulting string lexicographically and storing the original indices. |
+ | |||
+ | Example: | ||
+ | GOOGOL | ||
+ | Append end character: | ||
+ | X = GOOGOL$ | ||
+ | Cyclic transformation: | ||
+ | 0 GOOGOL$ | ||
+ | 1 OOGOL$G | ||
+ | 2 OGOL$GO | ||
+ | 3 GOL$GOO | ||
+ | 4 OL$GOOG | ||
+ | 5 L$GOOGO | ||
+ | 6 $GOOGOL | ||
+ | Sort: | ||
+ | 6 $GOOGO**L** | ||
+ | 3 GOL$GO**O** | ||
+ | 0 GOOGOL**$** | ||
+ | 5 L$GOOG**O** | ||
+ | 2 OGOL$G**O** | ||
+ | 4 OL$GOO**G** | ||
+ | 1 OOGOL$**G** | ||
+ | |||
+ | S(i) = [6,3,0,5,2,4,1] | ||
+ | The Burrows Wheeler transform takes the last character of the sorted cyclic strings: | ||
+ | B(i) = LO$OOGG | ||
+ | ===== Searching ===== | ||
+ | We can use the FM-index to find the upper and lower bounds of a substring. | ||
+ | |||
+ | C_X(a) is the number of characters lexicographically before a in X. | ||
+ | $ 0 | ||
+ | G 1 | ||
+ | L 3 | ||
+ | O 4 | ||
+ | O_X(a,i) is the number of occurrences of a in B_X[0,i]. |