This shows you the differences between two versions of the page.
Next revision | Previous revision Next revision Both sides next revision | ||
lecture_notes:04-22-2011 [2011/04/25 22:01] eyliaw created |
lecture_notes:04-22-2011 [2011/06/08 17:13] eyliaw [Suffix array] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Burrows Wheeler Transform ====== | + | ====== Burrows Wheeler Aligner ====== |
+ | We discussed the [[bioinformatic_tools:bwa]]. It uses the Burrows Wheeler Transform to represent a prefix trie, allowing for short read alignment with mismatches and gaps. | ||
===== The prefix trie ====== | ===== The prefix trie ====== | ||
- | The prefix trie is a tree built from possible prefixes, starting at the end of a branch and going backwards to the root. Each node is the range in the suffix array where the | + | The prefix trie is a tree built from possible prefixes, starting at the end of a branch and going backwards to the root. Each node is the range in the suffix array where that prefix may be found. Each edge is a character in the prefix. A string can be found--allowing for mismatches and gaps--in the prefix tree by going up each branch and comparing characters to the prefix until the all characters have been matched or the mismatch limit is met. |
===== Suffix array ====== | ===== Suffix array ====== | ||
- | The suffix array is found by sorting the cyclic transformation lexicographically and taking their original indices. | + | The suffix array is found by appending an end character and sorting the cyclic transformation of the resulting string lexicographically and storing the original indices. |
- | Range in the suffix array can be found by the R_ and R- | + | |
+ | Example: | ||
+ | GOOGOL | ||
+ | Append end character: | ||
+ | GOOGOL$ | ||
+ | Cyclic transformation: | ||
+ | 0 GOOGOL$ | ||
+ | 1 OOGOL$G | ||
+ | 2 OGOL$GO | ||
+ | 3 GOL$GOO | ||
+ | 4 OL$GOOG | ||
+ | 5 L$GOOGO | ||
+ | 6 $GOOGOL | ||
+ | Sort: | ||
+ | 6 $GOOGOL | ||
+ | 3 GOL$GOO | ||
+ | 0 GOOGOL$ | ||
+ | 5 L$GOOGO | ||
+ | 2 OGOL$GO | ||
+ | 4 OL$GOOG | ||
+ | 1 OOGOL$G | ||
+ | |||
+ | S(i) = [6,3,0,5,2,4,1] | ||
+ | The Burrows Wheeler transform takes the last character of the sorted cyclic strings: | ||
+ | B(i) = LO$OOGG |