This is an old revision of the document!
We discussed the bwa. It uses the Burrows Wheeler Transform to represent a prefix trie, allowing for short read alignment with mismatches and gaps.
The prefix trie is a tree built from possible prefixes, starting at the end of a branch and going backwards to the root. Each node is the range in the suffix array where that prefix may be found. Each edge is a character in the prefix. A string can be found–allowing for mismatches and gaps–in the prefix tree by going up each branch and comparing characters to the prefix until the all characters have been matched or the mismatch limit is met.
The suffix array is found by appending an end character and sorting the cyclic transformation of the resulting string lexicographically and storing the original indices.
Example:
GOOGOL
Append end character:
GOOGOL$
Cyclic transformation:
0 GOOGOL$ 1 OOGOL$G 2 OGOL$GO 3 GOL$GOO 4 OL$GOOG 5 L$GOOGO 6 $GOOGOL
Sort:
6 $GOOGO**L** 3 GOL$GO**O** 0 GOOGOL**$** 5 L$GOOG**O** 2 OGOL$G**O** 4 OL$GOO**G** 1 OOGOL$**G**
S(i) = [6,3,0,5,2,4,1]
The Burrows Wheeler transform takes the last character of the sorted cyclic strings:
B(i) = LO$OOGG