User Tools

Site Tools


lecture_notes:04-22-2011

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
Next revision Both sides next revision
lecture_notes:04-22-2011 [2011/04/25 22:01]
eyliaw created
lecture_notes:04-22-2011 [2011/06/09 23:38]
eyliaw
Line 1: Line 1:
-====== Burrows Wheeler ​Transform ​======+====== Burrows Wheeler ​Aligner ​====== 
 +We discussed the [[bioinformatic_tools:​bwa]]. ​ It uses the Burrows Wheeler Transform to represent a prefix trie, allowing for short read alignment with mismatches and gaps. 
 ===== The prefix trie ====== ===== The prefix trie ======
-The prefix trie is a tree built from possible prefixes, starting at the end of a branch and going backwards to the root.  Each node is the range in the suffix array where the +The prefix trie is a tree built from possible prefixes, starting at the end of a branch and going backwards to the root.  Each node is the range in the suffix array where that prefix may be found. ​ Each edge is a character in the prefix. ​ A string can be found--allowing for mismatches and gaps--in the prefix tree by going up each branch and comparing characters to the prefix until the all characters have been matched or the mismatch limit is met.
  
 ===== Suffix array ====== ===== Suffix array ======
-The suffix array is found by sorting the cyclic transformation lexicographically and taking their original indices. +The suffix array is found by appending an end character and sorting the cyclic transformation ​of the resulting string ​lexicographically and storing the original indices. 
-Range in the suffix array can be found by the R_ and R-+ 
 +Example: 
 +   ​GOOGOL 
 +Append end character:​ 
 +   X = GOOGOL$ 
 +Cyclic transformation:​ 
 +   0 GOOGOL$ 
 +   1 OOGOL$G 
 +   2 OGOL$GO 
 +   3 GOL$GOO 
 +   4 OL$GOOG 
 +   5 L$GOOGO 
 +   6 $GOOGOL 
 +Sort: 
 +   6 $GOOGOL 
 +   3 GOL$GOO 
 +   0 GOOGOL$ 
 +   5 L$GOOGO 
 +   2 OGOL$GO 
 +   4 OL$GOOG 
 +   1 OOGOL$G 
 + 
 +   S(i) = [6,​3,​0,​5,​2,​4,​1] 
 +The Burrows Wheeler transform takes the last character of the sorted cyclic strings: 
 +   B(i) = LO$OOGG 
 +===== Searching ===== 
 +We can use the FM-index to find the upper and lower bounds of a substring. 
 + 
 +C_X(a) is the number of characters lexicographically before a in X. 
 +   $ 0 
 +   G 1 
 +   L 3 
 +   O 4 
 +O_X(a,i) is the number of occurrences of a in B_X[0,i]. 
 +^ B_X[0,i]: ^ O_X(G,i): ^ O_X(L,i) ^ O_X(O,i) 
 +| 1 L       | 0         | 1        | 0 
 +| 2 LO      | 0         | 1        | 1 
 +| 3 LO$     | 0         | 1        | 1 
 +| 4 LO$O    | 0         | 1        | 2 
 +| 5 LO$OO   | 0         | 1        | 3 
 +| 6 LO$OOG ​ | 1         | 1        | 3 
 +| 7 LO$OOGG | 2         | 1        | 3 
 +    
 +    ​
lecture_notes/04-22-2011.txt · Last modified: 2015/08/09 23:06 by 212.129.31.47