User Tools

Site Tools


lecture_notes:04-22-2011

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
lecture_notes:04-22-2011 [2011/04/25 22:01]
eyliaw
lecture_notes:04-22-2011 [2015/08/09 23:06]
212.129.31.47 ↷ Links adapted because of a move operation
Line 1: Line 1:
-====== Burrows Wheeler ​Transform ​====== +====== Burrows Wheeler ​Aligner ​====== 
-Incomplete.+We discussed the [[archive:​bioinformatic_tools:​bwa]]. ​ It uses the Burrows Wheeler Transform to represent a prefix trie, allowing for short read alignment with mismatches and gaps.
  
 ===== The prefix trie ====== ===== The prefix trie ======
-The prefix trie is a tree built from possible prefixes, starting at the end of a branch and going backwards to the root.  Each node is the range in the suffix array where the +The prefix trie is a tree built from possible prefixes, starting at the end of a branch and going backwards to the root.  Each node is the range in the suffix array where that prefix may be found. ​ Each edge is a character in the prefix. ​ A string can be found--allowing for mismatches and gaps--in the prefix tree by going up each branch and comparing characters to the prefix until the all characters have been matched or the mismatch limit is met.
  
 ===== Suffix array ====== ===== Suffix array ======
-The suffix array is found by sorting the cyclic transformation lexicographically and taking their original indices. +The suffix array is found by appending an end character and sorting the cyclic transformation ​of the resulting string ​lexicographically and storing the original indices. 
-Range in the suffix array can be found by the R_ and R-+ 
 +Example: 
 +   ​GOOGOL 
 +Append end character:​ 
 +   X = GOOGOL$ 
 +Cyclic transformation:​ 
 +   0 GOOGOL$ 
 +   1 OOGOL$G 
 +   2 OGOL$GO 
 +   3 GOL$GOO 
 +   4 OL$GOOG 
 +   5 L$GOOGO 
 +   6 $GOOGOL 
 +Sort: 
 +   6 $GOOGOL 
 +   3 GOL$GOO 
 +   0 GOOGOL$ 
 +   5 L$GOOGO 
 +   2 OGOL$GO 
 +   4 OL$GOOG 
 +   1 OOGOL$G 
 + 
 +   S(i) = [6,​3,​0,​5,​2,​4,​1] 
 +The Burrows Wheeler transform takes the last character of the sorted cyclic strings: 
 +   B(i) = LO$OOGG 
 +===== Searching ===== 
 +We can use the FM-index to find the upper and lower bounds of a substring. 
 + 
 +C_X(a) is the number of characters lexicographically before a in X. 
 +   G 0 
 +   L 2 
 +   O 3 
 +O_X(a,i) is the number of occurrences of a in B_X[0,i]. 
 +^ i ^ B_X[0,i] ^ O_X(G,i) ^ O_X(L,i) ^ O_X(O,i) ^ 
 +| 1 | L        | 0        | 1        | 0        |  
 +| 2 | LO       | 0        | 1        | 1        |  
 +| 3 | LO$      | 0        | 1        | 1        |  
 +| 4 | LO$O     | 0        | 1        | 2        |  
 +| 5 | LO$OO    | 0        | 1        | 3        |  
 +| 6 | LO$OOG ​  | 1        | 1        | 3        |  
 +| 7 | LO$OOGG ​ | 2        | 1        | 3        |  
 + 
 +There are then two recursive formulas to find the start and end positions of a substring. 
 +   End R_(aW) = C(a) + O(a,R_(W) - 1) + 1 or 1 if W is the empty string 
 +   ​Start ​R-(aW) = C(a) + O(a,R-(W)) or (n - 1) if W is the empty string 
 +Where a is the first character and W is the word.
lecture_notes/04-22-2011.txt · Last modified: 2015/08/09 23:06 by 212.129.31.47