User Tools

Site Tools


lecture_notes:04-22-2011

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Next revision Both sides next revision
lecture_notes:04-22-2011 [2011/04/25 22:01]
eyliaw
lecture_notes:04-22-2011 [2011/06/10 00:16]
eyliaw [Searching]
Line 1: Line 1:
-====== Burrows Wheeler ​Transform ​====== +====== Burrows Wheeler ​Aligner ​====== 
-Incomplete.+We discussed the [[bioinformatic_tools:​bwa]]. ​ It uses the Burrows Wheeler Transform to represent a prefix trie, allowing for short read alignment with mismatches and gaps.
  
 ===== The prefix trie ====== ===== The prefix trie ======
-The prefix trie is a tree built from possible prefixes, starting at the end of a branch and going backwards to the root.  Each node is the range in the suffix array where the +The prefix trie is a tree built from possible prefixes, starting at the end of a branch and going backwards to the root.  Each node is the range in the suffix array where that prefix may be found. ​ Each edge is a character in the prefix. ​ A string can be found--allowing for mismatches and gaps--in the prefix tree by going up each branch and comparing characters to the prefix until the all characters have been matched or the mismatch limit is met.
  
 ===== Suffix array ====== ===== Suffix array ======
-The suffix array is found by sorting the cyclic transformation lexicographically and taking their original indices. +The suffix array is found by appending an end character and sorting the cyclic transformation ​of the resulting string ​lexicographically and storing the original indices. 
-Range in the suffix array can be found by the R_ and R-+ 
 +Example: 
 +   ​GOOGOL 
 +Append end character:​ 
 +   X = GOOGOL$ 
 +Cyclic transformation:​ 
 +   0 GOOGOL$ 
 +   1 OOGOL$G 
 +   2 OGOL$GO 
 +   3 GOL$GOO 
 +   4 OL$GOOG 
 +   5 L$GOOGO 
 +   6 $GOOGOL 
 +Sort: 
 +   6 $GOOGOL 
 +   3 GOL$GOO 
 +   0 GOOGOL$ 
 +   5 L$GOOGO 
 +   2 OGOL$GO 
 +   4 OL$GOOG 
 +   1 OOGOL$G 
 + 
 +   S(i) = [6,​3,​0,​5,​2,​4,​1] 
 +The Burrows Wheeler transform takes the last character of the sorted cyclic strings: 
 +   B(i) = LO$OOGG 
 +===== Searching ===== 
 +We can use the FM-index to find the upper and lower bounds of a substring. 
 + 
 +C_X(a) is the number of characters lexicographically before a in X. 
 +   G 0 
 +   L 2 
 +   O 3 
 +O_X(a,i) is the number of occurrences of a in B_X[0,i]. 
 +^ i ^ B_X[0,i] ^ O_X(G,i) ^ O_X(L,i) ^ O_X(O,i) ^ 
 +| 1 | L        | 0        | 1        | 0        |  
 +| 2 | LO       | 0        | 1        | 1        |  
 +| 3 | LO$      | 0        | 1        | 1        |  
 +| 4 | LO$O     | 0        | 1        | 2        |  
 +| 5 | LO$OO    | 0        | 1        | 3        |  
 +| 6 | LO$OOG ​  | 1        | 1        | 3        |  
 +| 7 | LO$OOGG ​ | 2        | 1        | 3        |  
 + 
 +There are then two recursive formulas to find the start and end positions of a substring. 
 +   End R_(aW) = C(a) + O(a,R_(W) - 1) + 1 or 1 if W is the empty string 
 +   ​Start ​R-(aW) = C(a) + O(a,R-(W)) or (n - 1) if W is the empty string 
 +Where a is the first character and W is the word. 
 + 
 +To find the end position of the substring GO in 
lecture_notes/04-22-2011.txt · Last modified: 2015/08/09 23:06 by 212.129.31.47