This shows you the differences between two versions of the page.

lecture_notes:03-30-2011 [2011/03/31 20:56]
svohr [Ion Torrent]
lecture_notes:03-30-2011 [2011/04/01 12:20] (current)
svohr [Coverage] slight corrections
Line 32:
   * High error rates (~5%)
   * Useful when mapping to a reference.
 +===== Coverage =====
 +We briefly discussed how much sequence data is required to assemble a genome. First, we considered the probability that a particular base ''i'' is covered by a single read ''j''.
 +
 +  P( seeing base i in read j ) = L/G
 +where ''L'' is the read length and ''G'' is the total size of the genome (this treats read positions as uniformly random and ignores the ends of the genome). If we have ''R'' independent reads, then
 +  P( never seeing base i ) = (1 - L/G)^R
 +Multiplying ''L/G'' by ''R/R'' gives ''((L*R) / G) / R'', or ''C / R'', where ''C = (L*R) / G'' is our coverage of the genome. Holding ''C'' fixed, we take the limit of ''(1 - C/R)^R'' as ''R'' goes to infinity:
 +  lim R->inf (1 - C/R)^R = e^-C
 +Thus we can expect to miss about ''G*e^-C'' bases.
 +We cannot assemble an entire chromosome if we are missing bases. However, we can construct contiguous stretches of bases, or //contigs//, and later assemble them into //scaffolds// using other information, such as long-distance physical maps.
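
The coverage estimate above can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the function names and the values chosen for ''L'', ''G'' and ''R'' are hypothetical and picked only for illustration. It compares the exact expression ''(1 - L/G)^R'' with the approximation ''e^-C'' and reports the expected number of missed bases ''G*e^-C''.

```python
import math


def coverage(read_len: int, genome_size: int, num_reads: int) -> float:
    """Coverage C = (L * R) / G, as defined in the notes."""
    return read_len * num_reads / genome_size


def p_base_missed_exact(read_len: int, genome_size: int, num_reads: int) -> float:
    """P(never seeing base i) = (1 - L/G)^R for R independent reads."""
    return (1.0 - read_len / genome_size) ** num_reads


def expected_missed_bases(genome_size: int, cov: float) -> float:
    """Expected number of uncovered bases, G * e^-C (large-R approximation)."""
    return genome_size * math.exp(-cov)


if __name__ == "__main__":
    # Hypothetical values for illustration only (not from the lecture).
    L = 100          # read length in bases
    G = 5_000_000    # genome size in bases

    for R in (50_000, 250_000, 500_000):
        C = coverage(L, G, R)
        exact = p_base_missed_exact(L, G, R)
        approx = math.exp(-C)
        missed = expected_missed_bases(G, C)
        print(f"R={R:>7}  C={C:5.1f}  exact={exact:.3e}  "
              f"e^-C={approx:.3e}  expected missed={missed:,.1f}")
```

Under these assumptions, the exact and approximate probabilities should agree closely whenever ''L'' is much smaller than ''G'', which is the regime the limit describes.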
 ===== References =====
 <refnotes>notes-separator: none</refnotes>
 ~~REFNOTES cite~~
lecture_notes/03-30-2011.1301630162.txt.gz · Last modified: 2011/03/31 20:56 by svohr