lecture_notes:03-30-2011

Differences

This shows you the differences between two versions of the page.

lecture_notes:03-30-2011 [2011/03/31 20:56]
svohr [Ion Torrent]
lecture_notes:03-30-2011 [2011/04/01 12:20] (current)
svohr [Coverage] slight corrections
Line 32: Line 32:
   * High error rates (~5%)   * High error rates (~5%)
   * Useful when mapping to a reference.   * Useful when mapping to a reference.
 +===== Coverage =====
 +We briefly discussed how much sequence data would be required to assemble the genome. First, we considered the probability of seeing a particular base ''i'' in a single read ''j'', assuming each read starts at a uniformly random position in the genome.
 +
 +  P( seeing base i in read j ) = L/G
 +
 +where ''L'' is the read length and ''G'' is the total size of the genome. If we have ''R'' independent reads, then
 +
 +  P( never seeing base i ) = (1 - L/G)^R
 +
 +We can multiply ''L/G'' by ''R/R'' to get ''((L*R) / G) / R'', or ''C / R'', where ''C = (L*R) / G'' is our coverage of the genome. Holding ''C'' fixed, we take the limit of
 +''(1 - C/R)^R'' as ''R'' goes to infinity:
 +
 +  lim R->inf (1 - C/R)^R = e^-C
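
A quick numerical check can make this limit concrete. The sketch below is only an illustration: the function names and the coverage value ''C = 8'' are assumptions chosen for the example, not values from the lecture. It evaluates ''(1 - C/R)^R'' for a growing number of reads and compares it with ''e^-C''.

```python
import math


def prob_never_seen(coverage: float, num_reads: int) -> float:
    """Finite-R probability (1 - C/R)^R that a given base is in no read."""
    return (1.0 - coverage / num_reads) ** num_reads


def limiting_prob(coverage: float) -> float:
    """Limiting value e^-C as the number of reads goes to infinity."""
    return math.exp(-coverage)


if __name__ == "__main__":
    C = 8.0  # assumed coverage, for illustration only
    for R in (10, 100, 10_000, 1_000_000):
        # The finite-R value approaches the limit as R grows.
        print(R, prob_never_seen(C, R), limiting_prob(C))
```

For realistic read counts (millions of reads) the finite-''R'' value and the limit are practically indistinguishable, which is why ''e^-C'' is a reasonable stand-in.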
 +
 +Since each of the ''G'' bases is missed with probability ''e^-C'', we can expect to miss about ''G*e^-C'' bases.
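
The estimate of missed bases can be sketched in a few lines of Python. The function names and the example genome size, read length, and read count are hypothetical, picked only to show the arithmetic. The last helper inverts the formula: setting ''G*e^-C'' equal to a target number of missed bases gives ''C = ln(G / target)''.

```python
import math


def coverage(read_length: int, num_reads: int, genome_size: int) -> float:
    """Coverage C = (L * R) / G."""
    return read_length * num_reads / genome_size


def expected_missed_bases(read_length: int, num_reads: int, genome_size: int) -> float:
    """Expected number of bases seen in no read, G * e^-C."""
    c = coverage(read_length, num_reads, genome_size)
    return genome_size * math.exp(-c)


def coverage_for_missed(genome_size: int, max_missed: float = 1.0) -> float:
    """Coverage at which G * e^-C equals max_missed, i.e. C = ln(G / max_missed)."""
    return math.log(genome_size / max_missed)


if __name__ == "__main__":
    # Hypothetical example: 1 Mb genome, 100 bp reads, 50,000 reads (C = 5).
    G, L, R = 1_000_000, 100, 50_000
    print("coverage:", coverage(L, R, G))
    print("expected missed bases:", expected_missed_bases(L, R, G))
    print("coverage for about 1 missed base:", coverage_for_missed(G))
```

Because the number of missed bases falls off exponentially with coverage, the coverage needed to miss roughly one base grows only logarithmically with genome size.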
 +
 +We cannot assemble an entire chromosome if we are missing bases. However, we can construct contiguous stretches of bases, or //contigs//, and later
 +assemble them into //scaffolds// using other information, such as long-distance physical maps.
 +
 +
 +
 ===== References ===== ===== References =====
 <refnotes>notes-separator: none</refnotes> <refnotes>notes-separator: none</refnotes>
 ~~REFNOTES cite~~ ~~REFNOTES cite~~
lecture_notes/03-30-2011.1301630162.txt.gz · Last modified: 2011/03/31 20:56 by svohr