This shows you the differences between two versions of the page.
lecture_notes:03-30-2011 [2011/04/01 03:56] svohr [Ion Torrent] |
lecture_notes:03-30-2011 [2011/04/01 19:20] (current) svohr [Coverage] slight corrections |
||
---|---|---|---|
Line 32: | Line 32: | ||
* High error rates (~5%) | * High error rates (~5%) | ||
* Useful when mapping to a reference. | * Useful when mapping to a reference. | ||
+ | ===== Coverage ===== | ||
+ | We briefly discussed how much sequence data would be required to assemble the genome. First, we considered the probability of seeing a particular base ''i'' in a single read ''j''. | ||
+ | | ||
+ | P( seeing base i in read j ) = L/G | ||
+ | |||
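+ | where ''L'' is the read length and ''G'' is the total size of the genome. This assumes reads start at uniformly random positions and that ''L'' is much smaller than ''G''. If we have ''R'' independent reads, then | ||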
+ | |||
+ | P( never seeing base i ) = (1 - L/G)^R | ||
+ | |||
+ | We can multiply ''L/G'' by ''R/R'' to get ''((L*R) / G) / R'', or ''C / R'', where ''C = (L*R) / G'' is our coverage of the genome. Holding ''C'' fixed, we take the limit of ''(1 - C/R)^R'' as | ||
+ | ''R'' goes to infinity: | ||
+ | |||
+ | lim R->inf (1 - C/R)^R = e^-C | ||
+ | |||
+ | Each of the ''G'' bases is therefore missed with probability of about ''e^-C'', so we can expect to miss ''G*e^-C'' bases. | ||
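
The estimate above can be checked numerically. The following is a minimal Python sketch, not something shown in lecture: the function names and the example values for ''G'' and ''L'' are hypothetical, chosen only for illustration. It compares the exact finite-''R'' probability ''(1 - L/G)^R'' with the limit ''e^-C'', and inverts ''G*e^-C'' to estimate the coverage needed for a target number of missed bases.

```python
import math


def prob_base_missed(read_length, genome_size, num_reads):
    """Exact probability that a given base is covered by none of the reads,
    using the per-read probability L/G from the notes."""
    return (1.0 - read_length / genome_size) ** num_reads


def expected_missed_bases(genome_size, coverage):
    """Expected number of uncovered bases, G * e^-C."""
    return genome_size * math.exp(-coverage)


def coverage_for_expected_missed(genome_size, target_missed):
    """Solve G * e^-C = target_missed for C, giving C = ln(G / target_missed)."""
    return math.log(genome_size / target_missed)


if __name__ == "__main__":
    G = 3_000_000  # hypothetical genome size in bases (illustrative only)
    L = 100        # hypothetical read length in bases (illustrative only)

    for C in (1, 5, 10):
        R = C * G // L  # number of reads that gives coverage C
        exact = prob_base_missed(L, G, R)
        approx = math.exp(-C)
        missed = expected_missed_bases(G, C)
        print(f"C={C:2d}  exact={exact:.6e}  e^-C={approx:.6e}  "
              f"expected missed bases={missed:.1f}")

    # Coverage at which we expect to miss about one base in total.
    print(f"coverage for ~1 missed base: {coverage_for_expected_missed(G, 1):.2f}")
```

Because ''R'' is very large for any realistic genome, the exact value and ''e^-C'' should agree closely, which is why the limit is a reasonable approximation here.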
+ | |||
+ | We cannot assemble an entire chromosome if we are missing bases. However, we can construct contiguous stretches of bases or //contigs// and later | ||
+ | assemble them into //scaffolds// using other information, such as long-distance physical maps. | ||
+ | |||
+ | |||
+ | |||
===== References ===== | ===== References ===== | ||
<refnotes>notes-separator: none</refnotes> | <refnotes>notes-separator: none</refnotes> | ||
~~REFNOTES cite~~ | ~~REFNOTES cite~~ |