Differences

--- lecture_notes:03-30-2011 [2011/04/01 03:56]
svohr [Ion Torrent]
+++ lecture_notes:03-30-2011 [2011/04/01 19:20] (current)
svohr [Coverage] slight corrections
@@ Line 32: / Line 32: @@
   * High error rates (~5%)
   * Useful when mapping to a reference.
+===== Coverage =====
+We briefly discussed how much sequence data would be required to assemble the genome. First, we considered the probability that a single read ''j'' covers a particular base ''i'', assuming reads start at uniformly random positions in the genome.
+  P( seeing base i in read j ) = L/G
+where ''L'' is the read length and ''G'' is the total size of the genome. If we have ''R'' independent reads, then
+  P( never seeing base i ) = (1 - L/G)^R
+We can multiply ''L/G'' by ''R/R'' to get ''((L*R) / G) / R'' or ''C / R'', where ''C = (L*R) / G'' is our coverage of the genome. Holding ''C'' fixed, we take the limit as ''R'' goes to infinity:
+  lim R->inf (1 - C/R)^R = e^-C
+Each base is therefore missed with probability of about ''e^-C'', so we can expect to miss ''G*e^-C'' bases.
+We cannot assemble an entire chromosome if we are missing bases. However, we can construct contiguous stretches of bases, or //contigs//, and later assemble them into //scaffolds// using other information, such as long-distance physical maps.
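The coverage estimate above can be checked numerically. The following is a minimal sketch, not part of the lecture: the function names and the example genome size, read length, and read count are all hypothetical, chosen only to illustrate the formulas. It compares the exact probability ''(1 - L/G)^R'' with the approximation ''e^-C'', and rearranges ''G*e^-C'' to estimate the coverage needed for a target number of missed bases.

```python
import math


def coverage(read_len, genome_size, num_reads):
    """Coverage C = (L * R) / G."""
    return read_len * num_reads / genome_size


def prob_base_missed_exact(read_len, genome_size, num_reads):
    """Exact form from the notes: (1 - L/G)^R."""
    return (1.0 - read_len / genome_size) ** num_reads


def expected_missed_bases(genome_size, cov):
    """Expected number of uncovered bases: G * e^-C."""
    return genome_size * math.exp(-cov)


def coverage_for_expected_missed(genome_size, target_missed=1.0):
    """Solve G * e^-C = target_missed for C, giving C = ln(G / target_missed)."""
    return math.log(genome_size / target_missed)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    G = 1_000_000   # genome size in bases
    L = 100         # read length in bases
    R = 50_000      # number of reads

    C = coverage(L, G, R)
    print("coverage C:", C)
    print("exact P(miss):", prob_base_missed_exact(L, G, R))
    print("approx e^-C:", math.exp(-C))
    print("expected missed bases:", expected_missed_bases(G, C))
    print("C for ~1 missed base:", coverage_for_expected_missed(G))
```

Because ''e^-C'' falls off exponentially, the coverage needed to expect fewer than one missed base grows only with the logarithm of the genome size under this model. The model ignores sequencing errors and non-uniform read placement.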
 ===== References =====
 <refnotes>notes-separator: none</refnotes>
 ~~REFNOTES cite~~

Banana Slug Genomics