lecture_notes:03-30-2011 (current revision 2011/04/01 19:20 by svohr; previous revision 2011/03/30 23:20 by eyliaw)
==== 454 ====
  * ~400bp reads[(cite:wiki454>http://en.wikipedia.org/wiki/454_Life_Sciences)].
  * Pyrosequencing
  * Q ~20 (Phred base quality, i.e. roughly a 1% per-base error rate)
  * $5000/run/1M reads, no downscaling (numbers approximate).
==== SOLiD ====
  * 2x25bp or 1x50bp reads
  * Paired-end reads: the fragment is ligated to an adapter, and a restriction enzyme cleaves 25bp away from the adapter.
  * Potential for double ligation: two unrelated sequences can be ligated together.
  * $2000/run/100M reads (numbers approximate).
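
As a rough comparison of the approximate figures above, the sketch below divides the cost of one run by the number of reads it produces. It is a minimal illustration only: ''PLATFORMS'' and ''cost_per_million_reads'' are hypothetical names, it assumes a full run is always used, and the numbers are just the approximate values quoted in these notes.

```python
# Approximate per-run figures quoted in these notes; not current prices.
PLATFORMS = {
    # name: (dollars per run, reads per run)
    "454": (5000, 1_000_000),
    "SOLiD": (2000, 100_000_000),
}


def cost_per_million_reads(dollars_per_run: float, reads_per_run: int) -> float:
    """Dollars per one million reads, assuming a full run is used."""
    return dollars_per_run / (reads_per_run / 1_000_000)


if __name__ == "__main__":
    for name, (dollars, reads) in PLATFORMS.items():
        print(f"{name}: ${cost_per_million_reads(dollars, reads):,.2f} per million reads")
```

With these figures SOLiD comes out at about $20 per million reads versus $5000 for 454. Because 454 reads are much longer, the gap per sequenced base is smaller than the gap per read.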
==== Illumina ====
  * 2x50 or 2x100bp reads (?)
  * Paired-end reads
  * Potential errors: innies (the ligation junction does not lie between the two sequenced regions) or chimeric reads (a read runs through the ligation junction).
  * Cheaper than SOLiD; used by the 10K Genomes project.
==== Ion Torrent ====
  * 2x100bp reads
  * ~50,000 to 5,000,000 reads depending on the sequencing chip [(cite:ionTorrent>http://www.iontorrent.com/technology-how-does-it-perform/)].
  * Ion semiconductor sequencing: no optics or modified bases are required.
==== PacBio ====
  * Very long single-molecule reads (~10K bases)
  * High error rate (~5%)
  * Useful when mapping to a reference.
===== Coverage =====
We briefly discussed how much sequence data would be required to assemble the genome. First, we considered the probability of seeing a particular base ''i'' in a single read ''j''.

  P( seeing base i in read j ) = L/G

where ''L'' is the read length and ''G'' is the total size of the genome (assuming each read starts at a uniformly random position and ignoring edge effects at the ends of the genome). If we have ''R'' reads, placed independently of one another, then

  P( never seeing base i ) = (1 - L/G)^R
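
The formula for P( never seeing base i ) can be checked by simulation. The Python sketch below is a minimal illustration, assuming a circular genome (so there are no edge effects) and read starts drawn independently and uniformly at random; the helper names ''covers'' and ''estimate_miss_probability'' and the toy values of ''G'', ''L'' and ''R'' are hypothetical.

```python
import random


def covers(start: int, base: int, read_len: int, genome_size: int) -> bool:
    """True if a read starting at `start` covers `base` on a circular genome."""
    return (base - start) % genome_size < read_len


def estimate_miss_probability(genome_size: int, read_len: int, num_reads: int,
                              trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(base 0 is never seen in any of the reads)."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # Each read starts at an independent, uniformly random position.
        if not any(covers(rng.randrange(genome_size), 0, read_len, genome_size)
                   for _ in range(num_reads)):
            misses += 1
    return misses / trials


if __name__ == "__main__":
    G, L, R = 1_000, 50, 40  # toy numbers, so C = L*R/G = 2
    exact = (1 - L / G) ** R
    estimate = estimate_miss_probability(G, L, R)
    print(f"exact (1 - L/G)^R = {exact:.4f}, simulated = {estimate:.4f}")
```

The simulated fraction should agree with (1 - L/G)^R up to sampling noise of a few thousandths.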
+ | |||
+ | We can multiply ''L/G'' by ''R/R'' to get ''((L*R) / G) / R'' or ''C / R'' where ''C'' is our coverage of the genome. We take the limit of this as | ||
+ | ''R'' goes to infinity: | ||
+ | |||
+ | lim n->inf (1 - C/R)^R = e^-C | ||
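
The limit can also be seen numerically: holding ''C'' fixed and increasing ''R'', the value of (1 - C/R)^R approaches e^-C. A minimal Python sketch, where ''miss_probability'' is a hypothetical helper name and ''C = 3'' is an arbitrary example value:

```python
import math


def miss_probability(coverage: float, num_reads: int) -> float:
    """(1 - C/R)^R: chance a base is missed when C is held fixed and R reads are used."""
    return (1 - coverage / num_reads) ** num_reads


if __name__ == "__main__":
    C = 3.0
    for R in (10, 100, 1_000, 1_000_000):
        print(f"R = {R:>9,}: (1 - C/R)^R = {miss_probability(C, R):.6f}")
    print(f"limit e^-C           = {math.exp(-C):.6f}")
```

Because ln(1 - x) < -x for 0 < x < 1, every finite ''R'' gives a value slightly below e^-C, so the approximation slightly overestimates the chance of missing a base; with ''L << G'' the difference is negligible.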
+ | |||
+ | Thus we can expect to miss ''G*e^-C'' bases. | ||
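
To get a feel for the numbers, the sketch below computes how many reads a target coverage needs (''R = C*G/L'') and the expected number of missed bases, both exactly under the model above (''G*(1 - L/G)^R'') and with the ''G*e^-C'' approximation. The genome size and read length are hypothetical example values (a roughly human-sized 3-billion-base genome and 100-base reads), and the function names are illustrative.

```python
import math


def expected_missed_bases(genome_size: int, read_len: int, num_reads: int) -> tuple[float, float]:
    """Return (exact, approximate) expected number of bases seen in no read.

    exact:       G * (1 - L/G)^R   (under the uniform, independent-read model)
    approximate: G * e^-C, with coverage C = L*R/G
    """
    coverage = read_len * num_reads / genome_size
    exact = genome_size * (1 - read_len / genome_size) ** num_reads
    approx = genome_size * math.exp(-coverage)
    return exact, approx


def reads_for_coverage(genome_size: int, read_len: int, coverage: float) -> int:
    """Number of reads R needed so that C = L*R/G reaches `coverage`."""
    return math.ceil(coverage * genome_size / read_len)


if __name__ == "__main__":
    G = 3_000_000_000  # hypothetical, roughly human-sized genome
    L = 100            # hypothetical read length in bases
    for C in (1, 2, 5, 8, 10):
        R = reads_for_coverage(G, L, C)
        exact, approx = expected_missed_bases(G, L, R)
        print(f"C = {C:>2}: R = {R:,} reads, missed ~ {approx:,.0f} (exact {exact:,.0f})")
```

For the 3-billion-base example, even 10x coverage leaves roughly 136,000 bases unseen under this model.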
+ | |||
+ | We cannot assemble an entire chromosome if we are missing bases. However, we can construct contiguous stretches of bases or //contigs// and later | ||
+ | assemble them into //scaffolds// using other information, such as long distance physical maps. | ||
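
The idea of contigs can be illustrated with an idealized simulation: place random reads on a circular genome, record the per-base read depth, and treat every maximal stretch of covered bases as one contig. This is a minimal sketch under strong assumptions (no sequencing errors, no repeats, and reads joined whenever they overlap or merely abut), so real assemblies are typically more fragmented; ''simulate_coverage'' and ''count_contigs'' are hypothetical names.

```python
import math
import random


def simulate_coverage(genome_size: int, read_len: int, num_reads: int,
                      seed: int = 0) -> list[int]:
    """Per-base read depth on a circular genome with uniformly random read starts."""
    rng = random.Random(seed)
    diff = [0] * (genome_size + 1)  # difference array: +1 at read start, -1 just past its end
    for _ in range(num_reads):
        start = rng.randrange(genome_size)
        end = start + read_len
        if end <= genome_size:
            diff[start] += 1
            diff[end] -= 1
        else:  # the read wraps past the end of the circular genome
            diff[start] += 1
            diff[genome_size] -= 1
            diff[0] += 1
            diff[end - genome_size] -= 1
    depth, running = [], 0
    for i in range(genome_size):
        running += diff[i]
        depth.append(running)
    return depth


def count_contigs(depth: list[int]) -> int:
    """Number of maximal covered stretches (idealized contigs) on a circular genome."""
    covered = [d > 0 for d in depth]
    if all(covered):
        return 1
    # A contig starts wherever a covered base follows an uncovered one (index -1 wraps).
    return sum(1 for i in range(len(covered)) if covered[i] and not covered[i - 1])


if __name__ == "__main__":
    G, L = 100_000, 100  # toy genome size and read length
    for C in (1, 3, 5, 8):
        R = C * G // L
        depth = simulate_coverage(G, L, R, seed=C)
        print(f"C = {C}: missed {depth.count(0)} bases "
              f"(G*e^-C ~ {G * math.exp(-C):.0f}), {count_contigs(depth)} contigs")
```

The number of zero-depth bases should land close to ''G*e^-C'', and over this range of coverages the number of contigs should shrink as ''C'' grows.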

===== References =====
<refnotes>notes-separator: none</refnotes>
~~REFNOTES cite~~