lecture_notes:03-30-2011

Differences

This shows you the differences between two versions of the page.

lecture_notes:03-30-2011 [2011/03/30 23:20]
eyliaw [Sanger capillary]
lecture_notes:03-30-2011 [2011/04/01 19:20]
svohr [Coverage] slight corrections
Line 11: Line 11:
 ==== 454 ====
   * ~400bp reads[(cite:wiki454>http://en.wikipedia.org/wiki/454_Life_Sciences)].
-  * Q ~20 
   * Pyrosequencing
-  * $5000/run/~1m reads, no downscaling.
 +  * Q ~20
 +  ​* $5000/run/1M reads, no downscaling ​(numbers approximate).
 ==== SoLiD ====
 +  * 2x25bp or 1x50bp reads 
 +  * Paired-end reads: fragments are ligated to an adapter, and a restriction enzyme then cleaves 25bp away from the adapter.
 +  * Potential for double ligation: two unrelated sequences ligating. 
 +  * $2000/​run/​100M reads (numbers approximate).
 ==== Illumina ====
 +  * 2x50bp or 2x100bp reads (lengths uncertain)
 +  * Paired end reads 
 +  * Potential errors: innies (ligated region not between sequenced regions) or chimeric (sequence passes ligated region) 
 +  * Cheaper than SoLiD, 10K Genomes project uses it.
 ==== Ion Torrent ====
 +  * 2x100 base pairs 
 +  * ~50,000 to 5,000,000 reads depending on Sequencing Chip [(cite:​ionTorrent>​http://​www.iontorrent.com/​technology-how-does-it-perform/​)]. 
 +  * Ion semiconductor sequencing. No optics or modified bases are required.
 ==== Pac Bio ====
 +  * Very long, single molecule reads (~10K)
 +  * High error rates (~5%)
 +  * Useful when mapping to a reference.
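
The approximate per-run figures noted above for 454 and SoLiD can be turned into a rough cost per megabase. The sketch below is only an illustration: the function name ''cost_per_megabase'' is hypothetical, the inputs are the approximate numbers from these notes, and it assumes every read reaches its nominal length (400 bases for 454, 50 bases per SoLiD read or read pair).

```python
def cost_per_megabase(run_cost_usd, reads_per_run, bases_per_read):
    """Rough cost in USD per million sequenced bases for one run.

    Assumes every read has the nominal length, which overstates real yield.
    """
    total_bases = reads_per_run * bases_per_read
    return run_cost_usd / (total_bases / 1e6)


# Approximate figures taken from the notes above; not vendor-verified.
PLATFORMS = {
    "454": (5000, 1_000_000, 400),      # $5000/run, ~1M reads, ~400bp
    "SoLiD": (2000, 100_000_000, 50),   # $2000/run, ~100M reads, 2x25bp or 1x50bp
}

if __name__ == "__main__":
    for name, (cost, reads, length) in PLATFORMS.items():
        print(f"{name}: ${cost_per_megabase(cost, reads, length):.2f} per Mbp")
```

Under these assumptions the per-base cost differs by more than an order of magnitude between the two platforms, although read length and error profile matter as much as price when planning an assembly.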
 +===== Coverage =====
 +We briefly discussed how much sequence data would be required to assemble the genome. First, we considered the probability that a particular base ''i'' is covered by a single read ''j'', assuming reads start at uniformly random positions and ignoring edge effects.
 +  ​
 +  P( seeing base i in read j ) = L/G
 +
 +where ''​L''​ is the read length and ''​G''​ is the total size of the genome. If we have ''​R''​ reads, then 
 +
 +  P( never seeing base i ) = (1 - L/G)^R
 +
 +We can multiply ''L/G'' by ''R/R'' to get ''((L*R) / G) / R'', or ''C / R'', where ''C = (L*R) / G'' is our coverage of the genome. Holding ''C'' fixed, we take the limit of
 +''(1 - C/R)^R'' as ''R'' goes to infinity:
 +
 +  lim R->inf (1 - C/R)^R = e^-C
 +
 +Each of the ''G'' bases is therefore missed with probability about ''e^-C'', so we can expect to miss about ''G*e^-C'' bases.
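
To get a feel for these numbers, the following sketch evaluates the exact expression ''(1 - L/G)^R'' next to the ''e^-C'' approximation, and also inverts ''G*e^-C'' to ask how much coverage is needed for a target number of missed bases. The function names and the example values of ''L'' and ''G'' are hypothetical and chosen only for illustration; the model is the simple uniform one used in the notes.

```python
import math


def coverage(read_len, genome_size, num_reads):
    """Coverage C = L*R / G."""
    return read_len * num_reads / genome_size


def prob_base_missed(read_len, genome_size, num_reads):
    """Exact probability under the notes' model: (1 - L/G)^R."""
    return (1.0 - read_len / genome_size) ** num_reads


def expected_missed_bases(genome_size, cov):
    """Approximate expected number of uncovered bases: G * e^-C."""
    return genome_size * math.exp(-cov)


def coverage_for_expected_missed(genome_size, max_missed):
    """Solve G * e^-C = max_missed for C, giving C = ln(G / max_missed)."""
    return math.log(genome_size / max_missed)


if __name__ == "__main__":
    L, G = 100, 1_000_000  # hypothetical read length and genome size
    for target_c in (1, 5, 10):
        R = target_c * G // L  # number of reads that gives this coverage
        c = coverage(L, G, R)
        exact = prob_base_missed(L, G, R)
        approx = math.exp(-c)
        missed = expected_missed_bases(G, c)
        print(f"C={c:.0f}: exact={exact:.3e} approx={approx:.3e} missed~{missed:.1f}")
    print(f"C for ~1 missed base: {coverage_for_expected_missed(G, 1):.2f}")
```

Because ''L/G'' is tiny for real genomes, the exact and limiting values agree closely, which is why the ''e^-C'' form is convenient for quick estimates.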
 +
 +We cannot assemble an entire chromosome if we are missing bases. However, we can construct contiguous stretches of bases or //contigs// and later
 +assemble them into //​scaffolds//​ using other information,​ such as long distance physical maps.
 +
 +
  
 ===== References =====
 <refnotes>notes-separator: none</refnotes>
 ~~REFNOTES cite~~
lecture_notes/03-30-2011.txt · Last modified: 2011/04/01 19:20 by svohr