lecture_notes:03-30-2011

lecture_notes:03-30-2011 [2011/03/30 16:16]
eyliaw
lecture_notes:03-30-2011 [2011/04/01 12:20] (current)
svohr [Coverage] slight corrections
Line 6: Line 6:
 Inputs: ​ Sequencing data from various machines. ​ Some of the characteristics of these machines/​techniques:​ Inputs: ​ Sequencing data from various machines. ​ Some of the characteristics of these machines/​techniques:​
 ==== Sanger capillary ==== ==== Sanger capillary ====
-~800bp reads[(cite:​sangerwiki>​http://​en.wikipedia.org/​wiki/​Microfluidic_Sanger_sequencing)]. +  * ~800bp reads[(cite:​wikisanger>​http://​en.wikipedia.org/​wiki/​Microfluidic_Sanger_sequencing)]. 
-Q (quality value) ~30 +  ​* ​Q (quality value) ~30 
-Expensive (~$1/read) because primers must be attached to each read. +  * ~$1/read, expensive because primers must be attached to each read.
 ==== 454 ==== ==== 454 ====
-~500bp reads[(cite:​454wiki>​http://​en.wikipedia.org/​wiki/​454_Life_Sciences)]. +  * ~400bp reads[(cite:​wiki454>​http://​en.wikipedia.org/​wiki/​454_Life_Sciences)]. 
-Pyrosequencing +  ​* ​Pyrosequencing 
-$5000/run/~1m reads+  * Q ~20 
 +  * $5000/run/1M reads, no downscaling (numbers approximate).
 ==== SOLiD ==== ==== SOLiD ====
 +  * 2x25bp or 1x50bp reads 
 +  * Paired end reads: ​ ligation with adapter, cleaves 25bp from adapter using restriction enzyme. 
 +  * Potential for double ligation: two unrelated sequences ligating. 
 +  * $2000/​run/​100M reads (numbers approximate).
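The per-run figures above are easier to compare once normalized. The following is a minimal sketch using only the approximate numbers from these notes; the dictionary layout and function names are hypothetical, and treating a single Sanger read as one "run" is an assumption made purely so the three platforms share one formula.

```python
# Rough cost comparison using the approximate figures quoted in the notes.
# Assumption: a Sanger "run" is modeled as a single ~$1 read.
platforms = {
    "Sanger": {"cost_per_run": 1.0, "reads_per_run": 1, "read_length": 800},
    "454": {"cost_per_run": 5000.0, "reads_per_run": 1_000_000, "read_length": 400},
    "SOLiD": {"cost_per_run": 2000.0, "reads_per_run": 100_000_000, "read_length": 50},
}


def cost_per_read(platform):
    """Dollars per read: run cost divided by reads per run."""
    return platform["cost_per_run"] / platform["reads_per_run"]


def cost_per_megabase(platform):
    """Dollars per million sequenced bases (raw, ignoring quality)."""
    return cost_per_read(platform) / platform["read_length"] * 1e6


for name, platform in platforms.items():
    print(f"{name}: ${cost_per_read(platform):.6f}/read, "
          f"${cost_per_megabase(platform):.2f}/Mb")
```

This ignores quality values, read pairing, and the fact that a run cannot be scaled down, so it is only a first-order comparison.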
 ==== Illumina ==== ==== Illumina ====
 +  * 2x50, 2x100bps ? 
 +  * Paired end reads 
 +  * Potential errors: innies (ligated region not between sequenced regions) or chimeric (sequence passes ligated region) 
 +  * Cheaper than SOLiD; the 10K Genomes project uses it.
 ==== Ion Torrent ==== ==== Ion Torrent ====
 +  * 2x100 base pairs 
 +  * ~50,000 to 5,000,000 reads depending on Sequencing Chip [(cite:​ionTorrent>​http://​www.iontorrent.com/​technology-how-does-it-perform/​)]. 
 +  * Ion semiconductor sequencing. No optics or modified bases are required.
 ==== Pac Bio ==== ==== Pac Bio ====
 +  * Very long, single molecule reads (~10K)
 +  * High error rates (~5%)
 +  * Useful when mapping to a reference.
 +===== Coverage =====
 +We briefly discussed how much sequence data would be required to assemble the genome. First, we considered the probability of seeing a particular base ''​i''​ in a single read ''​j''​.
 +  ​
 +  P( seeing base i in read j ) = L/G
 +
 +where ''​L''​ is the read length and ''​G''​ is the total size of the genome. If we have ''​R''​ reads, then 
 +
 +  P( never seeing base i ) = (1 - L/G)^R
 +
 +Multiplying ''L/G'' by ''R/R'' gives ''((L*R) / G) / R'', or ''C / R'', where ''C = L*R/G'' is our coverage of the genome. Substituting this into the expression above and taking the limit as ''R'' goes to infinity with ''C'' held fixed:
 +
 +  lim R->inf (1 - C/R)^R = e^-C
 +
 +Thus we can expect to miss ''​G*e^-C''​ bases.
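The derivation above can be checked numerically. This is a minimal sketch, assuming reads land uniformly at random and ignoring edge effects at chromosome ends; the genome size, read length, and function names are illustrative choices, not values from the lecture.

```python
import math


def prob_base_missed(read_length, genome_size, num_reads):
    """Exact form from the notes: (1 - L/G)^R."""
    return (1.0 - read_length / genome_size) ** num_reads


def expected_missed_bases(genome_size, coverage):
    """Limit form from the notes: G * e^-C."""
    return genome_size * math.exp(-coverage)


def coverage_for_expected_missed(genome_size, target_missed=1.0):
    """Solve G * e^-C = target_missed for C, giving C = ln(G / target_missed)."""
    return math.log(genome_size / target_missed)


G = 3_000_000_000  # hypothetical genome size in bases
L = 100            # hypothetical read length in bases

for C in (1, 5, 10, 20):
    R = C * G // L  # number of reads needed for coverage C
    exact = prob_base_missed(L, G, R)
    approx = math.exp(-C)
    print(f"C={C}: exact={exact:.6e}, e^-C={approx:.6e}, "
          f"expected missed bases={expected_missed_bases(G, C):.1f}")

print(f"Coverage for ~1 expected missed base: "
      f"{coverage_for_expected_missed(G):.2f}")
```

Under this model the coverage needed to leave about one base unseen grows only logarithmically with genome size, although real data (repeats, sequencing bias, errors) typically demands more.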
 +
 +We cannot assemble an entire chromosome if we are missing bases. However, we can construct contiguous stretches of bases or //contigs// and later
 +assemble them into //​scaffolds//​ using other information,​ such as long distance physical maps.
 +
 +
  
 ===== References ===== ===== References =====
 <​refnotes>​notes-separator:​ none</​refnotes>​ <​refnotes>​notes-separator:​ none</​refnotes>​
 ~~REFNOTES cite~~ ~~REFNOTES cite~~
lecture_notes/03-30-2011.1301526965.txt.gz · Last modified: 2011/03/30 16:16 by eyliaw