lecture_notes:03-30-2011

Differences

This shows you the differences between two versions of the page.

lecture_notes:03-30-2011 [2011/03/30 16:05]
eyliaw
lecture_notes:03-30-2011 [2011/04/01 12:07]
svohr Added notes from coverage discussion.
Line 5: Line 5:
  
 Inputs: Sequencing data from various machines. Some characteristics of these machines/techniques:
-===== Sanger capillary =====
 +==== Sanger capillary ====
 +  * ~800bp reads[(cite:wikisanger>http://en.wikipedia.org/wiki/Microfluidic_Sanger_sequencing)].
 +  * Q (quality value) ~30 
 +  * ~$1/read, expensive because primers must be attached to each read. 
 +==== 454 ==== 
 +  * ~400bp reads[(cite:wiki454>http://en.wikipedia.org/wiki/454_Life_Sciences)].
 +  * Pyrosequencing 
 +  * Q ~20 
 +  * ~$5000 per run of ~1M reads; runs cannot be scaled down (numbers approximate).
 +==== SOLiD ====
 +  * 2x25bp or 1x50bp reads
 +  * Paired-end reads: the fragment is ligated to an adapter, and a restriction enzyme then cuts 25bp away from the adapter.
 +  * Potential for double ligation: two unrelated sequences ligating.
 +  * ~$2000 per run of ~100M reads (numbers approximate).
 +==== Illumina ==== 
 +  * 2x50bp or 2x100bp reads (?)
 +  * Paired-end reads
 +  * Potential errors: innies (ligated region not between sequenced regions) or chimeric reads (the read passes through the ligated region)
 +  * Cheaper than SOLiD; the 10K Genomes project uses it.
 +==== Ion Torrent ==== 
 +  * 2x100 base pairs 
 +  * ~50,000 to 5,000,000 reads depending on the sequencing chip [(cite:ionTorrent>http://www.iontorrent.com/technology-how-does-it-perform/)].
 +  * Ion semiconductor sequencing. No optics or modified bases are required. 
 +==== Pac Bio ==== 
 +  * Very long, single-molecule reads (~10Kbp)
 +  * High error rates (~5%) 
 +  * Useful when mapping to a reference. 
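The approximate cost figures quoted above for Sanger, 454, and SOLiD can be put on a common scale by converting them to dollars per megabase of raw sequence. The following is a minimal sketch using only the numbers in these notes; the table name `PLATFORMS` and the function `cost_per_megabase` are illustrative, and the calculation ignores read quality and library preparation costs.

```python
# Approximate figures taken from the notes: (cost in dollars, number of reads, bases per read).
# SOLiD is entered as 50 bases per read (2x25bp or 1x50bp).
PLATFORMS = {
    "Sanger capillary": (1.0, 1, 800),
    "454": (5000.0, 1_000_000, 400),
    "SOLiD": (2000.0, 100_000_000, 50),
}


def cost_per_megabase(cost, reads, read_len):
    """Dollars per million raw bases, given total cost, read count, and read length."""
    total_bases = reads * read_len
    return cost / total_bases * 1_000_000


if __name__ == "__main__":
    for name, (cost, reads, read_len) in PLATFORMS.items():
        print(f"{name}: ${cost_per_megabase(cost, reads, read_len):.2f} per Mbp")
```

With these inputs the result is roughly $1250 per megabase for Sanger, $12.50 for 454, and $0.40 for SOLiD, which is only as accurate as the approximate figures it starts from.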
 +===== Coverage ===== 
 +We briefly discussed how much sequence data would be required to assemble the genome. First, we considered the probability of seeing a given base
 +of the genome in a single read:
 +
 +  P( seeing base i in read j ) = L/G
  
-454
-SoLiD
-Illumina
-Ion Torrent
-Pac Bio
 +where ''L'' is the read length and ''G'' is the total size of the genome. If we have ''R'' reads, then
 +
 +  P( never seeing base i ) = (1 - L/G)^R
 +
 +Multiplying ''L/G'' by ''R/R'' gives ''((L*R) / G) / R'', or ''C / R'', where ''C = (L*R) / G'' is our coverage of the genome. Substituting this for ''L/G'' and taking the limit as
 +''R'' goes to infinity with ''C'' held fixed:
 +
 +  lim R->inf (1 - C/R)^R = e^-C
 + 
 +Thus we can expect to miss G*e^-C bases. 
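The coverage estimate can be checked numerically. The sketch below compares the exact expression (1 - L/G)^R with the e^-C approximation and inverts G*e^-C to ask what coverage leaves about one base unseen. The function names and the example values (a 3 Gbp genome, 100bp reads, 10x coverage) are illustrative assumptions, not figures from the lecture.

```python
import math


def prob_base_missed(read_len, genome_size, num_reads):
    """Exact probability that a given base is covered by none of the reads: (1 - L/G)^R."""
    return (1.0 - read_len / genome_size) ** num_reads


def expected_missed_bases(genome_size, coverage):
    """Approximate expected number of unseen bases: G * e^-C."""
    return genome_size * math.exp(-coverage)


def coverage_for_expected_missed(genome_size, max_missed):
    """Coverage C at which G * e^-C equals max_missed, i.e. C = ln(G / max_missed)."""
    return math.log(genome_size / max_missed)


if __name__ == "__main__":
    G = 3_000_000_000  # hypothetical genome size in bases
    L = 100            # hypothetical read length in bases
    C = 10             # hypothetical coverage
    R = C * G // L     # number of reads implied by C = L*R/G

    print("exact (1 - L/G)^R :", prob_base_missed(L, G, R))
    print("approximation e^-C:", math.exp(-C))
    print("expected missed bases at C = 10:", expected_missed_bases(G, C))
    print("coverage for ~1 missed base:", coverage_for_expected_missed(G, 1))
```

Because the expected number of missed bases falls off exponentially in C, each additional unit of coverage cuts it by a factor of e, so the coverage needed for near-complete sampling grows only with the logarithm of the genome size.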
 + 
 +We cannot assemble an entire chromosome if we are missing bases. However, we can construct contiguous stretches of bases or //contigs// and later
 +assemble them into //scaffolds// using other information, such as long-distance physical maps.
 + 
 + 
 + 
 +===== References ===== 
 +<​refnotes>​notes-separator:​ none</​refnotes>​ 
 +~~REFNOTES cite~~
lecture_notes/03-30-2011.txt · Last modified: 2011/04/01 12:20 by svohr