
Brief overview of goals and data input characteristics

Kevin laid out some of the logistics of the class.

Broad goal: obtain the full DNA sequence of every chromosome in the slug. Since this is unlikely to be completed within one quarter, the smaller goals are to build contigs and to get an idea of the scaffold in which to arrange them.

Inputs: Sequencing data from various machines. Some of the characteristics of these machines/techniques:

Sanger capillary

  • ~800bp reads[1].
  • Q (quality value) ~30
  • ~$1/read, expensive because primers must be attached to each read.


454

  • ~400bp reads[2].
  • Pyrosequencing
  • Q ~20
  • ~$5000 per run of ~1M reads; a run cannot be scaled down (numbers approximate).
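
To make the Q ~30 and Q ~20 figures above concrete, here is a minimal sketch that assumes they are on the standard Phred scale, where Q = -10 log10(P) and P is the probability that a base call is wrong. The notes do not say this explicitly, and the function names are illustrative only.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality value Q to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability back to a Phred-scaled quality value."""
    return -10 * math.log10(p)


# Quality values quoted in the notes (approximate).
for q in (30, 20):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.3g}, about 1 error per {round(1 / p)} bases")
```

Under that assumption, Q30 corresponds to roughly one miscalled base in 1000 and Q20 to roughly one in 100.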


SOLiD

  • 2x25bp or 1x50bp reads
  • Paired-end reads: the fragment is ligated to an adapter, and a restriction enzyme then cleaves 25bp from the adapter.
  • Potential for double ligation: two unrelated sequences ligating.
  • ~$2000 per run of ~100M reads (numbers approximate).
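
The per-read prices above are easier to compare as cost per base. The sketch below uses only the approximate figures from these notes. The dictionary labels are illustrative, and counting each of the ~100M ligation-based reads as 50 bases (1x50, or a 2x25 pair) is an assumption.

```python
# Approximate figures copied from the notes; labels are illustrative only.
platforms = {
    "Sanger capillary": {"cost_per_run": 1.0, "reads_per_run": 1, "bases_per_read": 800},
    "pyrosequencing": {"cost_per_run": 5000.0, "reads_per_run": 1_000_000, "bases_per_read": 400},
    # Assumption: each read contributes 50 bases (1x50, or one 2x25 pair).
    "ligation-based": {"cost_per_run": 2000.0, "reads_per_run": 100_000_000, "bases_per_read": 50},
}


def cost_per_base(cost_per_run: float, reads_per_run: int, bases_per_read: int) -> float:
    """Dollars per sequenced base, ignoring quality and coverage requirements."""
    return cost_per_run / (reads_per_run * bases_per_read)


costs = {name: cost_per_base(**spec) for name, spec in platforms.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost:.2e} per base")
```

This ignores read quality and the coverage needed for assembly, so it is only a rough ranking of raw cost.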


Illumina

  • 2x50bp or 2x100bp reads (read lengths uncertain)
  • Paired-end reads
  • Potential errors: innies (the ligated region is not between the sequenced regions) or chimeric reads (the read passes through the ligated region).
  • Cheaper than SOLiD; the 10K Genomes project uses it.

Ion Torrent

  • 2x100bp reads
  • ~50,000 to 5,000,000 reads, depending on the sequencing chip[3].
  • Ion semiconductor sequencing. No optics or modified bases are required.
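
As a rough sense of scale, the read counts and read length above can be multiplied out to a total yield per chip. This is a sketch only: it assumes each counted read is a 2x100 pair contributing 200 bases, which the notes do not confirm, and the function name is hypothetical.

```python
def chip_yield_bases(reads: int, bases_per_read: int) -> int:
    """Total bases produced by one chip, before any quality filtering."""
    return reads * bases_per_read


READ_COUNTS = (50_000, 5_000_000)  # range quoted in the notes
BASES_PER_READ = 200               # assumption: one "read" is a 2x100 pair

for reads in READ_COUNTS:
    total = chip_yield_bases(reads, BASES_PER_READ)
    print(f"{reads:,} reads -> {total / 1e6:g} Mbp")
```

If the count instead refers to single 100bp reads, halve these totals by passing 100 as `bases_per_read`.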

Pac Bio

  • Very long, single-molecule reads (~10Kbp)
  • High error rates (~5%)
  • Useful when mapping to a reference.
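
To see what a ~5% error rate means on a read of this length, the sketch below works through the arithmetic. It assumes "~10K" means about 10,000 bases, that errors are independent across bases, and that the Phred scale (Q = -10 log10(P)) is the right way to compare with the Q values of the other platforms. None of these assumptions is stated in the notes.

```python
import math

READ_LENGTH = 10_000  # assumption: "~10K" means ~10,000 bases
ERROR_RATE = 0.05     # ~5% per-base error rate from the notes

# Expected number of erroneous bases in one read.
expected_errors = READ_LENGTH * ERROR_RATE

# Equivalent Phred-scaled quality for a 5% per-base error rate.
phred_equivalent = -10 * math.log10(ERROR_RATE)

# log10 of the probability that a whole read is error-free,
# assuming independent errors: (1 - p) ** L.
log10_error_free = READ_LENGTH * math.log10(1 - ERROR_RATE)

print(f"expected errors per read: {expected_errors:.0f}")
print(f"Phred-equivalent quality: Q{phred_equivalent:.1f}")
print(f"log10 P(error-free read): {log10_error_free:.1f}")
```

Under these assumptions essentially no read is error-free, which fits the note that such reads are most useful when mapped to a reference.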


lecture_notes/03-30-2011.1301630162.txt.gz · Last modified: 2011/03/31 20:56 by svohr