Brief overview of goals and data input characteristics
Kevin laid out some of the logistics of the class.
A broad goal: for each chromosome in the slug, we want the full sequence in DNA bases. Since this is unlikely to be completed within one quarter, some smaller goals are to build contigs and to get an idea of the scaffold in which to arrange them.
Inputs: Sequencing data from various machines. Some of the characteristics of these machines/techniques:
2x25bp or 1x50bp reads
Paired-end reads: the fragment is ligated to an adapter, and a restriction enzyme then cleaves 25bp away from the adapter.
Potential for double ligation: two unrelated sequences ligating.
About $2000 per run, with about 100M reads per run (numbers approximate).
2x50bp or 2x100bp reads (?)
Paired end reads
Potential errors: innies (the ligated region is not between the sequenced regions) or chimeric reads (the sequence passes through the ligated region).
Cheaper than SOLiD; the 10K Genomes project uses it.
2x100bp reads
~50,000 to 5,000,000 reads, depending on the sequencing chip.
Ion semiconductor sequencing. No optics or modified bases are required.
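To get a feel for what these run sizes mean for assembly, the rough arithmetic can be sketched as below. This is only a back-of-the-envelope sketch: the genome size is a hypothetical placeholder (the notes do not give one), the function names are made up for illustration, and the 100bp read length for the semiconductor runs assumes each counted read is a single 100bp read.

```python
def bases_per_run(reads, read_length_bp):
    """Total bases produced by one run."""
    return reads * read_length_bp


def expected_coverage(reads, read_length_bp, genome_size_bp):
    """Average depth of coverage: total bases divided by genome size."""
    return bases_per_run(reads, read_length_bp) / genome_size_bp


def cost_per_megabase(cost_per_run, reads, read_length_bp):
    """Dollars per million sequenced bases."""
    return cost_per_run / (bases_per_run(reads, read_length_bp) / 1e6)


# ASSUMPTION: placeholder genome size, not a figure from the notes.
GENOME_SIZE_BP = 2_000_000_000

# Figures from the notes: ~100M reads per run, 50bp per read (1x50 or 2x25),
# ~$2000 per run.
ligation_coverage = expected_coverage(100_000_000, 50, GENOME_SIZE_BP)
ligation_cost = cost_per_megabase(2000, 100_000_000, 50)

# Figures from the notes: 50,000 to 5,000,000 reads depending on the chip.
# ASSUMPTION: each counted read is 100bp long.
chip_low = expected_coverage(50_000, 100, GENOME_SIZE_BP)
chip_high = expected_coverage(5_000_000, 100, GENOME_SIZE_BP)

if __name__ == "__main__":
    print(f"ligation-based run: {ligation_coverage:.2f}x, ${ligation_cost:.2f}/Mbp")
    print(f"semiconductor run: {chip_low:.4f}x to {chip_high:.2f}x")
```

Swapping in a better genome size estimate changes the coverage figures proportionally; the cost per megabase depends only on the run figures.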