Chemistry of Sequencing Technology

Nader talked about the chemistry behind four sequencing platforms: SOLiD Bioanalyzer, 454, Illumina/Solexa, and the charge-based detection (CBD) system Now sequencing.

Sequencing workflow: genomic DNA → fragmentation → amplification → immobilization → sequencing.
Time required for a sequencing run on each platform:
1-2 hrs for CBD.
9 hrs for 454.
3-7 days for Illumina/Solexa.
7-10 days for SOLiD.

The underlying technologies for the platforms:

Sequencing by synthesis/Pyrosequencing
Involves synthesizing a strand of DNA complementary to the template strand to be sequenced. Four enzymes are used in this method: DNA polymerase, ATP sulfurylase, luciferase, and apyrase. The light generated when a base is incorporated is captured as a signal that identifies that base. Currently the read length is approximately 400 nucleotides. Incorporation of homopolymer A is found to be problematic.
Further description of how the four enzymes work in this process can be found here: http://en.wikipedia.org/wiki/Pyrosequencing.

Sequencing by Ligation
ABI SOLiD sequencing technology uses sequencing by ligation. In the first ligation cycle, 1024 possible oligonucleotide probes compete for incorporation, followed by dephosphorylation, visualization, and cleavage. The same procedure is followed for the next ligation cycles. Every fifth cycle, the process is reset. After the reset, the primer length is reduced by one base (n-1) to get overlap, and after subsequent resets the primer length is n-2, n-3, and so on, so that the overlapping reads together determine the DNA sequence. For further reading, follow http://sequencing.soe.ucsc.edu/node/20.

Mate-Pairing - The chance of damage to the DNA increases with time, so there is a high possibility of not getting accurate sequence data from the far end of the DNA. To avoid this, the mate-pairing process determines the sequence of the DNA from the end towards the beginning and then from the beginning towards the end.

Alternative Notes

High Level Overview

The high level sequencing workflow for all next gen tools is as follows: Fragment Sample → Amplify Fragments → Sequence Fragments

There are two main flavors of next generation sequencing technology, as tagged in the headings below: sequencing by synthesis (SBS) and hybridization.

Examination of Individual Technologies

There are four technologies that were talked about today.

Pyrosequencing (SBS)

When a nucleotide is added to the growing chain by a polymerase, a pyrophosphate (PPi) is released, which one enzyme converts to ATP. In the presence of ATP another enzyme, luciferase, releases light. You record when the light was released (i.e., which nucleotide was added to the plate at that time) and also the intensity of the light. The light intensity tells you how many nucleotides were added in a row.

After each nucleotide is added, you must wash away the nucleotides and start fresh before adding the next nucleotide and recording which positions on the plate light up in that case.
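
The read-out described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name decode_flowgram, the flow order T, A, C, G, and the intensity values are all made up for the example, and the readings are assumed to be already normalized so that 1.0 corresponds to one incorporated base.

```python
def decode_flowgram(flow_order, intensities):
    """Turn per-flow light intensities into the sequence of the new strand.

    flow_order  -- nucleotides in the order they are washed over the plate
    intensities -- one normalized light reading per flow (1.0 ~ one base)
    """
    bases = []
    for i, signal in enumerate(intensities):
        nucleotide = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporations;
        # a reading near 0 means nothing was added in this flow.
        count = max(0, round(signal))
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical readings for one well over two passes of the flow order.
readings = [1.05, 0.02, 2.10, 0.97, 0.03, 0.95, 0.00, 3.08]
print(decode_flowgram("TACG", readings))  # TCCGAGGG
```

Rounding is the weak point of this sketch: the longer a run of identical bases, the harder it is to tell neighboring counts apart from the intensity alone.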

There is a problem with multiple A's because of the modified nucleotides which are needed for this reaction. FIXME

You must get the amount of template on the beads just right, and the template on a bead must all be of the same type. If too much template is present, the read length is shortened (not sure why). If too little template is on the beads, the washing step will take away a significant fraction of it and you won't have enough signal.

The packing beads placed around the template-carrying bead in each well of the plate hold the various enzymes (other than the polymerase) needed for the reaction.

SOLiD (Hybridization)

4 colors and 1024 possible 8mer fragments.
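
The figure of 1024 is 4^5, which is what you get if five of the eight probe positions are specified and the other three pair with anything. That split is an assumption here, not something stated in lecture. A quick Python check of the arithmetic, also assuming the colors are shared evenly among the two-base combinations:

```python
from itertools import product

BASES = "ACGT"

# Assumption: 5 of the 8 probe positions are specified, 3 are universal.
SPECIFIED_POSITIONS = 5
probes = ["".join(p) for p in product(BASES, repeat=SPECIFIED_POSITIONS)]
print(len(probes))  # 1024

# The color of a probe reports on its first two bases only.
dinucleotides = {probe[:2] for probe in probes}
NUM_COLORS = 4
print(len(dinucleotides))                # 16
print(len(dinucleotides) // NUM_COLORS)  # 4 two-base combinations per color
```

With 16 two-base combinations and only 4 colors, a single color cannot name the two bases by itself, which is one reason the overlapping runs are needed.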

Each round gives the nucleotide identity at the first two positions of the 8mer fragment, with a 3 nucleotide gap between each pair of identified nucleotides. I.e., one run gives you …—AB—CD—… You then perform multiple runs, each starting at the -1 position relative to the previous run, so that when you combine all of these runs you get the entire fragment sequence.
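
To see why shifting the start by one position each time fills in the gaps, here is a small Python sketch that lists which positions each run reads. The function name, the read length of 15, and the choice of five runs (offsets 0 through -4) are assumptions for illustration; the step of 5 comes from two read positions plus the 3 nucleotide gap.

```python
def interrogated_positions(read_length, offset, probe_step=5, bases_per_probe=2):
    """Positions read in one run whose primer starts at `offset`.

    Positions are 0-based from the first base after the full-length primer,
    so a negative offset starts inside the adapter and those positions
    are skipped.
    """
    positions = []
    start = offset
    while start < read_length:
        for k in range(bases_per_probe):
            pos = start + k
            if 0 <= pos < read_length:
                positions.append(pos)
        start += probe_step
    return positions


read_length = 15
coverage = {pos: 0 for pos in range(read_length)}
for offset in range(0, -5, -1):  # assumed primers n, n-1, n-2, n-3, n-4
    for pos in interrogated_positions(read_length, offset):
        coverage[pos] += 1

print(interrogated_positions(read_length, 0))  # [0, 1, 5, 6, 10, 11]
print(coverage)  # every position is read in exactly two runs
```

Under these assumptions each position ends up read in two different runs, which is the overlap the runs are combined over.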

Each of these images takes several hours, so a whole run may take several days.

Illumina/Solexa (SBS)

Rather than running emulsion PCR on beads to amplify a fragment into a colony of identical fragments on a single bead, you wash fragments over a slide and hope they stick to the slide far apart from each other. You then amplify these fragments on the slide, which forms the dense colony of a single sequence identity that you need to get a good read.

Rather than adding one nucleotide type at a time (with the possibility that a run of the same nucleotide is incorporated all at once), you add all four nucleotides (ACGT) simultaneously and have each nucleotide type give off a different color. You then get the identity of one nucleotide for every colony on the plate, and move on to the next nucleotide on the whole plate.
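
A minimal Python sketch of reading one colony this way: in each cycle, take whichever of the four color channels is brightest. The function name call_bases and the intensity values are hypothetical, and real base calling has to correct the signal in ways this ignores.

```python
CHANNELS = "ACGT"  # one color channel per nucleotide type


def call_bases(cycles):
    """Pick the brightest channel in each cycle for one colony.

    cycles -- list of 4-tuples of intensities, ordered as CHANNELS
    """
    read = []
    for intensities in cycles:
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Hypothetical intensities for one colony over four cycles.
colony = [
    (0.91, 0.05, 0.03, 0.02),
    (0.04, 0.08, 0.85, 0.06),
    (0.10, 0.07, 0.05, 0.80),
    (0.06, 0.88, 0.04, 0.03),
]
print(call_bases(colony))  # AGTC
```

Because exactly one base is called per cycle, there is no intensity-to-count step as in pyrosequencing.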

Theoretically you could sequence very long fragments this way, but practically the quality starts degrading too much after about 50 bases.

Ion Torrent (SBS)

Works very similarly to pyrosequencing, except that rather than requiring light from luciferase enzymes it takes advantage of the proton that is released when a nucleotide is added to a template. It measures the resulting electric current. The presence of a current over a certain template when a specific nucleotide is added to solution tells you that that nucleotide was added to that template. The level of current at a template tells you the number of nucleotides that were added in that run.
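
The relationship between the nucleotide in solution and the signal level can be sketched as a toy simulation in Python. The function name simulate_flows, the example strand, and the flow order are made up, and the count of incorporated bases stands in for the measured current under the assumption that the two are proportional.

```python
def simulate_flows(new_strand, flow_order, num_flows):
    """Count bases incorporated in each flow while building `new_strand`.

    new_strand -- the strand being synthesized (complement of the template)
    flow_order -- nucleotides in the order they are added to solution
    Returns a list of (nucleotide, count); count stands in for current level.
    """
    signals = []
    position = 0
    for flow in range(num_flows):
        nucleotide = flow_order[flow % len(flow_order)]
        count = 0
        # A run of identical bases is extended fully within a single flow.
        while position < len(new_strand) and new_strand[position] == nucleotide:
            count += 1
            position += 1
        signals.append((nucleotide, count))
    return signals


for nucleotide, count in simulate_flows("TTAGGGC", "TACG", 8):
    print(nucleotide, count)
```

Flows with a count of 0 are the ones where no current would be seen over that template.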

Can sequence about 60 bases every 100 minutes.

Nader recommended reading the 2006 PNAS paper for background on this technology.

Ideas for data storage layout in Campusrocks

Everything is stored in:

/campusdata/BME235/

Kevin proposed having the following folders.

bin/

bin/scripts/

bin/x86_64

bin/[picky_program_directories] (store programs that require the proximity of helper scripts)

data/

programs/ (store program source and test programs before adding them to bin)

lib/

experiments/ (store trial runs while testing out settings etc)

There should be a README in each directory describing the downstream directories, the purpose, and the contents.
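
One way to set up such a tree with a README stub in every directory is sketched below in Python. The function name create_layout and the README wording are assumptions, purposes that were not stated are left as TODO, and the demo runs in a temporary directory rather than in /campusdata/BME235/. The bin/[picky_program_directories] entry is left out because it is a placeholder, not a real name.

```python
from pathlib import Path
import tempfile

# Directories from the proposal; a purpose is filled in only where one was given.
LAYOUT = {
    "bin": "",
    "bin/scripts": "",
    "bin/x86_64": "",
    "data": "",
    "programs": "program source and test programs before adding them to bin",
    "lib": "",
    "experiments": "trial runs while testing out settings",
}


def create_layout(root):
    """Create each directory under `root` with a README stub if none exists."""
    root = Path(root)
    for relative, purpose in LAYOUT.items():
        directory = root / relative
        directory.mkdir(parents=True, exist_ok=True)
        readme = directory / "README"
        if not readme.exists():
            # List the directories directly below this one.
            subdirs = sorted(k for k in LAYOUT if Path(k).parent == Path(relative))
            lines = [
                f"Directory: {relative}/",
                f"Purpose: {purpose or 'TODO'}",
                "Subdirectories: " + (", ".join(subdirs) or "none"),
            ]
            readme.write_text("\n".join(lines) + "\n")
    return root


with tempfile.TemporaryDirectory() as scratch:
    create_layout(scratch)
    for path in sorted(Path(scratch).rglob("README")):
        print(path.relative_to(scratch).as_posix())
```

Existing README files are never overwritten, so the script can be rerun safely after someone has filled in the real descriptions.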

Kevin stressed that Makefiles are a good way of generating reproducible and fairly complex runs. Anyone who needs more background on makefiles should see the GNU make documentation, which is freely available online. Also, the various subfolders of ~karplus/pluck/ have examples of makefiles Kevin has used for his work with assembling genomes.

Also we meet in the science library (instructions room near periodicals) this Friday.