lecture_notes:04-06-2015

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-06-2015 [2015/04/10 04:59]
gepoliano
lecture_notes:04-06-2015 [2015/04/10 05:44]
gepoliano
Line 1: Line 1:
 +
 +----
  ​Lecture Notes 4/6/2015  ​Lecture Notes 4/6/2015
  
Line 82: Line 84:
 Note Taker: Gepoliano Chaves
  
-LECTURE: ROADMAP TO THE DE NOVO ASSEMBLY OF THE BANANA SLUG GENOME
 +LECTURE: **ROADMAP TO THE DE NOVO ASSEMBLY OF THE BANANA SLUG GENOME**
  
  
Line 89: Line 91:
 Lecturer contact: stefan.prost@berkeley.edu
  
-OVERVIEW TOPICS
 +**OVERVIEW TOPICS**
 • A priori information about the genome
 • Sequencing strategies and platforms
Line 103: Line 105:
 There are two approaches to assembling genomic reads: de novo genome assembly and reference-based assembly.
  
-De novo Assembly
 +**De novo Assembly**
  
 No previous genome to map the sequencing reads against
Line 113: Line 115:
 N50 – a kind of length-weighted median of the contig lengths: the contig length such that contigs of that length or longer contain at least half of the assembled bases
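N50 can be computed by sorting the contigs from longest to shortest and accumulating their lengths until half of the total assembly length is reached. The sketch below is only an illustration of that definition; the function name and the example lengths are made up and do not come from the lecture.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    # Walk through the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running sum to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length
    return 0


# Made-up contig lengths, used only to illustrate the calculation.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

A larger N50 usually means a more contiguous assembly, but it says nothing about whether the contigs are correct.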
  
-Reference-based Assembly
 +**Reference-based Assembly**
  
 Kmer = short substring of a DNA sequence, of fixed length k
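As an illustration of what k-mers are, the sketch below slides a window of length k along a sequence and counts every substring it sees. The function name and the toy sequence are hypothetical; real assemblers use far more memory-efficient k-mer counters.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in a DNA sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy sequence, used only for illustration.
counts = count_kmers("ATGATGC", 3)
print(counts.most_common())
```

In this toy sequence the 3-mer ATG occurs twice, which is the kind of repetition that makes assembly harder.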
Line 121: Line 123:
  
 =====  ===== 
-GENERAL INFORMATION ABOUT GENOME ASSEMBLY =====
 +**GENERAL INFORMATION ABOUT GENOME ASSEMBLY** =====
  
  
 As a starting point, four topics should be kept in mind for the assembly:
  
-Expected genome size (there is previous data for the slugs)
-Expected repeat content
-Expected heterozygosity
-Haploid, diploid or polyploid (this represents a serious problem) – as I understood, there’s no information about that for the banana slug.
- Karyotype information – can we derive that from the assembly?
-        C-value = weight of the genome in picograms (1 pg ≈ 1 Gb)
-        C-values from www.genomesize.com/
-
-Information from other genomes
 +  * Expected genome size (there is previous data for the slugs)
 +  * Expected repeat content
 +  * Expected heterozygosity
 +  * Haploid, diploid or polyploid (this represents a serious problem) – as I understood, there’s no information about that for the banana slug.
 +
 +**Information from other genomes**
  
- The lungfish has the largest vertebrate genome
- Big genomes – repetitions: genome size and repeat content correlate positively
-        Drosophila has a repeat content of 2%.
-        Mammalian genomes are trickier than bird genomes
-        Genome synteny (Poelstra 2014)
-        Synteny – conservation of blocks of gene order within two sets of chromosomes that are being compared with each other (http://en.wikipedia.org/wiki/Synteny)
- Mammals’ genomes present rearrangements of sequences
- RNA-Seq in this regard does not show where the gene was.
 +  * The lungfish has the largest vertebrate genome
 +  * Big genomes – repetitions: genome size and repeat content correlate positively
 +  * Drosophila has a repeat content of 2%.
 +  * Mammalian genomes are trickier than bird genomes
 +  * Genome synteny (Poelstra 2014)
 +  * Synteny – conservation of blocks of gene order within two sets of chromosomes that are being compared with each other (http://en.wikipedia.org/wiki/Synteny)
 +  * Mammals’ genomes present rearrangements of sequences
 +  * RNA-Seq in this regard does not show where the gene was.
  
-SEQUENCING TECHNOLOGIES
-
-First generation
- Sanger sequencing, based on dideoxynucleotide chain termination. Dideoxynucleotides are chain-elongation inhibitors of DNA polymerase (good accuracy and read length)
-Second Generation (PCR needed)
- Illumina’s MiSeq and HiSeq are cheap platforms to sequence DNA. The slug data was collected using Illumina’s platform
- Roche 454, expensive (slugs have some data originally sequenced on this platform)
- Life Technologies: Ion Torrent and Ion Proton (cheaper than 454)
- ABI: SOLiD (sequencing by ligation) (slugs have some data originally sequenced on this platform too)
-
-Third Generation (Single Molecule Sequencing)
-Most commonly used platforms:
- Helicos Biosciences: Heliscope
- Pacific Biosciences: PacBio RS II (PacBio is useless, no ATGC bias, problems with scaffolding)
- Microbial genome -
- Oxford Nanopore: MinION & GridION (error rate ~ 15%)
-        Illumina reads may be used to error-correct PacBio reads, but Illumina has GC bias and the error correction is computationally expensive.
 +**SEQUENCING TECHNOLOGIES**
 +
 +**First generation**
 +
 +Sanger sequencing, based on dideoxynucleotide chain termination, is considered the first-generation sequencing technology. Dideoxynucleotides are chain-elongation inhibitors of DNA polymerase. The positive features of this technology are its good accuracy and read length.
 +
 +**Second Generation** (PCR needed)
 +
 +Illumina’s MiSeq and HiSeq are cheap platforms to sequence DNA. The slug data for this course was collected using Illumina’s platform.
 +Roche 454 is an expensive platform; however, slugs have some data originally sequenced on it.
 +Life Technologies manufactures Ion Torrent and Ion Proton (cheaper than 454).
 +ABI’s SOLiD system (sequencing by ligation) was also used to sequence the slugs’ genome.
  
-ILLUMINA SEQUENCING 
- Different 3’ and 5’ end adapters – fragments are flanked by the adapters 
- Hybridization in the flowcell (array) 
- Bridge amplification – proximity and PCR amplification allows the fragment to be amplified. 
- Metzker 2010 
- A washing step takes out one of the two types 
- Same cluster: same sequence, sequencing primer 
- All four nucleotides are labeled and present in the same reaction, with a different color for each nucleotide
- Incorporation by polymerase – light is released in the color of the incorporated nucleotide
- Same clusters – signal
- CCD camera catches the color.
- The process continues until ~100 bp 
- $1000 for a flowcell MinION – 400bp 1 lane  
   
 +**Third Generation** (Single Molecule Sequencing)
  
 +Most commonly used platforms:
 +Helicos Biosciences:​ Heliscope
 +Pacific Biosciences: PacBio RS II (PacBio is useless, no AT, GC bias, problems with scaffolding)
 +Microbial genome - 
 +Oxford Nanopore: MinION & GridION (error rate ~ 15%)
 +Illumina reads may be used to error-correct PacBio reads, but Illumina has GC bias and the error correction is computationally expensive
  
  
 +//ILLUMINA SEQUENCING//​
 +Different 3’ and 5’ end adapters – fragments are flanked by the adapters
 +Hybridization in the flowcell (array)
 +Bridge amplification – proximity and PCR amplification allows the fragment to be amplified.
 +Metzker 2010
 +A washing step takes out one of the two types
 +Same cluster: same sequence, sequencing primer
 +All four nucleotides are labeled and present in the same reaction, with a different color for each nucleotide
 +Incorporation by polymerase – light is released in the color of the incorporated nucleotide
 +Same clusters – signal
 +CCD camera catches the color.
 +The process continues until ~100 bp
 +$1000 for a flowcell MinION – 400bp 1 lane 
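
The per-cycle color readout described in the Illumina notes can be pictured as choosing, for each cycle, the brightest of four color channels in a cluster. The sketch below is a toy illustration only: the channel order, the intensity values and the function name are hypothetical, and real base callers also correct for effects such as cross-talk between channels and phasing.

```python
# Hypothetical mapping from color channel index to nucleotide.
CHANNEL_BASES = ("A", "C", "G", "T")


def call_bases(cycle_intensities):
    """Call one base per cycle by picking the brightest of four channels."""
    read = []
    for intensities in cycle_intensities:
        if len(intensities) != len(CHANNEL_BASES):
            raise ValueError("expected one intensity per channel")
        # Index of the channel with the strongest signal in this cycle.
        brightest = max(range(len(CHANNEL_BASES)), key=lambda i: intensities[i])
        read.append(CHANNEL_BASES[brightest])
    return "".join(read)


# Made-up intensities for one cluster over three cycles.
toy_cluster = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.1, 0.8),
    (0.0, 0.2, 0.9, 0.1),
]
toy_read = call_bases(toy_cluster)
print(toy_read)
```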
 +
 +
 +//PACBIO//
 +Imagine a plate with small wells. In this technology, the objective is to make a polymerase stick to the well (Eid et al., 2009). This can be imagined as something like ELISA, but with exactly one DNA molecule and one polymerase per well. PacBio is a real-time sequencing technology: the time it takes to incorporate each base is recorded and can be used to infer information about heterochromatin and quadruplex structures. This technology allows long reads. However, indels are the main type of error in PacBio reads.
  
-PACBIO 
- Imagine a plate with small wells 
- in this technology, the objective is to make a polymerase stick to the well 
- Eid et al 2009 
- Single molecule PCR polymerase that is fast enough 
- This can be imagined as something like ELISA, but with exactly one DNA molecule and one polymerase per well
- Real-time sequencing technology: the time it takes to incorporate each base is recorded and can be used to infer information about heterochromatin and quadruplex structures
- This technology allows long reads
- Indels are the main type of error in PacBio reads
  
 +//OXFORD NANOPORE//
  
-OXFORD NANOPORE
-Technology that has been around for 15 years
- Involves a membrane with a nanopore, formed with a protein called alpha-hemolysin
- DNA molecule goes through the pore
- A salt concentration gradient guides a single-stranded DNA molecule through the pore
- Guidance of DNA to the pore leads the DNA molecule to the exonuclease activity coupled to the hemolysin
- A nucleotide is cut and the charge changes on one side of the membrane surface
- Based on the charge change, the unique nucleotide that matches that change is inferred to be in the sequence.
 +This is a technology that has been around for 15 years now. It involves a membrane with a nanopore, formed with a protein called alpha-hemolysin. The DNA molecule goes through the pore, guided as a single strand by a salt concentration gradient; this guidance leads the DNA molecule to the exonuclease activity coupled to the hemolysin in the nanopore. A nucleotide is cut out by the exonuclease and the charge changes on one side of the membrane surface. Based on the charge change, the unique nucleotide that matches that change is inferred to be in the sequence.
  
lecture_notes/04-06-2015.txt · Last modified: 2015/04/10 05:55 by gepoliano