lecture_notes:04-06-2015

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-06-2015 [2015/04/10 04:59]
gepoliano
lecture_notes:04-06-2015 [2015/04/10 05:55]
gepoliano
Line 1: Line 1:
 +
 +----
  ​Lecture Notes 4/6/2015  ​Lecture Notes 4/6/2015
  
Line 82: Line 84:
 Note Taker: Gepoliano Chaves Note Taker: Gepoliano Chaves
  
-LECTURE: ROADMAP TO THE DE NOVO ASSEMBLY OF THE BANANA SLUG GENOME+LECTURE: ​**ROADMAP TO THE DE NOVO ASSEMBLY OF THE BANANA SLUG GENOME**
  
  
Line 89: Line 91:
 Lecturer contact: stefan.prost@berkeley.edu Lecturer contact: stefan.prost@berkeley.edu
  
-OVERVIEW TOPICS+**OVERVIEW TOPICS**
 • A priori information about the genome • A priori information about the genome
 • Sequencing strategies and platforms • Sequencing strategies and platforms
Line 103: Line 105:
 There are two approaches to assembling genomic reads: de novo genome assembly and reference-based assembly. There are two approaches to assembling genomic reads: de novo genome assembly and reference-based assembly.
  
-De novo Assembly+**De novo Assembly**
  
 No previous genome to map the sequencing reads with No previous genome to map the sequencing reads with
Line 113: Line 115:
 N50 – a kind of length-weighted median of the contig lengths N50 – a kind of length-weighted median of the contig lengths
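A minimal sketch of how N50 can be computed, assuming the usual definition: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly length. The function name and the example lengths are hypothetical and not from the lecture.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs of length >= N50 together cover at least half of the
    total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the total
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only
example_contigs = [80, 70, 50, 40, 30, 20]
print(n50(example_contigs))
```

Because long contigs carry more weight than short ones, N50 is usually larger than the plain median contig length.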
  
-Reference-based Assembly+**Reference-based Assembly**
  
 K-mer = short subsequence of a DNA sequence, of fixed length k K-mer = short subsequence of a DNA sequence, of fixed length k
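To make the k-mer definition concrete, here is a small sketch that counts every overlapping substring of length k in a sequence. The function name and the toy sequence are assumptions for illustration. Note that a k-mer may occur more than once; in practice, repeated k-mers are one way repeats show up in the data.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in a DNA sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy sequence, for illustration only
counts = count_kmers("ATGATGC", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```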
Line 121: Line 123:
  
 =====  ===== 
-GENERAL INFORMATION ABOUT GENOME ASSEMBLY =====+**GENERAL INFORMATION ABOUT GENOME ASSEMBLY** =====
  
  
 As a starting point, four topics should be kept in mind for the assembly: As a starting point, four topics should be kept in mind for the assembly:
  
-Expected Genome Size (there is previous data for the slugs) +  * Expected Genome Size (there is previous data for the slugs) 
-Expected repeat content +  ​* ​Expected repeat content 
-Expected heterozygosity  +  ​* ​Expected heterozygosity  
-Haploid, diploid or polyploid (this represents a serious problem) – as I understood, there’s no information about that for the banana slug. +  * Haploid, diploid or polyploid (this represents a serious problem) – as I understood, there’s no information about that for the banana slug. 
- Karyotype information – can we derive that from the assembly? +  
-        C-value = mass of the genome in picograms (1 pg ≈ 1 Gb of sequence) +**Information from other genomes**
-        c-value from www.genomesize.com/​ +
-        ​ +
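As a rough sketch of the C-value rule of thumb, the conversion below assumes the commonly used factor of about 978 Mbp per picogram of DNA, which is where "1 pg ≈ 1 Gb" comes from. The function name and the example C-value are hypothetical and are not the banana slug's measured value.

```python
# Assumed conversion factor: 1 pg of DNA is roughly 0.978e9 base pairs
BP_PER_PICOGRAM = 0.978e9


def c_value_to_bp(c_value_pg):
    """Convert a C-value in picograms to an approximate genome size in bp."""
    if c_value_pg < 0:
        raise ValueError("C-value cannot be negative")
    return c_value_pg * BP_PER_PICOGRAM


# Hypothetical C-value, for illustration only
example_pg = 2.5
print(f"{example_pg} pg is roughly {c_value_to_bp(example_pg) / 1e9:.2f} Gbp")
```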
-Information from other genomes+
  
- The lungfish has the largest vertebrate genome +  * The lungfish has the largest vertebrate genome 
- Big genomes – repetitions: genome size and repeat content correlate positively +  * Big genomes – repetitions: genome size and repeat content correlate positively 
-        Drosophila has a repeat content of 2%. +  * Drosophila has a repeat content of 2%. 
-        Mammalian genomes are trickier than bird genomes +  * Mammalian genomes are trickier than bird genomes 
-        Genome synteny (Poelstra 2014) +  * Genome synteny (Poelstra 2014) 
-        Synteny – conservation of the order of blocks between two sets of chromosomes that are being compared with each other (http://en.wikipedia.org/wiki/Synteny) +  * Synteny – conservation of the order of blocks between two sets of chromosomes that are being compared with each other (http://en.wikipedia.org/wiki/Synteny) 
- Mammals’ genomes present rearrangements of sequences +  * Mammals’ genomes present rearrangements of sequences 
- RNA-Seq in this regard does not show where the gene was. +  * RNA-Seq in this regard does not show where the gene was.
  
-SEQUENCING TECHNOLOGIES 
  
-First generation +**SEQUENCING TECHNOLOGIES**
- Sanger sequencing, based on dideoxynucleotide chain termination. Dideoxynucleotides are chain-elongation inhibitors of DNA polymerase (good accuracy and read length) +
-Second Generation (PCR needed) +
- Illumina’s MiSeq and HiSeq are cheap platforms to sequence DNA. Slugs’ data was collected using Illumina’s platform +
- Roche 454, expensive (slugs have some data originally sequenced on this platform) +
- Life Technologies: Ion Torrent and Ion Proton (cheaper than 454) +
- ABI: SOLiD (hybridization array approach) (slugs have some data originally sequenced on this platform too)+
  
-Third Generation (Single Molecule Sequencing) +**First generation** 
-Most commonly used platforms:  + 
- Helicos Biosciences: Heliscope +Sanger sequencing, based on dideoxynucleotide chain termination, is considered the first-generation sequencing technology. Dideoxynucleotides are chain-elongation inhibitors of DNA polymerase. The strengths of this technology are its good accuracy and read length. 
- Pacific Biosciences: PacBio RS II (PacBio is useless, no ATGC bias, problems with scaffolding) + 
- Microbial genome -  +**Second Generation** (PCR needed) 
- Oxford Nanopore: MinION & GridION (error rate ~ 15%) + 
-        Illumina's technology may be used to error-correct PacBio reads, but Illumina has GC bias, being computationally expensive. +Illumina’s MiSeq and HiSeq are cheap platforms to sequence DNA. The slug data for this course was collected on Illumina’s platform.
 +Roche 454 is an expensive platform; however, the slugs have some data originally sequenced on it.
 +Life Technologies manufactured the Ion Torrent and Ion Proton (cheaper than 454).
 +ABI’s SOLiD system (hybridization array approach) was also used to sequence the slug genome.
  
-ILLUMINA SEQUENCING 
- Different 3’ and 5’ end adapters – fragments are flanked by the adapters 
- Hybridization in the flowcell (array) 
- Bridge amplification – proximity and PCR amplification allows the fragment to be amplified. 
- Metzker 2010 
- A washing step takes out one of the two types 
- Same cluster: same sequence, sequencing primer 
- Four nucleotides are labeled, all 4 in the same reaction, with a different color for each nucleotide
- Incorporation by polymerase – light release with colors
- Same clusters – signal
- CCD camera catches the color.
- The process continues until ~100 bp 
- $1000 for a flowcell MinION – 400bp 1 lane  
   
 +**Third Generation** (Single Molecule Sequencing)
 +
 +Most commonly used platforms:
 +Helicos Biosciences:​ Heliscope
 +Pacific Biosciences: PacBio RS II (PacBio is useless, no AT, GC bias, problems with scaffolding)
 +Microbial genome - 
 +Oxford Nanopore: MinION & GridION (error rate ~ 15%)
 +Illumina reads may be used to error-correct PacBio reads, but Illumina has GC bias and the correction is computationally expensive
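The GC bias mentioned above is usually discussed in terms of the GC fraction of a read or window. A minimal sketch of that calculation follows; the function name and the toy reads are assumptions for illustration, and uncalled bases such as N are simply ignored here.

```python
def gc_content(sequence):
    """Return the fraction of called bases (A, C, G, T) that are G or C."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence has no called bases")
    return sum(base in "GC" for base in called) / len(called)


# Toy reads, for illustration only
for read in ["ATATATTA", "ATGCGC", "GGCCGCGG"]:
    print(read, round(gc_content(read), 2))
```

Comparing read depth across windows of different GC fraction is one simple way to see whether a data set is affected by such a bias.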
 +
 +
 +//ILLUMINA SEQUENCING//​
 +
 +Different 3’ and 5’ end adapters – fragments are flanked by the adapters
 +Hybridization in the flowcell (array)
 +Bridge amplification – proximity and PCR amplification allows the fragment to be amplified.
 +Metzker 2010
 +A washing step takes out one of the two types
 +Same cluster: same sequence, sequencing primer
 +Four nucleotides are labeled, all 4 in the same reaction, different 4 colors for each nucleotide
 +Incorporation by polymerase – light release with colors.
 +Same clusters – signal.
 +CCD camera catches the color.
 +The process continues until ~100 bp
 +$1000 for a flowcell MinION – 400bp 1 lane 
  
  
 +//PACBIO//
  
 +Imagine a plate with small wells. In this technology, the objective is to make a polymerase stick to the well (Eid et al., 2009). This can be imagined as something like ELISA, but with exactly one DNA molecule and one polymerase per well. PacBio is a real-time sequencing technology: the time it takes to incorporate each base is recorded and can be used to infer information about heterochromatin and quadruplex structures. This technology allows long reads. However, indels are the main type of error in PacBio reads.
  
-PACBIO 
- Imagine a plate with small wells 
- in this technology, the objective is to make a polymerase stick to the well 
- Eid et al 2009 
- Single molecule PCR polymerase that is fast enough 
- This can be imagined as something like ELISA but with exactly one DNA molecule and one polymerase per well
- Real-time sequencing technology: the time it takes to incorporate the base is a measure used to infer information about heterochromatin and quadruplex structures
- This technology allows long reads
- INDEL is the main mismatch that happens in PacBio
  
 +//OXFORD NANOPORE//
  
-OXFORD NANOPORE +This technology has been around for 15 years now. It involves a membrane with a nanopore formed by a protein called alpha-hemolysin. A single-stranded DNA molecule is guided through the pore by a salt concentration gradient. This brings the DNA molecule to the exonuclease activity coupled to the hemolysin in the nanopore. Each nucleotide is cut off by the exonuclease, and the charge changes on one side of the membrane surface. Based on this change in charge, the unique nucleotide that matches the change is inferred to be in the sequence.
-technology that has been around for 15 years +
- Involves a membrane with a nanopore, formed with a protein called alpha-hemolysin +
- DNA molecule goes through the pore +
- Salt concentration gradient allows the guidance of a single-stranded DNA molecule through the pore +
- Guidance of DNA to the pore leads the DNA molecule to the exonuclease activity coupled to the hemolysin +
- Nucleotide is cut and the charge changes on one side of the membrane surface +
- Based on nucleotide charge change, the unique nucleotide that matches that change is inferred to be in the sequence.
  
lecture_notes/04-06-2015.txt · Last modified: 2015/04/10 05:55 by gepoliano