lecture_notes:04-08-2015

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-08-2015 [2015/04/17 14:53]
sihussai fixing formatting
lecture_notes:04-08-2015 [2015/04/17 15:34] (current)
sihussai fixing capitalization
Line 1: Line 1:
-======De novo Assembly II======
+======De novo assembly II======
 **Guest lecturer: Stefan Prost, stefan.prost@berkley.edu**
  
-=====Illumina Paired-end Sequencing Libraries=====
+=====Illumina paired-end sequencing libraries=====
   * MiSeq has 300 bp reads
   * Paired ends read from both directions, so you get one read for each end (may or may not overlap depending on molecule and read size)
Line 14: Line 14:
     * not enough info for scaffolding
  
-=====Illumina Mate-Pair Sequencing Libraries====
+=====Illumina mate-pair sequencing libraries====
   * Idea: get paired reads that are much farther away (for more scaffolding info)
   * Basically, the same idea as paired-ends, except the middle section is missing and the ends are oriented the opposite way.
Line 34: Line 34:
  
  
-=====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries=====
+=====BAC (Bacterial Artificial Chromosome) and fosmid libraries=====
   * Uncommon and expensive, but the gold standard
   * Bacterial F-plasmid takes < 40 kb insert size
Line 40: Line 40:
   * http://www.scq.ubc.ca/wp-content/plasmidtext.gif
  
-=====Read Quality Assessment=====
+=====Read quality assessment=====
   * Base quality: Phred scores reported by sequencer.
   * FASTQ files: FASTA files, plus encoded Phred scores
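The encoded Phred scores in a FASTQ quality line can be decoded as in the rough sketch below. It assumes the Phred+33 ASCII offset (Sanger / Illumina 1.8+ style); older Illumina files used a different offset, so the ''offset'' parameter is an assumption to check against your data. The function names are made up for illustration.

```python
def decode_quality(quality_line, offset=33):
    """Turn a FASTQ quality string into a list of Phred scores.

    Assumes Phred+33 encoding unless a different offset is passed.
    """
    return [ord(char) - offset for char in quality_line]


def phred_to_error_prob(q):
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    scores = decode_quality("I5#")
    for q in scores:
        print(q, phred_to_error_prob(q))
```

A score of 20 means a 1 in 100 chance that the base call is wrong, and a score of 40 means 1 in 10,000.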
Line 56: Line 56:
     * Estimates how difficult the assembly will be
  
-=====Estimating Genome Size from Read Data=====
+=====Estimating genome size from read data=====
  G = (pn(1-k+1))/(λ_k)
  G = Genome size
Line 66: Line 66:
   * To estimate genome size, you need to know: i) the total number of reads; ii) the length of the reads; and iii) the kmer distribution
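A minimal sketch of how the three inputs listed above combine. It uses the common relation G ≈ n(l - k + 1) / λ_k, where n is the number of reads, l the read length, and λ_k the kmer coverage peak. Reading the ''1'' in the formula on this page as the read length l is an assumption, and the factor p is left out because its definition is not shown in this part of the page. The histogram format (kmer multiplicity mapped to number of distinct kmers) and the ''min_count'' cutoff for skipping the error peak are also assumptions.

```python
def kmer_coverage_peak(histogram, min_count=3):
    """Return the multiplicity with the most distinct kmers.

    histogram: dict mapping multiplicity -> number of distinct kmers.
    Multiplicities below min_count are skipped as a crude way to ignore
    the low-count error peak (the cutoff is an assumption).
    """
    candidates = {c: n for c, n in histogram.items() if c >= min_count}
    if not candidates:
        raise ValueError("no multiplicities at or above min_count")
    return max(candidates, key=candidates.get)


def estimate_genome_size(n_reads, read_length, k, coverage_peak):
    """G ~ n * (l - k + 1) / lambda_k."""
    total_kmers = n_reads * (read_length - k + 1)
    return total_kmers / coverage_peak


if __name__ == "__main__":
    # Toy histogram: error kmers at multiplicity 1, main peak at 20.
    histogram = {1: 500000, 2: 1000, 19: 180000, 20: 200000, 21: 170000}
    peak = kmer_coverage_peak(histogram)
    print(estimate_genome_size(1_000_000, 100, 21, peak))
```

With one million 100 bp reads, k = 21, and a coverage peak at 20, each read contributes 80 kmers, so the estimate is 80,000,000 / 20 = 4,000,000 bp.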
  
-=====Error Correction=====
+=====Error correction=====
   * A high number of low-count kmers (kmers seen only once or a few times) are usually errors
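One way to picture this is to count every kmer in the reads and flag the ones that show up too rarely. This is only a toy sketch: the ''min_count'' threshold is an assumption (in practice it is chosen from the kmer distribution), and real error-correction tools go further by repairing the reads rather than just flagging kmers.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every kmer of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def likely_error_kmers(counts, min_count=2):
    """Kmers seen fewer than min_count times are flagged as probable errors."""
    return {kmer for kmer, count in counts.items() if count < min_count}


if __name__ == "__main__":
    # The third read differs from the others by a single base.
    reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
    counts = count_kmers(reads, k=4)
    print(sorted(likely_error_kmers(counts)))
```

The single changed base in the third read creates kmers that no other read supports, which is why errors pile up at the low-count end of the distribution.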
  
 **Simulated contig length in the k-de Bruijn graph can be used to estimate the best kmer size for assembly, based on contig length N50.**
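For reference, N50 is the contig length at which contigs of that length or longer contain at least half of the total assembled bases. A short sketch of the calculation (the function name is made up for illustration):

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Comparing this value across assemblies run with different kmer sizes is one way to pick the kmer size that gives the longest contigs.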
lecture_notes/04-08-2015.1429307582.txt.gz · Last modified: 2015/04/17 14:53 by sihussai