lecture_notes:04-08-2015

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-08-2015 [2015/04/09 16:05]
sihussai
lecture_notes:04-08-2015 [2015/04/17 15:34] (current)
sihussai fixing capitalization
Line 1: Line 1:
-=====De novo Assembly II=====
+======De novo assembly II======
 **Guest lecturer: Stefan Prost, stefan.prost@berkley.edu**
 
-====Illumina Paired-end Sequencing Libraries====
+=====Illumina paired-end sequencing libraries=====
   * MiSeq has 300 bp reads
   * Paired ends are read from both directions, so you get one read for each end (the reads may or may not overlap, depending on molecule and read size)
Line 14: Line 14:
     * not enough info for scaffolding
 
-====Illumina Mate-Pair Sequencing Libraries===
+=====Illumina mate-pair sequencing libraries====
   * Idea: get paired reads that are much farther apart (for more scaffolding info)
   * Basically the same idea as paired ends, except the middle section is missing and the ends are oriented the opposite way.
Line 34: Line 34:
  
  
-====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries====
+=====BAC (Bacterial Artificial Chromosome) and fosmid libraries=====
   * Uncommon and expensive, but the gold standard
   * Bacterial F-plasmid takes < 40 kb insert size
Line 40: Line 40:
   * http://www.scq.ubc.ca/wp-content/plasmidtext.gif
  
-====Read Quality Assessment====
+=====Read quality assessment=====
   * Base quality: Phred scores reported by the sequencer.
   * Fastq files: fasta files, plus encoded Phred scores
Line 49: Line 49:
   * Pacific Biosciences (PacBio) sequencing doesn’t have GC|AT bias
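
The Phred encoding mentioned above can be illustrated with a minimal sketch. It assumes the Phred+33 ASCII offset (the Sanger / Illumina 1.8+ convention); older Illumina pipelines used a different offset, so check your data first. The function names are hypothetical and chosen for this example only.

```python
def decode_phred(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    Assumes each character encodes one score as chr(score + offset).
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Example quality line from a FASTQ record (made up for illustration).
scores = decode_phred("II5+!")
probs = [error_probability(q) for q in scores]
```

A score of 20 corresponds to a 1-in-100 chance that the base call is wrong, and a score of 40 to 1 in 10,000.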
  
-===Tools===
+====Tools====
   * FastQC (the most popular tool for summarizing a read library)
     * FastQC reported an issue with our data in the kmer count check (related to adapter content)
Line 56: Line 56:
     * Estimates how difficult the assembly will be
  
-====Estimating Genome Size from Read Data====
+=====Estimating genome size from read data=====
  G = (p * n * (l - k + 1)) / λ_k
  G = Genome size
Line 66: Line 66:
   * To estimate genome size, you need to know: i) the total number of reads; ii) the length of the reads; and iii) the kmer distribution
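
As a rough sketch of how these three inputs combine, the function below uses the common kmer-based estimate: total kmers in the reads divided by the kmer depth at the main peak of the kmer distribution. The parameter names are assumptions made for this example. The factor p from the formula in these notes is not defined in the visible lines, so it is exposed here only as an optional multiplier that defaults to 1.

```python
def estimate_genome_size(n_reads, read_length, k, kmer_peak_depth, p=1.0):
    """Estimate genome size G from read counts and the kmer depth peak.

    n_reads         -- total number of reads (n)
    read_length     -- length of each read in bp (l), assumed uniform
    k               -- kmer size used for counting
    kmer_peak_depth -- kmer depth at the main peak of the distribution (lambda_k)
    p               -- optional multiplier; its meaning is an assumption here
    """
    if read_length < k:
        raise ValueError("read_length must be at least k")
    if kmer_peak_depth <= 0:
        raise ValueError("kmer_peak_depth must be positive")
    # Each read of length l contributes (l - k + 1) kmers.
    total_kmers = n_reads * (read_length - k + 1)
    return p * total_kmers / kmer_peak_depth


# Made-up numbers: 1 million reads of 100 bp, k = 21, peak depth of 20.
genome_size = estimate_genome_size(1_000_000, 100, 21, 20)
```

With these illustrative inputs the estimate is 4 Mb. Real data need the peak depth read off a kmer histogram first.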
  
-====Error Correction====
+=====Error correction=====
   * A high number of low-frequency kmers usually indicates sequencing errors
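
To make the kmer idea concrete, here is a minimal sketch that counts kmers across a set of reads and flags the rare ones. It reads the point above as referring to kmers seen only a few times, which is an interpretation. The `min_count` threshold and function names are assumptions for illustration; real error correctors choose the cutoff from the kmer histogram.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every kmer of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_frequency_kmers(counts, min_count=2):
    """Return kmers seen fewer than min_count times (candidate errors)."""
    return {kmer for kmer, c in counts.items() if c < min_count}


# Toy reads: the third read ends in A instead of T (a simulated error).
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
counts = count_kmers(reads, k=4)
suspects = low_frequency_kmers(counts)
```

In this toy example only the kmer covering the altered base is seen once, so it is the only one flagged.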
  
 **Simulated contig lengths in the k-de Bruijn graph can be used to estimate the best kmer size for assembly, based on contig N50**
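
For reference, N50 is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The sketch below computes it and picks the kmer size with the highest N50. The `simulated` dictionary is made-up data standing in for the output of whatever tool produces the simulated contig lengths.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def best_kmer_size(lengths_by_k):
    """Pick the kmer size whose contig set has the largest N50."""
    return max(lengths_by_k, key=lambda k: n50(lengths_by_k[k]))


# Hypothetical simulated contig lengths for three kmer sizes.
simulated = {
    21: [100, 80, 50, 30, 20, 20],
    31: [150, 90, 60],
    41: [60, 60, 60, 60, 60],
}
best_k = best_kmer_size(simulated)
```

Here k = 31 wins because half of its total length is already covered by its single longest contig.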
lecture_notes/04-08-2015.1428620756.txt.gz · Last modified: 2015/04/09 16:05 by sihussai