lecture_notes:04-08-2015

Revision of 2015/04/17 21:53 by sihussai ("fixing formatting"); previous revision 2015/04/09 23:05 by sihussai
======De novo Assembly II======
**Guest lecturer: Stefan Prost, stefan.prost@berkley.edu**
  
=====Illumina Paired-end Sequencing Libraries=====
  * MiSeq produces 300 bp reads
  * Paired-end sequencing reads each DNA fragment from both ends, giving one read per end; the two reads may or may not overlap, depending on fragment length and read length
    * not enough information for scaffolding
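Whether the two reads of a pair overlap comes down to simple arithmetic on read length and fragment length. A minimal sketch, assuming both reads have the same length and that "fragment length" means the full DNA fragment between the adapters; the function name `read_overlap` is hypothetical.

```python
def read_overlap(fragment_len: int, read_len: int) -> int:
    """Number of fragment bases covered by both reads of a pair.

    Assumes read 1 starts at the left end of the fragment, read 2 at the
    right end, and both reads have the same length.
    """
    # Read 1 covers [0, read_len); read 2 covers [fragment_len - read_len, fragment_len).
    overlap = 2 * read_len - fragment_len
    # No shared bases if the reads don't reach each other; if each read is
    # longer than the fragment, they can share at most the whole fragment.
    return max(0, min(overlap, fragment_len))


# 300 bp MiSeq reads on fragments of different lengths
for fragment in (250, 500, 700):
    print(fragment, read_overlap(fragment, 300))
```

With 300 bp reads, fragments shorter than 600 bp give overlapping pairs; longer fragments leave an unsequenced gap between the two reads.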
  
=====Illumina Mate-Pair Sequencing Libraries=====
  * Idea: get read pairs whose ends are much farther apart in the genome (more information for scaffolding)
  * Basically the same idea as paired ends, except that the middle section of the fragment is removed and the two reads are oriented the opposite way (pointing away from each other instead of toward each other)
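Because mate-pair reads point away from each other, one common preprocessing step (an assumption here, not something described in the lecture) is to reverse-complement both mates so the pair has the same inward-facing orientation as a paired-end pair. A minimal sketch; the function names are hypothetical.

```python
# Complement table for DNA, keeping N (unknown base) and lowercase letters
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mate_pair_to_fr(read1: str, read2: str) -> tuple[str, str]:
    """Turn an outward-facing (RF) mate pair into an inward-facing (FR) pair.

    Reverse-complementing both reads flips their orientation while leaving
    the genomic positions they map to unchanged.
    """
    return reverse_complement(read1), reverse_complement(read2)


print(mate_pair_to_fr("AAAC", "GGGT"))  # ('GTTT', 'ACCC')
```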
  
  
=====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries=====
  * Uncommon and expensive, but the gold standard
  * Fosmids (based on the bacterial F-plasmid) take inserts of < 40 kb
    * http://www.scq.ubc.ca/wp-content/plasmidtext.gif
  
=====Read Quality Assessment=====
  * Base quality: Phred scores reported by the sequencer
  * FASTQ files: FASTA-style sequence records plus encoded Phred quality scores
  * Pacific Biosciences (PacBio) sequencing doesn't show GC/AT bias
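A FASTQ record spans four lines: a header starting with `@`, the sequence, a `+` separator, and one quality character per base. The sketch below decodes qualities assuming the Phred+33 encoding used by Sanger and Illumina 1.8+ files (older Illumina pipelines used an offset of 64), converts a Phred score Q into an error probability via P = 10^(-Q/10), and computes per-read GC content, the quantity behind GC/AT bias. It assumes well-formed four-line records; the function names are hypothetical.

```python
PHRED_OFFSET = 33  # Phred+33 (Sanger / Illumina 1.8+); an assumption about the input


def decode_qualities(qual: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Convert a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


def parse_fastq(lines):
    """Yield (name, sequence, phred_scores) for each four-line FASTQ record."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, decode_qualities(qual)


record = ["@read1", "ACGT", "+", "I?5+"]
for name, seq, quals in parse_fastq(record):
    print(name, quals, gc_fraction(seq))  # read1 [40, 30, 20, 10] 0.5
```

A Phred score of 30 therefore corresponds to a 1 in 1,000 chance that the base call is wrong.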
  
====Tools====
  * FastQC (the most popular tool for reporting on the quality of a read library)
    * On our data, FastQC flagged an issue with k-mer content (related to adapter content)
    * Estimates how difficult the assembly will be
  
=====Estimating Genome Size from Read Data=====
$$ G = \frac{p\,n\,(l - k + 1)}{\lambda_k} $$
G = genome size
  * To estimate genome size you need to know: (i) the total number of reads, (ii) the read length, and (iii) the k-mer frequency distribution
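A minimal sketch of the estimate, assuming fixed-length reads so that each of the n reads of length l contributes l - k + 1 k-mers, and taking λ_k as the peak of the k-mer depth histogram after skipping the low-depth error tail. The factor p in the formula above is not defined in these notes, so it is left out here (equivalent to p = 1); the histogram values and function names are hypothetical.

```python
def peak_depth(histogram: dict[int, int]) -> int:
    """Depth with the most distinct k-mers, ignoring the low-depth error tail.

    histogram maps k-mer depth (times a k-mer was seen) -> number of
    distinct k-mers observed at that depth.
    """
    depths = sorted(histogram)
    # Walk down the low-depth error spike until counts stop falling.
    i = 0
    while i + 1 < len(depths) and histogram[depths[i + 1]] < histogram[depths[i]]:
        i += 1
    # The coverage peak is the most common depth beyond that valley.
    return max(depths[i:], key=lambda d: histogram[d])


def estimate_genome_size(n_reads: int, read_len: int, k: int,
                         histogram: dict[int, int]) -> float:
    """G = n * (l - k + 1) / lambda_k, with lambda_k the k-mer coverage peak."""
    total_kmers = n_reads * (read_len - k + 1)
    return total_kmers / peak_depth(histogram)


# Hypothetical data: 1,000 reads of 100 bp, k = 21
hist = {1: 5000, 2: 300, 10: 500, 20: 3000, 30: 400}
print(estimate_genome_size(1000, 100, 21, hist))  # 4000.0
```

Skipping the error tail matters because the global maximum of a real histogram is often at depth 1, which would badly overestimate G.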
  
=====Error Correction=====
  * K-mers that occur at low frequency (a large number of k-mers seen only once or a few times) are usually sequencing errors
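The idea that rare k-mers are mostly errors can be shown by counting every k-mer in the reads and flagging those seen fewer than a chosen number of times. In this sketch the threshold `min_count` and the function names are assumptions; in practice the cutoff could be chosen from the k-mer histogram, for example at the valley before the coverage peak.

```python
from collections import Counter


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every overlapping k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts: Counter, min_count: int = 2) -> set[str]:
    """k-mers seen fewer than min_count times: likely sequencing errors."""
    return {kmer for kmer, c in counts.items() if c < min_count}


# Two identical reads plus one with a single error (A -> T at position 4)
reads = ["ACGTACGT", "ACGTACGT", "ACGTTCGT"]
counts = count_kmers(reads, k=4)
print(sorted(weak_kmers(counts)))  # ['CGTT', 'GTTC', 'TCGT', 'TTCG']
```

A single substitution creates up to k new k-mers (here 4), which is why erroneous k-mers dominate the low-count end of the histogram.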
  
**Simulating contig lengths in the de Bruijn graph for a range of k values can estimate the best k-mer size to use for assembly, based on contig N50.**
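N50 is the contig length at which contigs of that length or longer contain at least half of the total assembly length. The sketch below computes it and picks the k with the largest N50 from a dictionary of contig lengths per k; the lengths are hypothetical and would come from assemblies or simulations run separately, and the function names are illustrative.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs >= L cover at least half the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs


def best_k(contigs_by_k: dict[int, list[int]]) -> int:
    """k whose contig set has the largest N50."""
    return max(contigs_by_k, key=lambda k: n50(contigs_by_k[k]))


# Hypothetical contig lengths from assemblies at three k values
results = {21: [100, 200, 300, 400], 31: [500, 500], 41: [900, 50, 50]}
print({k: n50(v) for k, v in results.items()})  # {21: 300, 31: 500, 41: 900}
print(best_k(results))  # 41
```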
lecture_notes/04-08-2015.txt · Last modified: 2015/04/17 22:34 by sihussai