lecture_notes:04-08-2015

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-08-2015 [2015/04/09 19:11]
jolespin created
lecture_notes:04-08-2015 [2015/04/09 22:42]
sihussai
Line 1: Line 1:
-De nova Assembly ​| Wed 8 April 2015 | Stefan Prost stefan.prost@berkley.edu ​| jolespin notes+=====De novo Assembly ​II===== 
 +**Guest lecturer: Stefan Prost, stefan.prost@berkley.edu**
  
-#Illumina Paired-end Sequencing Libraries +====Illumina Paired-end Sequencing Libraries==== 
- MiSeq has 300 bp reads +  ​* ​MiSeq has 300 bp reads 
- Paired ends read from both directions+  ​* ​Paired ends read from both directions, so you get one read for each end (may or may not overlap depending on molecule and read size)
  
- =====> ​read_1+ =====> ​end_1
  ________________________  ________________________
-            read_2 ​<===== +            end_2 <=====
- . Can’t sequence repeat regions with paired-end reads+
  
-#Illumina Mate-Pair Sequencing Libraries +  * Can’t sequence repeat regions with paired-end reads 
- <​===== ​read_1+    * ends can't be very far apart because Illumina can't handle big molecules 
 +    * not enough info for scaffolding 
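
Whether the two ends of a paired-end fragment overlap is simple arithmetic on fragment length and read length. A minimal sketch, assuming both ends are read to the same length; the function name and arguments are illustrative and not part of the lecture:

```python
def paired_end_overlap(fragment_length, read_length):
    """Return how many bases are covered by both end reads (0 if they do not meet).

    Assumption: both ends are sequenced to the same read_length, and a read
    cannot be longer than the fragment it comes from.
    """
    if fragment_length <= 0 or read_length <= 0:
        raise ValueError("lengths must be positive")
    effective = min(read_length, fragment_length)  # a read stops at the fragment end
    return max(0, 2 * effective - fragment_length)


# 300 bp MiSeq reads on a 500 bp fragment meet in the middle;
# on an 800 bp fragment they leave an unsequenced gap.
overlap_short = paired_end_overlap(500, 300)
overlap_long = paired_end_overlap(800, 300)
```

With 300 bp reads, any fragment shorter than 600 bp gives overlapping ends, which is why the overlap depends on both the molecule and the read size.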
 + 
 +====Illumina Mate-Pair Sequencing Libraries====
 +  * Idea: get paired reads that are much farther away (for more scaffolding info) 
 +  * Basically, the same idea as paired-ends,​ except the middle section is missing and the ends are oriented the opposite way.  
 + 
 + <​===== ​end_1
  ________________________  ________________________
-            read_2 ​=====> +            end_2 =====> 
- . Dependent on inferring insert size + 
- Genomic DNA > Fragment (2-5 kb) > biotinylate ends > +  ​* ​Genomic DNA > Fragment (2-5 kb) > biotinylate ends > Circularize > Fragment (400-600 bp) > Enrich biotinylated fragments > Ligate adaptors > … 
-   ​Circularize > Fragment (400-600 bp) > Enrich biotinylated fragments > +  * Dependent on inferring insert size 
-          ​Ligate adaptors > …+
  
-#BAC (Bacterial Artificial Chromosome) and Fosmid Libraries +====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries==== 
- Bacterial F-plasmid takes < 40 kb insert size +  * Bacterial F-plasmid takes < 40 kb insert size
- Fragment query DNA > DNA into BAC > Transform E. coli with BAC > ~300 kb long reads +  ​* ​Fragment query DNA > DNA into BAC > Transform E. coli with BAC > ~300 kb long reads 
- http://​www.scq.ubc.ca/​wp-content/​plasmidtext.gif+  ​* ​http://​www.scq.ubc.ca/​wp-content/​plasmidtext.gif
  
-#Read Quality Assessment Tools +====Read Quality Assessment Tools==== 
- FastQC (Most popular tool to tell you about the read library)  +  ​* ​FastQC (Most popular tool to tell you about the read library)  
- Preqc +  ​* ​Preqc 
- Reads decrease in quality further down the read +  ​* ​Reads decrease in quality further down the read 
- Pacific Bio doesn’t have GC|AT bias+  ​* ​Pacific Bio doesn’t have GC|AT bias
  
-#Estimating Genome Size from Read Data+====Estimating Genome Size from Read Data====
  . G = p n (l - k + 1) / λ_k  . G = p n (l - k + 1) / λ_k
  G = Genome size  G = Genome size
Line 38: Line 45:
  Simpson 2013, arXiv  Simpson 2013, arXiv
  
-To estimate genome size need to know:i) total number of reads; ii) length of reads;  and iii) kmer distribution+  * To estimate genome size, you need to know: i) the total number of reads; ii) the read length; and iii) the k-mer distribution
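
The estimate can be sketched in a few lines, reading the formula as G = p n (l - k + 1) / λ_k, where n is the number of reads, l the read length, k the k-mer size and λ_k the mean k-mer depth. The notes do not define p here; this sketch assumes it is a fraction between 0 and 1 that scales the total k-mer count (for example, the share of k-mers that are error-free). The function and parameter names are hypothetical.

```python
def estimate_genome_size(n_reads, read_length, k, mean_kmer_depth, p=1.0):
    """Estimate genome size as G = p * n * (l - k + 1) / lambda_k.

    Assumptions (not from the lecture notes): all reads share one length,
    and p is a fraction in [0, 1] scaling the total k-mer count.
    """
    if k > read_length:
        raise ValueError("k cannot exceed the read length")
    if mean_kmer_depth <= 0:
        raise ValueError("mean k-mer depth must be positive")
    kmers_per_read = read_length - k + 1      # each read yields l - k + 1 k-mers
    total_kmers = n_reads * kmers_per_read
    return p * total_kmers / mean_kmer_depth


# Made-up numbers: 1 million 100 bp reads, k = 31, mean k-mer depth 20.
genome_size = estimate_genome_size(1_000_000, 100, 31, 20.0)
```

The mean k-mer depth would normally be read off the main peak of the k-mer count distribution, which is why the distribution is listed as a required input.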
  
-#Error Correction +====Error Correction==== 
- High amount of small kmers are usually errors +  * K-mers that occur only a few times in the reads are usually errors
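
One way to read the point above: a k-mer seen only once or twice across the whole read set is more likely a sequencing error than real genome sequence. A minimal sketch that counts k-mers and flags the rare ones; the threshold `min_count` and the toy reads are assumptions for illustration, not values from the lecture:

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_frequency_kmers(counts, min_count=2):
    """Return the k-mers seen fewer than min_count times (candidate errors)."""
    return {kmer for kmer, c in counts.items() if c < min_count}


# Toy data: three identical reads plus one read with a single-base change.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGTTC"]
counts = count_kmers(reads, k=4)
suspect = low_frequency_kmers(counts, min_count=2)
```

Here the two k-mers that span the changed base each appear once, so they are the ones flagged. Real error-correction tools pick the threshold from the k-mer count histogram instead of a fixed value.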
  
 ***Simulated contig length in the k-de Bruijn graph can be used to estimate the best k-mer size for assembly, based on contig N50 length.  ***Simulated contig length in the k-de Bruijn graph can be used to estimate the best k-mer size for assembly, based on contig N50 length.
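
For reference, N50 is the contig length at which contigs of that length or longer contain at least half of the total assembled bases. A minimal sketch of the calculation, with a hypothetical function name and made-up contig lengths:

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Walk the contigs from longest to shortest and return the length at
    which the running total first reaches half of the summed length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:   # reached at least 50% of all bases
            return length


# Made-up contig lengths for illustration.
example_n50 = n50([80, 70, 50, 40, 30, 20])
```

Comparing this value across assemblies (or simulated contig sets) built with different k is one way to choose k, under the assumption that a higher N50 reflects a more contiguous assembly.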
lecture_notes/04-08-2015.txt · Last modified: 2015/04/17 22:34 by sihussai