lecture_notes:04-08-2015

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-08-2015 [2015/04/09 19:12]
jolespin
lecture_notes:04-08-2015 [2015/04/09 22:42]
sihussai
Line 1: Line 1:
- **De nova Assembly II** | Wed 8 April 2015 | Stefan Prost stefan.prost@berkley.edu | jolespin notes +=====De novo Assembly II=====
 +**Guest lecturer: Stefan Prost stefan.prost@berkley.edu**
  
-#Illumina Paired-end Sequencing Libraries +====Illumina Paired-end Sequencing Libraries==== 
- MiSeq has 300 bp reads +  ​* ​MiSeq has 300 bp reads 
- Paired ends read from both directions+  ​* ​Paired ends read from both directions, so you get one read for each end (may or may not overlap depending on molecule and read size)
  
- =====> ​read_1+ =====> ​end_1
  ________________________  ________________________
-            read_2 ​<===== +            end_2 <=====
- . Can’t sequence repeat regions with paired-end reads+
  
-#Illumina Mate-Pair Sequencing Libraries +  * Can’t resolve long repeat regions with paired-end reads alone
- <===== read_1 +    * the two ends can't be very far apart because Illumina can't sequence long fragments
 +    * so there is not enough long-range info for scaffolding
 + 
 +====Illumina Mate-Pair Sequencing Libraries====
 +  * Idea: get paired reads that are much farther away (for more scaffolding info) 
 +  * Basically, the same idea as paired-ends,​ except the middle section is missing and the ends are oriented the opposite way.  
 + 
 + <​===== ​end_1
  ________________________  ________________________
-            read_2 ​=====> +            end_2 =====> 
- . Dependent on inferring insert size + 
- Genomic DNA > Fragment (2-5 kb) > biotinylate ends > +  ​* ​Genomic DNA > Fragment (2-5 kb) > biotinylate ends > Circularize > Fragment (400-600 bp) > Enrich biotinylated fragments > Ligate adaptors > … 
-   ​Circularize > Fragment (400-600 bp) > Enrich biotinylated fragments > +  * Dependent on inferring insert size 
-          ​Ligate adaptors > …+
  
-#BAC (Bacterial Artificial Chromosome) and Fosmid Libraries +====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries==== 
- Bacterial F-plasmid takes< 40 kb insert size +  * Bacterial F-plasmid takes < 40 kb insert size
- Fragment query DNA > DNA into BAC > Transform E. coli with BAC > ~300 kb long reads +  ​* ​Fragment query DNA > DNA into BAC > Transform E. coli with BAC > ~300 kb long reads 
- http://​www.scq.ubc.ca/​wp-content/​plasmidtext.gif+  ​* ​http://​www.scq.ubc.ca/​wp-content/​plasmidtext.gif
  
-#Read Quality Assessment Tools +====Read Quality Assessment Tools==== 
- FastQC (Most popular tool to tell you about the read library)  +  ​* ​FastQC (Most popular tool to tell you about the read library)  
- Preqc +  ​* ​Preqc 
- Reads decrease in quality further down the read +  ​* ​Reads decrease in quality further down the read 
- Pacific Bio doesn’t have GC|AT bias +  * PacBio (Pacific Biosciences) doesn’t have GC/AT bias
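The note that quality drops further down the read can be checked by hand on a few FASTQ quality strings. The sketch below assumes the common Phred+33 encoding (quality = ASCII code minus 33); the helper names and the toy quality strings are made up for illustration and are not taken from FastQC or Preqc.

```python
from statistics import mean


def phred_scores(quality_string, offset=33):
    """Decode one FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def mean_quality_by_position(quality_strings, offset=33):
    """Average Phred score at each read position across all reads."""
    per_position = {}
    for quality_string in quality_strings:
        for i, score in enumerate(phred_scores(quality_string, offset)):
            per_position.setdefault(i, []).append(score)
    return [mean(per_position[i]) for i in sorted(per_position)]


# Toy quality strings (invented): quality falls towards the 3' end.
toy_qualities = ["IIIIHHGF", "IIIHHGFE", "IIHHGFED"]
profile = mean_quality_by_position(toy_qualities)
print(profile)
```

A per-position profile like this is roughly what a "per base quality" plot summarises; a real report would read the quality lines from a FASTQ file rather than a hard-coded list.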
  
-#Estimating Genome Size from Read Data+====Estimating Genome Size from Read Data====
  . G = (pn(l-k+1))/(λ_k)  . G = (pn(l-k+1))/(λ_k)
  G = Genome size  G = Genome size
Line 38: Line 45:
  Simpson 2013, arXiv  Simpson 2013, arXiv
  
-To estimate genome size need to know:i) total number of reads; ii) length of reads;  and iii) kmer distribution +  * To estimate genome size, you need to know: i) the total number of reads; ii) the read length; and iii) the k-mer distribution
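As a rough worked example of the genome size formula, the sketch below plugs numbers into G = p * n * (l - k + 1) / λ_k. It assumes n is the number of reads, l the read length (so l - k + 1 is the number of k-mers per read) and λ_k the mean k-mer depth read off the k-mer distribution. The factor p is not defined in the part of the notes shown here, so treating it as a plain scaling factor (default 1.0) is an assumption, as are the function name and the example numbers.

```python
def estimate_genome_size(n_reads, read_length, k, lambda_k, p=1.0):
    """Estimate genome size G = p * n * (l - k + 1) / lambda_k.

    p is an assumed scaling factor (for example, the fraction of k-mers
    kept after discounting errors); it defaults to 1.0 here.
    """
    if read_length < k:
        raise ValueError("read_length must be at least k")
    if lambda_k <= 0:
        raise ValueError("lambda_k must be positive")
    kmers_per_read = read_length - k + 1
    return p * n_reads * kmers_per_read / lambda_k


# Invented numbers: 1,000,000 reads of 100 bp, k = 31, mean k-mer depth 20.
genome_size = estimate_genome_size(1_000_000, 100, 31, 20.0)
print(f"Estimated genome size: {genome_size:,.0f} bp")
```

With these numbers each read contributes 70 k-mers, so 70,000,000 k-mers spread over a depth of 20 give an estimate of 3,500,000 bp.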
  
-#Error Correction +====Error Correction==== 
- High amount of small kmers are usually errors +  * The many k-mers with low counts (seen only once or a few times) are usually sequencing errors
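To illustrate the idea behind k-mer based error detection, here is a minimal sketch that counts k-mers across reads and flags those seen fewer than a threshold number of times. The reads, the threshold of 2 and the function names are assumptions made for the example; real error correctors choose the cutoff from the k-mer count distribution and do considerably more than flag k-mers.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_frequency_kmers(counts, min_count=2):
    """Return the k-mers seen fewer than min_count times (likely errors)."""
    return {kmer for kmer, count in counts.items() if count < min_count}


# Toy reads (invented): the last read carries one substitution (A -> T).
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
counts = count_kmers(reads, k=4)
suspicious = low_frequency_kmers(counts, min_count=2)
print(sorted(suspicious))
```

A single substitution creates several new k-mers that each appear only once, which is why errors show up as a pile of low-count k-mers.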
  
 ***Simulated contig length in the k-de Bruijn graph can be used to estimate the best k-mer size for assembly, based on contig N50.  ***Simulated contig length in the k-de Bruijn graph can be used to estimate the best k-mer size for assembly, based on contig N50.
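For reference, N50 is the contig length at which the contigs of that length or longer contain at least half of the total assembled bases. The sketch below computes it and then picks the k with the highest N50 from a hypothetical table of contig lengths per k. The table, the names and the "highest N50 wins" rule are assumptions for illustration, not the procedure of any particular tool.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("need at least one contig")


# Hypothetical contig lengths obtained for three k-mer sizes.
contigs_by_k = {
    21: [50, 40, 30, 20],
    31: [100, 80, 60, 40, 20],
    41: [90, 30, 30],
}

# Choose the k whose contigs give the largest N50.
best_k = max(contigs_by_k, key=lambda k: n50(contigs_by_k[k]))
print(best_k, n50(contigs_by_k[best_k]))
```

N50 alone rewards long contigs regardless of correctness, so in practice it is usually weighed against total assembly size and error measures.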
lecture_notes/04-08-2015.txt · Last modified: 2015/04/17 22:34 by sihussai