lecture_notes:04-08-2015

            end_2 <=====
  
  * Problem: paired-end reads are not really sufficient for repetitive regions
    * The ends can't be very far apart, because Illumina can't handle large molecules
    * Not enough information for scaffolding
  
  * Genomic DNA > Fragment (2-5 kb) > biotinylate ends > Circularize > Fragment (400-600 bp) > Enrich biotinylated fragments > Ligate adaptors > …
    * Cut the DNA and attach a biotin tag to both ends of the target molecule
    * Circularize the target molecule
      * This step is hard
    * Fragment the circularized molecule and enrich the fragment carrying the biotin tag (i.e. the two ends of the original target joined back-to-back, with the tags in between)
    * Then end repair, A-tailing, adapter ligation, amplification and sequencing
  * Depends on inferring the insert size (which can be tricky)
  * Most companies can deliver 8 kb inserts; with skill you can get up to 20 kb
  * Complicated process; unexpected things can go wrong between steps
  * Important difference between paired ends and mate pairs: the ends are oriented the opposite way
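Since mate-pair analysis depends on inferring the insert size and on the flipped read orientation, here is a minimal Python sketch of both ideas. It assumes the pairs are already aligned to one contig and describes each read with a hypothetical `Alignment` record (0-based start, aligned length, strand). It labels a pair FR (reads pointing inward, the usual paired-end layout) or RF (pointing outward, the usual mate-pair layout) and takes the median outer span of the expected orientation as a rough insert-size proxy.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Alignment:
    """Hypothetical minimal record for one mapped read of a pair."""
    start: int     # 0-based leftmost aligned position on the contig
    length: int    # number of aligned reference bases
    forward: bool  # True if the read maps to the forward strand


def orientation(a: Alignment, b: Alignment) -> str:
    """'FR' = reads point inward (paired-end style), 'RF' = outward (mate-pair style)."""
    left, right = (a, b) if a.start <= b.start else (b, a)
    if left.forward and not right.forward:
        return "FR"
    if not left.forward and right.forward:
        return "RF"
    return "same-strand"


def outer_span(a: Alignment, b: Alignment) -> int:
    """Distance from the leftmost aligned base of the pair to the rightmost one."""
    return max(a.start + a.length, b.start + b.length) - min(a.start, b.start)


def estimate_insert_size(pairs, expected="RF"):
    """Median outer span over pairs with the expected orientation (None if there are none).

    Pairs in the other orientation (e.g. short-insert FR pairs that slip into a
    mate-pair library) are skipped instead of being mixed into the estimate.
    """
    spans = [outer_span(a, b) for a, b in pairs if orientation(a, b) == expected]
    return median(spans) if spans else None
```

Using the median rather than the mean keeps a few chimeric or mis-mapped pairs from dragging the estimate around.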
  
  
====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries====
  * Uncommon and expensive, but the gold standard
  * Fosmids (based on the bacterial F-plasmid) take inserts of up to ~40 kb
  * BACs: Fragment query DNA > clone DNA into BAC > transform E. coli with BAC > inserts of up to ~300 kb
  * http://www.scq.ubc.ca/wp-content/plasmidtext.gif
  
====Read Quality Assessment====
  * Base quality: Phred scores reported by the sequencer
  * FASTQ files: FASTA-style sequence records plus encoded Phred scores
    * You need to know whether your file uses Phred+33 or Phred+64 encoding
  * Per-base quality is not the whole story; the surrounding sequence context also matters for signal processing
  * Base quality decreases further along the read
    * **Depending on the assembler, you might need to trim low-quality bases off the read ends; other assemblers want untrimmed data**
  * PacBio (Pacific Biosciences) doesn't have GC/AT bias
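To make the Phred+33/Phred+64 question and 3' trimming concrete, here is a minimal Python sketch. The offset guess is only a heuristic based on which ASCII characters appear. The trimming rule (walk back from the 3' end and cut where the running sum of threshold minus quality peaks) is one common approach, similar to the one BWA and cutadapt describe; the default threshold of 20 is an assumption for illustration, not a recommendation from the lecture.

```python
def guess_phred_offset(quality_strings):
    """Guess Phred+33 vs Phred+64 from the characters in some quality strings.

    Heuristic: ASCII codes below 59 (';') never occur in Phred+64 data, and codes
    above 74 ('J') lie beyond the usual Phred+33 range (Q0-Q41). Returns None when
    every character fits both encodings.
    """
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None  # ambiguous: look at more reads


def decode_qualities(quality_string, offset):
    """Turn an encoded quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Invert the Phred definition Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def trim_length(quals, threshold=20):
    """How many leading bases to keep after trimming a low-quality 3' tail.

    Walks back from the 3' end summing (threshold - q) and cuts where that sum
    peaks. The scan stops as soon as the sum goes negative, so a read whose last
    base scores above the threshold is left untrimmed.
    """
    best, running, cut = 0, 0, len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return cut
```

For example, scores `[30, 30, 30, 30, 10, 5, 2]` with threshold 20 keep the first four bases. A Q30 base corresponds to an error probability of 0.001.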

===Tools===
  * FastQC (the most popular tool for summarizing a read library)
    * FastQC flagged a problem with the k-mer content of our data (related to adapter content)
      * **This needs to be checked out and diagnosed!**
  * Preqc
    * Estimates how difficult the assembly will be
  
====Estimating Genome Size from Read Data====
 G = (pn(l - k + 1)) / λ_k
 G = Genome size
 pn = proportion of correct reads
  * A high count of low-frequency k-mers usually indicates sequencing errors
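The genome-size formula can be tried out end to end in Python. The definitions of l, n and λ_k are not visible in these notes, so this sketch assumes l is the read length, reads pn as p (proportion of correct reads) times n (number of reads), and treats λ_k as the main peak of the k-mer multiplicity histogram. It skips the low-multiplicity error spike with a crude cutoff (a hypothetical `min_multiplicity` parameter) and counts forward-strand k-mers only; real k-mer counters usually merge each k-mer with its reverse complement.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count each k-mer in the reads (forward strand only, to keep the sketch short)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def coverage_histogram(counts):
    """Map multiplicity -> number of distinct k-mers observed that many times."""
    return Counter(counts.values())


def peak_multiplicity(hist, min_multiplicity=2):
    """Multiplicity with the most distinct k-mers, ignoring the low-count error region."""
    candidates = {m: n for m, n in hist.items() if m >= min_multiplicity}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def estimate_genome_size(n_reads, read_length, k, kmer_coverage, p_correct=1.0):
    """G = p * n * (l - k + 1) / lambda_k (assumed reading of the formula above)."""
    return p_correct * n_reads * (read_length - k + 1) / kmer_coverage
```

With invented numbers: 1,000,000 reads of length 100, k = 31 and a k-mer peak at 14 give G = 1,000,000 × (100 − 31 + 1) / 14 = 5,000,000 bp.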
  
**Simulating contig lengths in the de Bruijn graph for each k can estimate the best k-mer size to use for assembly, based on contig N50**
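The N50 criterion can be made concrete with a short Python sketch: compute N50 from a list of contig lengths, then pick the k whose simulated assembly has the largest N50. The per-k N50 values in the usage line are made up and stand in for whatever the simulation reports; `best_k` is a hypothetical helper name.

```python
def n50(lengths):
    """Length L such that contigs of length >= L contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


def best_k(n50_by_k):
    """Pick the k-mer size whose (simulated) assembly has the largest contig N50."""
    return max(n50_by_k, key=n50_by_k.get)
```

For contigs of lengths 2 to 10 (54 bases in total), the three longest contigs (10 + 9 + 8 = 27) first reach half the total, so N50 = 8. With made-up simulated values `{31: 1500, 41: 2300, 51: 1900}`, `best_k` returns 41.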
lecture_notes/04-08-2015.txt · Last modified: 2015/04/17 22:34 by sihussai