Differences

This shows you the differences between two versions of the page.

--- lecture_notes:04-08-2015 [2015/04/09 22:42]
sihussai
+++ lecture_notes:04-08-2015 [2015/04/17 21:53]
sihussai fixing formatting
@@ Line 1: / Line 1: @@
-=====De novo Assembly II=====
+======De novo Assembly II======
 **Guest lecturer: Stefan Prost, stefan.prost@berkley.edu**
-====Illumina Paired-end Sequencing Libraries====
+=====Illumina Paired-end Sequencing Libraries=====
   * MiSeq has 300 bp reads
   * Paired ends read from both directions, so you get one read for each end (may or may not overlap depending on molecule and read size)
@@ Line 10: / Line 10: @@
 		           end_2 <=====
-  * Can’t sequence repeat regions with paired-end reads
+  * Problem: not really sufficient for repetitive regions
     * ends can't be very far apart because Illumina can't handle big molecules
     * not enough info for scaffolding
-====Illumina Mate-Pair Sequencing Libraries===
+=====Illumina Mate-Pair Sequencing Libraries=====
   * Idea: get paired reads that are much farther away (for more scaffolding info)
   * Basically, the same idea as paired-ends, except the middle section is missing and the ends are oriented the opposite way.
@@ Line 23: / Line 23: @@
   * Genomic DNA > Fragment (2-5 kb) > biotinylate ends > Circularize > Fragment (400-600 bp) > Enrich biotinylated fragments > Ligate adaptors > …
-  * Dependent on inferring insert size
+    * Cut DNA, attach a biotin tag to both ends of the target molecule
+    * Circularize target molecule
+      * This step is hard
+    * Circularized molecule is fragmented; the fragment carrying the biotin tag (a.k.a. the ends of the original target stuck to each other backwards, with the tags in between) is enriched
+    * Then end repair, A-tailing, adapters added, amplification, sequencing
+  * Dependent on inferring insert size (can be tricky)
+  * Most companies can get you 8 kb inserts, with skill you can get up to 20 kb
+  * Complicated process, weird stuff can happen in between
+  * Important difference between paired ends and mate-pairs: ends are oriented the opposite way.
-====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries====
+=====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries=====
+  * Uncommon and expensive, but the gold standard
   * Bacterial F-plasmid takes < 40 kb insert size
   * Fragment query DNA > DNA into BAC > Transform E. coli with BAC > ~300 kb long reads
   * http://www.scq.ubc.ca/wp-content/plasmidtext.gif
-====Read Quality Assessment Tools====
+=====Read Quality Assessment=====
-  * FastQC (Most popular tool to tell you about the read library)
+  * Base quality: Phred scores reported by sequencer.
-  * Preqc
+  * FASTQ files: FASTA sequences plus encoded Phred scores
+    * Need to know whether your file uses phred33 or phred64 encoding (ASCII offset 33 or 64)
+  * Quality for each individual base is not the whole story; the context also matters to the signal processing
   * Reads decrease in quality further down the read
+    * **Depending on the assembler, you might need to trim lower-quality bases off the ends of reads. Other assemblers want untrimmed data**
   * Pacific Biosciences (PacBio) sequencing doesn’t have GC|AT bias
-====Estimating Genome Size from Read Data====
+====Tools====
-	. G = (pn(1-k+1))/(λ_k)
+  * FastQC (Most popular tool to tell you about the read library)
+    * FastQC flagged a kmer count issue in our data (related to adapter content)
+      * **This needs to be checked out and diagnosed!**
+  * Preqc
+    * Estimates how difficult the assembly will be
+=====Estimating Genome Size from Read Data=====
+	G = (pn(1-k+1))/(λ_k)
 	G = Genome size
 	pn = proportion of correct reads
@@ Line 47: / Line 66: @@
   * To estimate genome size, you need to know: i) total number of reads; ii) length of reads; and iii) kmer distribution
-====Error Correction====
+=====Error Correction=====
   * Large numbers of low-count kmers are usually errors
-***Simulated contif length in the k-de Brujin graph can estimate the best kmer to use for assembly. Based on contif length N50
+**Simulated contig lengths in the k-de Bruijn graph can be used to estimate the best kmer size for assembly, based on contig N50**
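The note above about picking a kmer size by contig N50 can be sketched in a few lines. This is only a minimal illustration, not the method used in the lecture: the function names `n50` and `best_kmer_by_n50` and the contig lengths in the example are hypothetical, and real simulated contig lengths would come from whatever tool produces them.

```python
def n50(contig_lengths):
    """Return the N50: the length of the contig at which the running
    total, taken from longest to shortest, first reaches half of the
    total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def best_kmer_by_n50(contigs_by_k):
    """Pick the kmer size whose (simulated) contigs have the largest N50.

    contigs_by_k maps a kmer size to a list of contig lengths
    (hypothetical input format, chosen for this sketch)."""
    return max(contigs_by_k, key=lambda k: n50(contigs_by_k[k]))


if __name__ == "__main__":
    # Made-up contig lengths for three candidate kmer sizes.
    simulated = {21: [5, 5, 5], 31: [10, 3, 2], 41: [4, 4, 4, 3]}
    print(best_kmer_by_n50(simulated))
```

N50 alone rewards long contigs and says nothing about misassemblies, so a choice made this way is a starting point rather than a final answer.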

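As a rough illustration of the genome size estimate, the sketch below assumes the formula in the notes is the usual kmer relation G ≈ p·n·(l − k + 1) / λ_k, where n is the number of reads, l the read length, p the fraction of usable reads, and λ_k the kmer depth at the main peak of the kmer distribution. Reading the `1` in the notes as the read length l is an assumption, and the function names and example numbers are hypothetical.

```python
def peak_kmer_depth(histogram):
    """Find the main coverage peak in a kmer histogram.

    histogram maps depth -> number of distinct kmers seen at that depth.
    Low-depth kmers are mostly sequencing errors, so first walk down the
    initial error peak, then take the depth with the highest count."""
    depths = sorted(histogram)
    if not depths:
        raise ValueError("empty histogram")
    i = 0
    while i + 1 < len(depths) and histogram[depths[i + 1]] < histogram[depths[i]]:
        i += 1
    return max(depths[i:], key=lambda d: histogram[d])


def estimate_genome_size(num_reads, read_length, k, kmer_depth, usable_fraction=1.0):
    """Estimate genome size as (usable kmers in the reads) / (peak kmer depth)."""
    if k > read_length:
        raise ValueError("k must not exceed the read length")
    if kmer_depth <= 0:
        raise ValueError("kmer depth must be positive")
    total_kmers = usable_fraction * num_reads * (read_length - k + 1)
    return total_kmers / kmer_depth


if __name__ == "__main__":
    # Made-up histogram: error peak at depth 1, real peak at depth 5.
    hist = {1: 1000, 2: 300, 3: 50, 4: 80, 5: 200, 6: 150}
    depth = peak_kmer_depth(hist)
    print(depth, estimate_genome_size(10_000_000, 100, 21, depth))
```

The three inputs match the list in the notes: total number of reads, read length, and the kmer distribution (used here only to locate the peak depth).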