Differences

This shows you the differences between two versions of the page.

--- lecture_notes:04-09-2010 [2010/04/09 22:31]
cbrumbau created
+++ lecture_notes:04-09-2010 [2010/04/11 22:09]
learithe
@@ Line 9: / Line 9: @@
 Next week will have a reference genome (POG) to use for testing the tools on.
 For the most part POG is done; however, there is still some uncertainty, with 8 SNPs left. It is definitely past the MIAMI standard at this point.
+Note about sequencing platform quality scores: most platforms are trying to use the Phred quality score((http://en.wikipedia.org/wiki/Phred_quality_score)), so quality scores are comparable across platforms and runs.
+It can be informative, once reads are mapped, to look at the quality scores for reads with observed errors.
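As a rough illustration of what a Phred score encodes, here is a minimal sketch. The relation Q = -10 log10(P) is the standard Phred definition. The function names are hypothetical, and the Phred+33 ASCII offset is an assumption about the file encoding, not something stated in the lecture.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred score Q to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P back to a Phred score: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_phred33(quality_string):
    """Decode a quality string, ASSUMING the Phred+33 ASCII offset."""
    return [ord(c) - 33 for c in quality_string]


if __name__ == "__main__":
    for q in decode_phred33("I5!"):
        print(q, phred_to_error_prob(q))
```

Once reads are mapped, the decoded scores at mismatching positions could be compared against the scores at matching positions to see whether observed errors concentrate at low quality values.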
+//Pog// assembly is down to only 8 SNPs and one potentially variable insert.
+Lior Pachter (from UC Berkeley) is visiting on Monday to speak about the Bowtie/TopHat/Cufflinks algorithms. Bowtie: mapping; TopHat/Cufflinks: find splice junctions and predict spliced transcripts. Bowtie is used in a lot of the assembly algorithms.
 ===== Main lecture: Assembler graphs =====
@@ Line 14: / Line 22: @@
 Types of assembler graphs:
   * Overlap graph
-  * de Bruijn graph
+  * de Bruijn graph  (pronounced like "De Broin")
 Differences are "What are the nodes?"
   * Overlap: reads
-  * de Bruijn: k-mers (usually fixed k, k <= length(read))
+  * de Bruijn: k-mers (usually fixed k, k < = length(read))
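To make the "nodes are k-mers" idea concrete, here is a minimal sketch that follows the convention in these notes: each k-mer is a node, and an edge joins two k-mers that appear consecutively in a read (so they overlap by k-1 bases). The function names are hypothetical, and the sketch ignores reverse complements and sequencing errors.

```python
from collections import defaultdict


def kmers(read, k):
    """Return all k-mers of a read, in order; empty if k > len(read)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_graph(reads, k):
    """Map each k-mer node to the set of k-mers that directly follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for a, b in zip(ks, ks[1:]):
            graph[a].add(b)
    return graph


if __name__ == "__main__":
    g = de_bruijn_graph(["ACGTA", "CGTAC"], 3)
    for node in sorted(g):
        print(node, "->", sorted(g[node]))
```

Note that a read shorter than k contributes no nodes at all, which is why k must not exceed the read length.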
 === Overlap graphs ===
@@ Line 30: / Line 38: @@
             B
 </code>
-The problem is the direction of the reads when aligning:
+The problem with edges between contig nodes is in defining the direction of the reads when aligning:
   * 4 different edge scenarios:
     * -> -> (A -> B)
@@ Line 37: / Line 45: @@
     * <- <- (B -> A)
   * 3 different edge types:
-    * A to B
+    * same dir: A to B / B to A
-    * B to A
+    * tail-to-tail (convergent): A' to B
-    * A' to B / A to B'
+    * head-to-head (divergent): A to B'
+Need to have some tolerance for error because the reads are noisy. When creating read overlaps, if you require a 100% match, you’ll miss a lot of data. Plus, the overlaps include the read ends, where quality falls off, so you need an “overlap quality score”.
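A minimal sketch of error-tolerant overlap detection, under stated assumptions: it looks only for a suffix of read A matching a prefix of read B in the same orientation, and it uses a simple mismatch fraction as a stand-in for the “overlap quality score” (a real scorer would presumably weight mismatches by base quality). The function name and parameters are hypothetical.

```python
def best_overlap(a, b, min_len=3, max_mismatch_frac=0.1):
    """Find the longest suffix of `a` that matches a prefix of `b` within tolerance.

    Returns (overlap_length, mismatches), or None if no overlap of at least
    `min_len` bases stays within `max_mismatch_frac` mismatches per base.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        if mismatches <= max_mismatch_frac * length:
            return length, mismatches
    return None


if __name__ == "__main__":
    a, b = "TTTTACGTACGA", "ACGTACGTGGGG"
    print(best_overlap(a, b, min_len=8, max_mismatch_frac=0.0))  # exact only
    print(best_overlap(a, b, min_len=8, max_mismatch_frac=0.2))  # tolerant
```

With the example reads, requiring an exact match finds no overlap of 8 bases, while allowing 20% mismatches recovers an 8-base overlap with a single mismatch at the read end.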
+Can’t do all-vs-all searches (n^2 algorithms are not a good idea with billions of reads…). So how do you decide which reads to try to overlap? Most algorithms apply a BLAST-like filter before trying to align edges (~ n log n).
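One way such a filter could work is a shared k-mer index: only read pairs that share at least one exact k-mer seed become candidates for the expensive alignment step. This is a sketch of the general seed-filter idea, not the method of any particular assembler; the function name is hypothetical, and this hash-based version has its own cost profile rather than literally n log n.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(reads, k):
    """Return the set of read-index pairs (i, j), i < j, sharing at least one k-mer."""
    index = defaultdict(set)  # k-mer -> ids of reads containing it
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(read_id)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "TTTTTT"]
    print(candidate_pairs(reads, 4))
```

Only the candidate pairs are then passed to the overlap aligner. Highly repeated k-mers can still produce many candidates, so a practical filter would likely cap or skip very frequent seeds.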
+(Side note: for transcriptome libraries, if done properly, reads should have known strandedness, so they can’t be run through algorithms that treat strandedness as arbitrary. Story about problems with a prominent yeast microarray transcriptome analysis incorrectly finding a lot of “antisense” mRNAs due to a library prep error.)
-Need to have some tolerance for error because the reads are noisy.
 === de Bruijn graphs ===

Banana Slug Genomics
