This shows you the differences between two versions of the page.
lecture_notes:04-09-2010 [2010/04/09 22:31] cbrumbau created |
lecture_notes:04-09-2010 [2010/04/11 22:09] learithe |
||
---|---|---|---|
Line 9: | Line 9: | ||
Next week will have a reference genome (POG) to use for testing the tools on. | Next week will have a reference genome (POG) to use for testing the tools on. | ||
For the most part POG is done; however, there is still some uncertainty with 8 SNPs left. It is definitely past the MIAMI standard at this point. | For the most part POG is done; however, there is still some uncertainty with 8 SNPs left. It is definitely past the MIAMI standard at this point. | ||
+ | |||
+ | Note about sequencing platform quality scores: most platforms are moving to the phred quality score((http://en.wikipedia.org/wiki/Phred_quality_score)), so quality scores are comparable across platforms and runs. | ||
+ | |||
+ | It can be informative, once reads are mapped, to look at the quality scores for reads with observed errors. | ||
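For reference, the phred scale relates a quality score Q to an error probability P by Q = -10 log10(P). The following is a minimal sketch of that conversion, not any platform's actual code. The function names are made up for illustration, and the ASCII offset of 33 is an assumption (the Sanger FASTQ convention); some platforms and pipeline versions use a different offset.

```python
import math

def phred_to_error_prob(q):
    """Error probability for a phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def decode_quality_string(qual, offset=33):
    """Decode an ASCII quality string into a list of phred scores.

    offset=33 is an assumption (Sanger-style encoding); check what
    your platform actually writes before relying on it.
    """
    return [ord(c) - offset for c in qual]
```

So Q20 means a 1-in-100 chance that the base call is wrong, and Q30 means 1 in 1000. Decoding the per-base scores of mapped reads this way is one route to checking whether observed errors sit at low-quality positions.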
+ | |||
+ | //Pog// assembly is down to only 8 SNPs and one potentially variable insert. | ||
+ | |||
+ | Lior Pachter (from UC Berkeley) is visiting on Monday to speak about the Bowtie/TopHat/Cufflinks algorithms. Bowtie: read mapping; TopHat/Cufflinks: find splice junctions and predict spliced transcripts. Bowtie is used in a lot of the assembly algorithms. | ||
===== Main lecture: Assembler graphs ===== | ===== Main lecture: Assembler graphs ===== | ||
Line 14: | Line 22: | ||
Types of assembler graphs: | Types of assembler graphs: | ||
* Overlap graph | * Overlap graph | ||
- | * de Bruijn graph | + | * de Bruijn graph (pronounced like "De Broin") |
Differences are "What are the nodes?" | Differences are "What are the nodes?" | ||
* Overlap: reads | * Overlap: reads | ||
- | * de Bruijn: k-mers (usually fixed k, k <= length(read)) | + | * de Bruijn: k-mers (usually fixed k, k <= length(read)) |
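To make the "what are the nodes?" distinction concrete, here is a small sketch that lists the k-mers of a read and collects the distinct k-mers across a read set. The function names are hypothetical, and this only builds the node set; it says nothing yet about edges.

```python
def kmers(read, k):
    """Return all k-mers of a read, in order (empty list if k > len(read))."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def kmer_nodes(reads, k):
    """Distinct k-mers over all reads: the node set of a de Bruijn-style graph.

    In an overlap graph the node set would simply be the reads themselves.
    """
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))
    return nodes
```

For example, the reads ACGT and CGTA with k = 3 give two overlap-graph nodes (the reads), but three k-mer nodes (ACG, CGT, GTA), because the shared k-mer CGT is stored only once.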
=== Overlap graphs === | === Overlap graphs === | ||
Line 30: | Line 38: | ||
B | B | ||
</code> | </code> | ||
- | The problem is the direction of the reads when aligning: | + | The problem with edges between contig nodes is defining the direction of the reads when aligning: |
* 4 different edge scenarios: | * 4 different edge scenarios: | ||
* -> -> (A -> B) | * -> -> (A -> B) | ||
Line 37: | Line 45: | ||
* <- <- (B -> A) | * <- <- (B -> A) | ||
* 3 different edge types: | * 3 different edge types: | ||
- | * A to B | + | * same dir: A to B / B to A |
- | * B to A | + | * tail-to-tail (convergent): A' to B |
- | * A' to B / A to B' | + | * head-to-head (divergent): A to B' |
+ | |||
+ | Need to have some tolerance for error because the reads are noisy. When creating read overlaps, if you require 100% matching, you’ll miss a lot of data. Plus, the overlaps include the read ends, where quality falls off, so you need an “overlap quality score”. | ||
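A rough sketch of error-tolerant overlap detection, to illustrate the point rather than to reproduce any particular assembler. The function name, the minimum overlap length, and the mismatch-fraction threshold are all assumptions chosen for the example; it ignores indels, reverse complements, and per-base quality values.

```python
def best_overlap(a, b, min_len=3, max_mismatch_frac=0.1):
    """Longest overlap of a suffix of `a` with a prefix of `b` that stays
    within the allowed mismatch fraction.

    Returns (overlap_length, mismatches), or None if nothing qualifies.
    The mismatch count acts as a crude stand-in for an overlap quality score.
    """
    # Try the longest possible overlap first, then shrink.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        if mismatches <= max_mismatch_frac * length:
            return length, mismatches
    return None
```

With a = TTTTACGTACGTAC and b = ACGTACGAACGGGG, allowing 10% mismatches finds a 10-base overlap containing one mismatch, while requiring exact matches (max_mismatch_frac=0.0) only finds a 6-base overlap.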
+ | |||
+ | Can’t do all-vs-all searches (n^2 algorithms are not a good idea with billions of reads…). So how do you decide which reads to try to overlap? Most algorithms do a BLAST-like filter before trying to align edges (~ n log n). | ||
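One way to picture such a filter is a shared k-mer index: only read pairs that share at least one exact k-mer are passed on to the expensive overlap alignment. This is a simplified sketch under that assumption, not BLAST itself and not what any specific assembler does. It uses a hash table rather than sorting, the function name is hypothetical, and it ignores reverse complements.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(reads, k):
    """Return pairs of read indices that share at least one exact k-mer."""
    # Map each k-mer to the set of reads that contain it.
    index = defaultdict(set)
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(read_id)

    # Only reads that land in the same bucket become candidate pairs.
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Note that a very common k-mer (from a repeat, say) still produces a large bucket and many candidate pairs, so a real filter would need some way to cap or skip overly frequent seeds.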
+ | |||
+ | |||
+ | (Side note: for transcriptome libraries, if done properly, reads should have known strandedness, so they can’t be run through algorithms that treat strandedness as arbitrary. There was a story about a prominent yeast microarray transcriptome analysis that incorrectly found a lot of “antisense” mRNAs due to a library prep error.)
- | Need to have some tolerance for error because the reads are noisy. | ||
=== de Bruijn graphs === | === de Bruijn graphs === |