lecture_notes:04-09-2010, revision [2010/04/11 22:32] learithe (previous revision: [2010/04/11 22:19] learithe)
(…)
Take the fix mode script from /projects/compbio/bin/scripts and replace the protein user group with the BME 235 user group.
Next week we will have a reference genome (//Pyrobaculum oguniense//, aka //"Pog"//) to test the tools on.
For the most part //Pog// is done: the assembly is down to only 8 uncertain SNPs and one potentially variable insert. It is definitely past the MIAMI standard at this point.
Note about sequencing platform quality scores: most platforms try to use the Phred quality score((http://en.wikipedia.org/wiki/Phred_quality_score)), so in theory the quality score is comparable between platforms and runs (although calibration differences still cause scores to vary between runs and instruments).
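The Phred scale ties a quality score Q to an estimated base-call error probability P via Q = -10 log10(P), so Q20 means roughly a 1-in-100 chance of a wrong call and Q30 a 1-in-1000 chance. A minimal sketch of the conversion in both directions; the function names are hypothetical helpers, and the quality-string decoder assumes the Phred+33 (Sanger) FASTQ encoding, which is not stated in the notes.

<code python>
import math

def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)

def decode_quality_string(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming the Phred+33 (Sanger) encoding."""
    return [ord(c) - offset for c in qual]

for q in (10, 20, 30, 40):
    print(f"Q{q}: P(error) = {phred_to_error_prob(q):g}")
</code>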
It can be informative, once reads are mapped, to look at the quality scores for reads with observed errors.
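One way to do this is to bin aligned bases by their reported quality and compare each bin's observed mismatch rate with the rate the Phred score predicts; a large gap means the scores are miscalibrated for that run. A minimal sketch, assuming the mapped reads have already been reduced to hypothetical (reported_q, is_mismatch) pairs, one per aligned base. Treating every mismatch as a sequencing error is a simplification, since real SNPs also show up as mismatches.

<code python>
import math
from collections import defaultdict

def empirical_quality(bases):
    """Bin aligned bases by reported quality and measure the observed error rate.

    bases: iterable of (reported_q, is_mismatch) pairs.
    Returns {reported_q: (n_bases, observed_error_rate, empirical_q)}.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [bases, mismatches]
    for q, is_mismatch in bases:
        counts[q][0] += 1
        counts[q][1] += int(is_mismatch)
    table = {}
    for q in sorted(counts):
        total, errors = counts[q]
        rate = errors / total
        # With no observed errors the empirical Q is unbounded, so report None.
        emp_q = -10 * math.log10(rate) if errors else None
        table[q] = (total, rate, emp_q)
    return table

# Toy data: 3 mismatches in 100 bases reported at Q20, 1 in 1000 at Q30.
toy = [(20, i < 3) for i in range(100)] + [(30, i < 1) for i in range(1000)]
for q, (n, rate, emp_q) in empirical_quality(toy).items():
    print(f"reported Q{q}: {n} bases, observed error rate {rate:.3f}, "
          f"empirical Q {emp_q:.1f}")
</code>

In the toy data the Q20 bin comes out at an empirical Q of about 15, i.e. those bases are worse than their reported score claims, while the Q30 bin matches its score.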
Lior Pachter (UC Berkeley) is visiting on Monday to speak about the Bowtie/TopHat/Cufflinks algorithms. Bowtie: read mapping; TopHat/Cufflinks: finding splice junctions and predicting spliced transcripts. Bowtie is used in a lot of the assembly algorithms.
(…)
Realistically, there are issues:
== End of contig boundaries: ==
What if A->B, A->C and A->D all exist, BUT A->B and A->C are inconsistent with each other?
… then A becomes an “end of contig”, because you aren’t sure where to go next.
A node is also an end of contig if there are no more edges out of it.
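The end-of-contig rule can be sketched as a walk that keeps extending a contig only while the next step is unambiguous. In this minimal sketch (all function names are hypothetical), nodes are k-mers, an edge joins consecutive k-mers in a read, and the walk stops at a node with zero or several outgoing edges, or when the next node has several incoming edges. It simplifies the rule above: any branch counts as ambiguous, with no check of whether the competing edges are consistent with each other.

<code python>
from collections import defaultdict

def build_graph(reads, k):
    """Nodes are k-mers; an edge joins each pair of consecutive k-mers in a read."""
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for a, b in zip(kmers, kmers[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred

def extend_contig(start, succ, pred):
    """Extend from start while the path is unambiguous; return the contig sequence."""
    path, seen = [start], {start}
    node = start
    while len(succ.get(node, ())) == 1:  # no edges or several edges: end of contig
        (nxt,) = succ[node]
        if len(pred.get(nxt, ())) != 1 or nxt in seen:  # merge point or cycle
            break
        path.append(nxt)
        seen.add(nxt)
        node = nxt
    # Spell the contig: the first k-mer plus the last base of each later k-mer.
    return path[0] + "".join(kmer[-1] for kmer in path[1:])

succ, pred = build_graph(["AACGTT", "CGTTGA", "CGTTCC"], k=3)
print(extend_contig("AAC", succ, pred))  # AACGTT (stops at GTT, which branches)
</code>

Here GTT plays the role of A: it has edges to both TTG and TTC, so the contig ends there.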
- | Spurs: | + | == Spurs: == |
<code>
kmer -> kmer -> kmer -> kmer -> kmer
(…)
</code>
The path diverges but does not reconverge, resulting in source/sink dead ends (these are likely due to read errors).
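Spur detection can be sketched by starting at each dead end and walking backwards along its unbranched chain; if a node with two or more outgoing edges is reached within a few steps, the chain is a short side branch and a likely error-induced spur. This minimal sketch handles sink spurs only (source spurs are the same walk with the edges reversed); ''find_sink_spurs'' and its ''max_len'' cutoff are hypothetical, not values from the lecture.

<code python>
def find_sink_spurs(succ, max_len=3):
    """Return short dead-end branches (node lists) that hang off a branching node."""
    pred = {}
    for a, outs in succ.items():
        for b in outs:
            pred.setdefault(b, set()).add(a)
    spurs = []
    for node in set(succ) | set(pred):
        if succ.get(node):  # has outgoing edges, so not a dead end
            continue
        branch, cur = [node], node
        while len(branch) <= max_len:
            ins = pred.get(cur, set())
            if len(ins) != 1:  # reached a source or a merge, not a divergence
                break
            (prev,) = ins
            if len(succ[prev]) > 1:  # prev is where the spur leaves the main path
                spurs.append(branch[::-1])
                break
            branch.append(prev)
            cur = prev
    return spurs

# Main path k1..k5 with a two-node dead end e1 -> e2 branching off k2.
graph = {"k1": {"k2"}, "k2": {"k3", "e1"}, "k3": {"k4"}, "k4": {"k5"}, "e1": {"e2"}}
print(find_sink_spurs(graph, max_len=2))  # [['e1', 'e2']]
</code>

With ''max_len=3'' the k3 -> k4 -> k5 end would be flagged too: a dead end off a branch point is only a spur relative to the longer path it competes with, so length alone is a crude test.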
- | Bubbles: | + | == Bubbles: == |
<code>
 /-> kmer -> kmer -> kmer -\
(…)
</code>
The path splits due to a SNP but then reconverges. This can happen with real SNPs, with SNPs introduced by read errors, and with real repeats that differ by a SNP or two.
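A simple way to spot a bubble is to start at a node with two outgoing edges, follow each branch along its unbranched chain for a bounded number of steps, and check whether the two walks reach a common node. A minimal sketch on a hand-built graph; the function names and the ''max_steps'' bound are hypothetical, and deciding whether a bubble is a real SNP, a read error, or a near-identical repeat would still need sequence and coverage information.

<code python>
def follow_chain(node, succ, max_steps):
    """Walk forward while there is exactly one way on; return the nodes visited."""
    path = [node]
    while len(path) <= max_steps and len(succ.get(path[-1], ())) == 1:
        (nxt,) = succ[path[-1]]
        path.append(nxt)
    return path

def find_bubbles(succ, max_steps=5):
    """Return (start, end, branch_a, branch_b) for two-way splits that reconverge."""
    bubbles = []
    for start, outs in succ.items():
        if len(outs) != 2:
            continue
        a, b = sorted(outs)
        path_a = follow_chain(a, succ, max_steps)
        path_b = follow_chain(b, succ, max_steps)
        # The first node on branch A that also lies on branch B is the merge point.
        for i, node in enumerate(path_a):
            if node in path_b:
                j = path_b.index(node)
                bubbles.append((start, node, path_a[:i], path_b[:j]))
                break
    return bubbles

# s splits into a1 -> a2 and b1 -> b2 (e.g. two alleles of a SNP), which rejoin at m.
graph = {"s": {"a1", "b1"}, "a1": {"a2"}, "b1": {"b2"}, "a2": {"m"}, "b2": {"m"}, "m": {"x"}}
print(find_bubbles(graph))  # [('s', 'm', ['a1', 'a2'], ['b1', 'b2'])]
</code>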
- | Other issues: | + | == Loop: == |
- | + | ||
- | Loop: | + | |
<code>
kmer -> kmer -> kmer -> kmer -> kmer -> kmer -> kmer
                \- kmers <-/
</code>
Tandem repeats generate a cycle that has edges in and out, but the copy number is hard to disambiguate. If the data is really clean (i.e., the in/out edges have ~10x read depth with low SD, and the inside of the cycle has ~20x read depth with low SD), you can guess that there might be 2 copies of the repeat, but this is not highly reliable.
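The copy-number guess amounts to dividing the read depth inside the cycle by the depth on the flanking edges: ~20x inside versus ~10x outside suggests about 2 copies. A minimal sketch; ''estimate_repeat_copies'' is a hypothetical helper, and using a coefficient-of-variation cutoff (''max_cv'') to stand in for "low SD" is an assumption, since the notes give no threshold.

<code python>
from statistics import mean, stdev

def estimate_repeat_copies(flank_depths, loop_depths, max_cv=0.2):
    """Estimate tandem-repeat copy number as loop depth / flank depth.

    Returns None when either set of depths is too noisy (coefficient of
    variation above max_cv), since the ratio is then unreliable.
    """
    for depths in (flank_depths, loop_depths):
        if len(depths) < 2 or stdev(depths) / mean(depths) > max_cv:
            return None
    return round(mean(loop_depths) / mean(flank_depths))

print(estimate_repeat_copies([10, 11, 9, 10], [20, 19, 21, 20]))  # 2
</code>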
- | Multiple paths: | + | == Multiple paths: == |
<code>
A B
(…)
</code>
(…)
The largest bias usually comes from PCR amplification.
=== Assembly: ===
Assembly algorithms (both overlap and de Bruijn) need to collapse bubbles and trim spurs.\\
Spurs: discard them if their read count is low.\\
Bubbles: tricky, because they can represent real, divergent paths.
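These cleanup rules can be sketched as two small operations on a graph whose nodes carry read counts: drop a spur when its read count is low, and collapse a bubble by keeping the better-supported branch. The function names, the ''min_count'' threshold, and the keep-the-higher-count policy are all assumptions for illustration; since a bubble may be a real variant, a real assembler would be more careful before discarding a branch.

<code python>
from statistics import mean

def remove_nodes(succ, nodes):
    """Delete nodes and every edge pointing to them (succ is modified in place)."""
    doomed = set(nodes)
    for n in doomed:
        succ.pop(n, None)
    for outs in succ.values():
        outs -= doomed

def trim_spur(succ, counts, spur, min_count=3):
    """Discard a spur if its mean read count per node is below min_count."""
    if mean([counts.get(n, 0) for n in spur]) < min_count:
        remove_nodes(succ, spur)
        return True
    return False

def collapse_bubble(succ, counts, branch_a, branch_b):
    """Keep the bubble branch with more read support; remove and return the other."""
    support_a = sum(counts.get(n, 0) for n in branch_a)
    support_b = sum(counts.get(n, 0) for n in branch_b)
    loser = branch_b if support_a >= support_b else branch_a
    remove_nodes(succ, loser)
    return loser

# A spur (e1 -> e2) supported by a single read hangs off k2.
graph = {"k1": {"k2"}, "k2": {"k3", "e1"}, "k3": {"k4"}, "e1": {"e2"}}
reads = {"k1": 12, "k2": 11, "k3": 10, "k4": 10, "e1": 1, "e2": 1}
trim_spur(graph, reads, ["e1", "e2"])
print(graph)

# A bubble whose branch "b" has far less support than branch "a".
bubble = {"s": {"a", "b"}, "a": {"m"}, "b": {"m"}}
print(collapse_bubble(bubble, {"a": 20, "b": 2}, ["a"], ["b"]))  # ['b']
</code>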