lecture_notes:04-09-2010

Edited by learithe on 2010/04/11 (revisions at 22:19 and 22:32).

Take the fix mode script from /projects/compbio/bin/scripts and replace the protein user group with the BME 235 user group.
  
Next week there will be a reference genome (//Pyrobaculum oguniense//, aka //"Pog"//) to use for testing the tools.
For the most part Pog is done; the remaining uncertainty is down to only 8 SNPs and one potentially variable insert. It is definitely past the MIAMI standard at this point.
  
Note about sequencing platform quality scores: most platforms are trying to use the phred quality score((http://en.wikipedia.org/wiki/Phred_quality_score)), so the quality score is theoretically comparable between platforms and runs (although calibration differences still cause scores to vary between runs and instruments).
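The phred scale relates a quality score Q to the estimated probability P that a base call is wrong by Q = -10 log10(P), so Q20 means a 1-in-100 chance of an error and Q30 means 1 in 1000. A minimal Python sketch of the conversion, plus decoding a FASTQ quality string; the ASCII offset of 33 (Sanger / Illumina 1.8+ encoding) is an assumption, since some older Illumina pipelines used 64.

<code python>
import math

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ FASTQ encoding


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with phred quality q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality for an error probability p, where 0 < p <= 1."""
    return -10 * math.log10(p)


def decode_quality_string(qual: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Convert an ASCII-encoded FASTQ quality string to integer phred scores."""
    return [ord(c) - offset for c in qual]


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error_prob(q):g}")
    # 'I' is ASCII 73 -> Q40, '5' is 53 -> Q20, '+' is 43 -> Q10
    print(decode_quality_string("II5+"))
</code>

Because of the calibration issue, the probability this returns is what the instrument claims, not necessarily the error rate observed after mapping.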
  
Once reads are mapped, it can be informative to look at the quality scores for reads with observed errors.
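A minimal sketch of that check in Python, assuming an ungapped alignment where read position i lines up with reference position i (real alignments with indels need the CIGAR string to pair bases up). quality_at_errors is a hypothetical helper that splits a read's phred scores into those at mismatched bases and those at matching bases; if the scores are well calibrated, the mismatches should sit at noticeably lower quality.

<code python>
from statistics import mean


def quality_at_errors(read: str, ref: str, quals: list[int]) -> tuple[list[int], list[int]]:
    """Split per-base phred scores into (at_mismatches, at_matches).

    Assumes an ungapped alignment: read[i] is aligned to ref[i].
    """
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, reference slice and qualities must have equal length")
    at_mismatches, at_matches = [], []
    for base, ref_base, q in zip(read, ref, quals):
        if base != ref_base:
            at_mismatches.append(q)
        else:
            at_matches.append(q)
    return at_mismatches, at_matches


if __name__ == "__main__":
    # Made-up example: the read disagrees with the reference at positions 4 and 7.
    bad, good = quality_at_errors("ACGTTGCA", "ACGTAGCT", [38, 37, 36, 35, 12, 33, 30, 8])
    print("mean Q at mismatches:", mean(bad))
    print("mean Q at matches:", mean(good))
</code>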
  
Lior Pachter (from UC Berkeley) is visiting on Monday to speak about the Bowtie/TopHat/Cufflinks algorithms. Bowtie: mapping; TopHat/Cufflinks: find splice junctions and predict spliced transcripts. Bowtie is used in a lot of the assembly algorithms.
  
Realistically, there are issues:

== End of contig boundaries: ==
What if A->B, A->C and A->D, but A->B and A->C are inconsistent with each other?\\
... then A becomes an "end of contig", because you aren't sure where to go next.\\
A node is also an end of contig if there are no more edges leaving it.
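A minimal Python sketch of those stopping rules, assuming the graph is stored as a plain dict mapping each node (k-mer) to the list of nodes it has edges to; extend_contig and that representation are made up for illustration. The walk grows a contig only while the next step is unambiguous and stops at a node with no outgoing edges or several of them. As an extra assumption beyond the notes, it also stops before a node that has more than one incoming edge (another path merges there) or that it has already visited (a loop).

<code python>
from collections import Counter


def extend_contig(graph: dict[str, list[str]], start: str) -> list[str]:
    """Follow unambiguous edges from start and return the contig's nodes.

    The contig ends when the current node has no outgoing edge (dead end),
    more than one outgoing edge (ambiguous: A->B, A->C, ...), or when the
    next node has several incoming edges or was already visited.
    """
    indegree = Counter(s for succs in graph.values() for s in succs)
    contig, seen, node = [start], {start}, start
    while True:
        succs = graph.get(node, [])
        if len(succs) != 1:  # 0 = no more edges, >1 = unsure where to go next
            break
        nxt = succs[0]
        if indegree[nxt] != 1 or nxt in seen:  # merge point or cycle
            break
        contig.append(nxt)
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    # X -> A, then A -> B, A -> C, A -> D: A becomes the end of the contig.
    g = {"X": ["A"], "A": ["B", "C", "D"], "B": [], "C": [], "D": []}
    print(extend_contig(g, "X"))  # ['X', 'A']
</code>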
  
== Spurs: ==
<code>
kmer -> kmer -> kmer -> kmer -> kmer
...
</code>
The path diverges but does not reconverge, resulting in source/sink dead ends (these are likely due to read errors).
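The sketch below shows how a single read error produces a spur. It builds a tiny de Bruijn graph (nodes are (k-1)-mers, and every k-mer in a read adds an edge from its prefix to its suffix) from two error-free reads and one read whose last base is wrong, then lists dead-end branches hanging off a branching node. The reads, k = 4, the max_len cutoff and the helper names are all illustrative assumptions, and reverse complements are ignored to keep it short.

<code python>
def build_debruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each k-mer in a read adds the edge prefix -> suffix."""
    graph: dict[str, set[str]] = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph.setdefault(kmer[:-1], set()).add(kmer[1:])
            graph.setdefault(kmer[1:], set())  # make sure sinks appear as keys
    return graph


def reverse_graph(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """Same nodes, every edge flipped."""
    rev: dict[str, set[str]] = {node: set() for node in graph}
    for node, succs in graph.items():
        for s in succs:
            rev[s].add(node)
    return rev


def sink_tips(graph: dict[str, set[str]], max_len: int) -> list[list[str]]:
    """Dead-end branches of at most max_len nodes hanging off a branching node."""
    preds = reverse_graph(graph)
    tips = []
    for node, succs in graph.items():
        if succs:
            continue  # only start from sinks (no outgoing edges)
        path, cur = [node], node
        while len(path) <= max_len and len(preds[cur]) == 1:
            parent = next(iter(preds[cur]))
            if len(graph[parent]) > 1:  # found the node where the path diverged
                tips.append(path[::-1])
                break
            path.append(parent)
            cur = parent
    return tips


if __name__ == "__main__":
    # Two error-free reads plus one read whose last base is wrong (CCT -> CCA).
    reads = ["ACGTACCT", "GTACCTGA", "ACGTACCA"]
    g = build_debruijn(reads, k=4)
    print(sink_tips(g, max_len=2))                 # [['CCA']]
    print(sink_tips(reverse_graph(g), max_len=2))  # source-side tips: []
</code>

Running sink_tips on the reversed graph finds source-side dead ends (errors near the start of a read) the same way. The genuine end of the sequence (TGA) is not reported because its branch is longer than max_len.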
  
== Bubbles: ==
<code>
    /-> kmer -> kmer -> kmer -\
...
</code>
The path splits due to a SNP but then reconverges. This can happen with real SNPs, SNPs caused by read errors, and real repeats that differ by a SNP or two.
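Finding a simple bubble can be sketched in Python as follows, assuming a hypothetical dict-of-successors graph with node names standing in for k-mers. From a branching node, each outgoing branch is followed while it stays unbranched; when two branches stop at the same merge node (one with several incoming edges) within max_len steps, they form a bubble. The topology alone cannot tell whether a bubble comes from a real SNP, a read error, or a near-identical repeat.

<code python>
from collections import Counter
from typing import Optional


def find_bubble(graph: dict[str, list[str]], start: str,
                max_len: int = 10) -> Optional[tuple[list[str], list[str]]]:
    """Return two branches leaving start that reconverge, or None.

    Each branch is followed while its nodes have exactly one incoming and one
    outgoing edge; a branch that stops on a node with several incoming edges
    has reached a merge point. Two branches with the same merge point form a
    bubble.
    """
    indegree = Counter(s for succs in graph.values() for s in succs)
    branch_by_end: dict[str, list[str]] = {}
    for first in graph.get(start, []):
        path, node = [first], first
        while (indegree[node] == 1 and len(graph.get(node, [])) == 1
               and len(path) <= max_len):
            node = graph[node][0]
            path.append(node)
        if indegree[node] > 1:  # reached a merge point
            if node in branch_by_end:
                return branch_by_end[node], path
            branch_by_end[node] = path
    return None


if __name__ == "__main__":
    # S splits into a1 -> a2 and b1 -> b2 (e.g. two SNP alleles), which rejoin at M.
    g = {"S": ["a1", "b1"], "a1": ["a2"], "a2": ["M"],
         "b1": ["b2"], "b2": ["M"], "M": ["T"], "T": []}
    print(find_bubble(g, "S"))  # (['a1', 'a2', 'M'], ['b1', 'b2', 'M'])
</code>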
  
== Loop: ==
<code>
kmer -> kmer -> kmer -> kmer -> kmer -> kmer -> kmer
                            \- kmers <-/
</code>
Tandem repeats will generate a circle, but with edges in and out; it is hard to disambiguate the copy number, though. If the data is really clean (i.e., the in/out edges have ~10x read depth with low SD, and the inside of the circle has ~20x read depth with low SD), you can guess that there might be 2 copies of the repeat, but this is not highly reliable.
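The depth-ratio reasoning can be sketched in Python under stated assumptions: per-k-mer read depths are already available for the flanking in/out edges and for the inside of the circle, and "low SD" is read as a coefficient of variation (SD / mean) below an arbitrary 0.2. With ~10x flanks and ~20x inside the loop the ratio rounds to 2 copies; noisy depths return None instead of a guess, in keeping with the caveat that the estimate is not highly reliable. estimate_repeat_copies and max_cv are hypothetical names.

<code python>
from statistics import mean, stdev
from typing import Optional


def estimate_repeat_copies(flank_depths: list[float], loop_depths: list[float],
                           max_cv: float = 0.2) -> Optional[int]:
    """Guess tandem-repeat copy number as (depth inside loop) / (depth on flanks).

    Returns None if either depth profile is too noisy to trust, i.e. its
    coefficient of variation (SD / mean) exceeds max_cv.
    """
    for depths in (flank_depths, loop_depths):
        if len(depths) < 2:
            return None  # need at least two values for an SD
        m = mean(depths)
        if m == 0 or stdev(depths) / m > max_cv:
            return None
    return round(mean(loop_depths) / mean(flank_depths))


if __name__ == "__main__":
    print(estimate_repeat_copies([10, 9, 11, 10], [20, 21, 19, 20]))  # 2
    print(estimate_repeat_copies([10, 3, 18, 9], [20, 21, 19, 20]))   # None (noisy flanks)
</code>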
  
== Multiple paths: ==
<code>
A                      B
...
</code>
The largest bias usually comes from PCR amplification.
  
=== Assembly: ===
Assembly algorithms (both overlap and de Bruijn) need to collapse bubbles and trim spurs.\\
Spurs: discard them if their read count is low.\\
Bubbles: tricky, because they can represent real, divergent paths.
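One way to implement "discard spurs with a low read count", sketched in Python under assumptions: the graph is a dict mapping each node to {successor: number of reads supporting that edge}, every node (including sinks) is a key, and a spur's support is the weakest edge along it. trim_spurs and min_count are made-up names. Bubbles are deliberately left alone, since they can be real divergent paths.

<code python>
def trim_spurs(graph: dict[str, dict[str, int]], min_count: int) -> dict[str, dict[str, int]]:
    """Return a copy of graph with low-support dead-end branches removed.

    graph maps node -> {successor: number of reads supporting that edge}, and
    every node, including sinks, must be a key. A spur is a chain ending in a
    sink that hangs off a branching node; it is dropped if the weakest edge on
    it has fewer than min_count reads. Only sink-side spurs are handled here;
    source-side spurs are symmetric.
    """
    preds: dict[str, list[str]] = {node: [] for node in graph}
    for node, succs in graph.items():
        for s in succs:
            preds.setdefault(s, []).append(node)

    trimmed = {node: dict(succs) for node, succs in graph.items()}
    for node, succs in graph.items():
        if succs:
            continue  # only start from sinks (dead ends)
        path, cur, support = [node], node, float("inf")
        while len(preds[cur]) == 1:
            parent = preds[cur][0]
            support = min(support, graph[parent][cur])
            if len(graph[parent]) > 1:  # the branch point this spur hangs off
                if support < min_count:
                    del trimmed[parent][cur]
                    for spur_node in path:
                        del trimmed[spur_node]
                break
            path.append(parent)
            cur = parent
    return trimmed


if __name__ == "__main__":
    # ACC -> CCA is supported by a single (probably erroneous) read.
    g = {"ACC": {"CCT": 5, "CCA": 1}, "CCT": {"CTG": 3}, "CTG": {"TGA": 3},
         "CCA": {}, "TGA": {}}
    print(trim_spurs(g, min_count=2))
</code>

There is no length limit here, so a genuine contig end with low coverage that hangs off a branch would be trimmed too; capping the spur length is a simple guard against that.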
  
  
  
lecture_notes/04-09-2010.txt · Last modified: 2010/04/12 21:37 by cbrumbau