lecture_notes:04-30-2010

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-30-2010 [2010/05/05 20:53]
jmagasin
lecture_notes:04-30-2010 [2010/05/05 21:05] (current)
jmagasin
Line 309: Line 309:
 From Jonathan Magasin. From Jonathan Magasin.
  
-Today'​s lecture covered files generated by Kevin'​s map-colorspace5 +Today'​s lecture covered files generated by Kevin'​s map-colorspace5\\ 
-script which will be necessary for the homework assignment: ordering +script which will be necessary for the homework assignment: ordering\\ 
-the POG contigs. ​ Also covered: How to roughly estimate banana slug+the Pog contigs. ​ Also covered: How to roughly estimate banana slug\\
 genome size. genome size.
  
-==== Homework assignment: assemble ​POG ====+==== Homework assignment: assemble ​Pog ====
  
-From the newer newbler we have a different set of contigs, thirty-one +From the newer newbler we have a different set of contigs, thirty-one\\ 
-of them.  We also have mate pairs from SOLiD mapped to newbler +of them.  We also have mate pairs from SOLiD mapped to newbler\\ 
-assembly 5.  The assignment is to order and orient the contigs. +assembly 5.  The assignment is to order and orient the contigs.\\
-Kevin has posted his solution in the assembly directory+Kevin has posted his solution in the assembly directory\\
 (assemblies/​Pog/​map-colorspace5). (assemblies/​Pog/​map-colorspace5).
  
 === '​delete'​ and '​invert'​ files === === '​delete'​ and '​invert'​ files ===
  
-The trim9 files are output from Kevin'​s map-colorspace5 script. ​ Most +The trim9 files are output from Kevin'​s map-colorspace5 script. ​ Most\\ 
-of them are not for us.  The '​delete'​ files are for mate pairs where +of them are not for us.  The '​delete'​ files are for mate pairs where\\ 
-both mapped but not at the appropriate distance: closer than 350 +both mapped but not at the appropriate distance: closer than 350\\ 
-bases, or more than 5000 bases apart. ​ These files can be studied to+bases, or more than 5000 bases apart. ​ These files can be studied to\\
 find massive deletions. find massive deletions.
  
-'​invert'​ files are for mate pairs that were on opposite strands (at +'​invert'​ files are for mate pairs that were on opposite strands (at\\ 
-any distance) and are useful for studying inversions. ​ However +any distance) and are useful for studying inversions. ​ However\\ 
-inversions are not so useful because they will not be within a single+inversions are not so useful because they will not be within a single\\
 contig. contig.
  
-The '​between contigs'​ files are for when the reads mapped to different +The '​between contigs'​ files are for when the reads mapped to different\\ 
-contigs. ​ Only cases with unique mappings are in these files. ​ fixme:+contigs. ​ Only cases with unique mappings are in these files. ​ fixme:\\
 What file(s) is this? What file(s) is this?
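
As a rough illustration of the three categories above, here is a minimal sketch that sorts a single mate pair into 'between contigs', 'invert', or 'delete'. The function name, its arguments, and the order of the checks are assumptions made for this example; this is not Kevin's map-colorspace5 code. Only the 350 and 5000 base thresholds come from the notes.

```python
# Hypothetical sketch; thresholds are from the notes, everything else is assumed.
MIN_SEPARATION = 350    # closer than this counts as 'delete'
MAX_SEPARATION = 5000   # farther apart than this counts as 'delete'


def classify_mate_pair(contig_a, contig_b, strand_a, strand_b, distance):
    """Classify one mate pair in which both reads mapped uniquely.

    distance is the separation in bases; it is ignored when the two
    reads land on different contigs.
    """
    if contig_a != contig_b:
        return "between contigs"
    if strand_a != strand_b:
        # Opposite strands, at any distance.
        return "invert"
    if distance < MIN_SEPARATION or distance > MAX_SEPARATION:
        return "delete"
    return "ok"


print(classify_mate_pair("contig3", "contig3", "+", "+", 200))
```

Checking the contig first and the strand second is a design choice for the sketch: it keeps the distance test for pairs where a distance is actually meaningful.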
  
 === trim9.out === === trim9.out ===
  
-The trim9.out file is a summary of what happened during mapping. +The trim9.out file is a summary of what happened during mapping.\\ 
-Kmers are colorspace kmers. ​ These kmers are the lengths after +Kmers are colorspace kmers. ​ These kmers are the lengths after\\ 
-trimming off nine.  '​uniquly mapped'​ means one for the R3 and one for +trimming off nine.  '​uniquly mapped'​ means one for the R3 and one for\\ 
-the F3.  Note that some of the summary stats are bugs (the '​wrong +the F3.  Note that some of the summary stats are bugs (the 'wrong\\ 
-range' line) that have since been fixed.  [Kevin reran the script on +range' line) that have since been fixed.  [Kevin reran the script on\\
 Friday.] Friday.]
  
 Compared to earlier mapping, Kevin has increased the aggressiveness. Compared to earlier mapping, Kevin has increased the aggressiveness.
  
-The deletions ('​wrong range' line) appear to be from reads taken from+The deletions ('​wrong range' line) appear to be from reads taken from\\
 clones with multiple adapters. clones with multiple adapters.
  
-Strand biases (algorithmic) were checked for in the forward and+Strand biases (algorithmic) were checked for in the forward and\\
 backward counts of mapped reads. backward counts of mapped reads.
  
-Tonight Kevin [reran] the mapping script. ​ The new output will +Tonight Kevin [reran] the mapping script. ​ The new output will\\ 
-include error rates by position. Sol requested these be called +include error rates by position. Sol requested these be called\\ 
-mismatches rather than errors. ​ Kevin said they are estimates of +mismatches rather than errors. ​ Kevin said they are estimates of\\ 
-sequencing error. ​ They are in colorspace, and exclude indels. ​ In +sequencing error. ​ They are in colorspace, and exclude indels. ​ In\\ 
-reality they are differences between the reads and assemblies, not +reality they are differences between the reads and assemblies, not\\ 
-errors. ​ But for all practical purposes when there is a mismatch+errors. ​ But for all practical purposes when there is a mismatch\\
 between a read and the assembly the error is in the read. between a read and the assembly the error is in the read.
  
 === trim9.joins === === trim9.joins ===
  
-The data we'll use is not in trim9.out, rather it is in trim9.joins. +The data we'll use is not in trim9.out, rather it is in trim9.joins.\\ 
-trim9.joins is computed from trim9-cross.rdb (which has reads that+trim9.joins is computed from trim9-cross.rdb (which has reads that\\
 crossed a contig boundary).  ​ crossed a contig boundary).  ​
  
-Looking at the first line: How many reads support contig 22 following +Looking at the first line: How many reads support contig 22 following\\ 
-contig 3?  34K.  The contig lengths are there to see if a contig is +contig 3?  34K.  The contig lengths are there to see if a contig is\\
-short enough that mate pairs might span it.  E.g. what orderings are +short enough that mate pairs might span it.  E.g. what orderings are\\
-consistent with the listed contig orders: 2->​3->​4,​ or maybe 3 is very+consistent with the listed contig orders: 2->​3->​4,​ or maybe 3 is very\\
 short so we have a mate pair that jumps over 3. short so we have a mate pair that jumps over 3.
  
-A minus sign appearing before a contig name means to take the +A minus sign appearing before a contig name means to take the\\ 
-reverse-complement of that contig. ​ (And contig3 followed by contig22+reverse-complement of that contig. ​ (And contig3 followed by contig22\\
 is the same as -contig22 followed by -contig3.) is the same as -contig22 followed by -contig3.)
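
The equivalence in the last paragraph can be sketched in a few lines: a join and its reverse-complemented mirror image describe the same adjacency, so counts for the two forms can be pooled under one key. The helper names below are made up for this illustration and say nothing about how trim9.joins is actually produced.

```python
def flip(contig):
    """Reverse-complement a signed contig name: 'contig3' <-> '-contig3'."""
    return contig[1:] if contig.startswith("-") else "-" + contig


def canonical_join(first, second):
    """Pick one fixed representative of the two equivalent ways to write a join.

    (first, second) read on one strand is the same adjacency as
    (flip(second), flip(first)) read on the other strand.
    """
    forward = (first, second)
    backward = (flip(second), flip(first))
    return min(forward, backward)


# Both spellings of the contig3/contig22 join map to the same key.
print(canonical_join("contig3", "contig22") == canonical_join("-contig22", "-contig3"))
```

Which of the two forms is chosen as the representative is arbitrary; the only requirement is that both spellings give the same answer.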
  
-The last number on each line assumes the mate pairs were the +The last number on each line assumes the mate pairs were the\\
-optimal length (optimal is the peak of the length distribution).  That +optimal length (optimal is the peak of the length distribution).  That\\
-number is the expected length of the gap between the contigs. ​ It is+number is the expected length of the gap between the contigs. ​ It is\\
 very rough. very rough.
  
-The file is sorted most-counts to fewest.  The stuff at the bottom of the +The file is sorted most-counts to fewest.  The stuff at the bottom of the\\
 file is noise.  Only about the first fifty lines are helpful. file is noise.  Only about the first fifty lines are helpful.
  
-Long mate pairs are very helpful for disambiguating, and we wish we had +Long mate pairs are very helpful for disambiguating, and we wish we had\\
-them for H.pylori.+them for //H.pylori//.
  
 === Your task === === Your task ===
  
-Take the trim9.joins file and try to order and orient the contigs, +Take the trim9.joins file and try to order and orient the contigs,\\ 
-knowing that some of them are duplicated, and that there is a virus in+knowing that some of them are duplicated, and that there is a virus in\\
 the mix and there should be one large chromosome. the mix and there should be one large chromosome.
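
One possible starting point for the task, offered only as a sketch and not as Kevin's posted solution: walk the joins from most-supported to least-supported and accept a join when neither contig end it needs is already taken. The (first, second, count) tuple layout is an assumption, since the notes do not spell out the column layout of trim9.joins. In the example, the 34000 echoes the "34K" above and the other counts are invented.

```python
def leaving_end(contig):
    """The contig end we exit from when walking through a signed contig."""
    if contig.startswith("-"):
        return (contig[1:], "left")
    return (contig, "right")


def entering_end(contig):
    """The contig end we arrive at when walking into a signed contig."""
    if contig.startswith("-"):
        return (contig[1:], "right")
    return (contig, "left")


def greedy_joins(joins):
    """Accept joins in order of decreasing read support.

    joins: iterable of (first, second, count) with signed contig names.
    A join is skipped if either contig end it uses is already joined.
    """
    used_ends = set()
    accepted = []
    for first, second, count in sorted(joins, key=lambda j: -j[2]):
        a, b = leaving_end(first), entering_end(second)
        if a == b or a in used_ends or b in used_ends:
            continue
        used_ends.update((a, b))
        accepted.append((first, second, count))
    return accepted


example = [
    ("contig3", "contig22", 34000),   # count from the notes ("34K")
    ("-contig22", "-contig3", 120),   # invented: mirror image of the join above
    ("contig3", "contig7", 15),       # invented: low-count noise
]
print(greedy_joins(example))
```

This greedy pass allows each contig end only one partner, so duplicated contigs and the virus would still need to be sorted out by hand.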
  
  
lecture_notes/04-30-2010.1273092831.txt.gz · Last modified: 2010/05/05 20:53 by jmagasin