lecture_notes:04-30-2010

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-30-2010 [2010/05/02 22:04]
jstjohn
lecture_notes:04-30-2010 [2010/05/05 21:05] (current)
jmagasin
Line 72: Line 72:
 There are different types of browsers.  Might be interesting to have
 someone in the browser group do a presentation on it.
 +
 +
 +==== Homework for Wednesday ====
 +Homework has been assigned because people have not been contributing
 +equally to the wiki.  Those of you who took David Bernick's class
 +will have already done it.  A different version of newbler produced
 +a different assembly.  Order and orient the contigs.  Try to do it.
 +
 +Do the contig alignments from the .join file.\\
 +(Kevin had already re-done the map-colorspace5.)
  
 ===== map-colorspace5 ======
Line 294: Line 304:
 encouraging.
  
-===== Homework for Wednesday ===== 
-Homework has been assigned because people have not been contributing
-equally to the wiki.  Those of you who took David Bernick's class
-will have already done it.  A different version of newbler produced
-a different assembly.  Order and orient the contigs.  Try to do it.
  
-Do the contig alignments from the .join file.\\
-(Kevin had already re-done the map-colorspace5.)
 +===== Second set of lecture notes =====
 +
 +From Jonathan Magasin. 
 + 
 +Today'​s lecture covered files generated by Kevin'​s map-colorspace5\\ 
 +script which will be necessary for the homework assignment: ordering\\ 
 +the Pog contigs. ​ Also covered: How to roughly estimate banana slug\\ 
 +genome size. 
 + 
 +==== Homework assignment: assemble Pog ==== 
 + 
 +From the newer newbler we have a different set of contigs, thirty-one\\ 
 +of them.  We also have mate pairs from SOLiD mapped to newbler\\ 
 +assembly 5.  The assignment is to order and orient the contigs.\\
 +Kevin has posted his solution in the assembly directory\\ 
 +(assemblies/​Pog/​map-colorspace5). 
 + 
 +=== '​delete'​ and '​invert'​ files === 
 + 
 +The trim9 files are output from Kevin'​s map-colorspace5 script. ​ Most\\ 
 +of them are not for us.  The '​delete'​ files are for mate pairs where\\ 
 +both mapped but not at the appropriate distance: closer than 350\\ 
 +bases, or more than 5000 bases apart. ​ These files can be studied to\\ 
 +find massive deletions. 
 + 
 +'​invert'​ files are for mate pairs that were on opposite strands (at\\ 
 +any distance) and are useful for studying inversions.  However,\\
 +inversions are not so useful because they will not be within a single\\ 
 +contig. 
 + 
 +The '​between contigs'​ files are for when the reads mapped to different\\ 
 +contigs. ​ Only cases with unique mappings are in these files. ​ fixme:\\ 
 +What file(s) is this? 
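
The three categories above can be sketched as a small classifier. This is only an illustration, not Kevin's map-colorspace5 script: the `Hit` record, the `classify_pair` function, the order in which the checks are applied, and the "ok" label for well-behaved pairs are all assumptions. Only the 350 and 5000 thresholds and the category names come from the notes.

```python
from dataclasses import dataclass

# Thresholds quoted in the notes for the 'delete' files.
MIN_SEPARATION = 350
MAX_SEPARATION = 5000


@dataclass
class Hit:
    """Hypothetical record for one uniquely mapped read of a mate pair."""
    contig: str  # contig the read mapped to
    pos: int     # mapping position on that contig
    strand: str  # '+' or '-'


def classify_pair(f3: Hit, r3: Hit) -> str:
    """Assign a mate pair to one of the categories described in the notes.

    The order of the checks is an assumption: different contigs first,
    then strand (inversions count at any distance), then distance.
    """
    if f3.contig != r3.contig:
        return "between contigs"
    if f3.strand != r3.strand:
        return "invert"
    separation = abs(f3.pos - r3.pos)
    if separation < MIN_SEPARATION or separation > MAX_SEPARATION:
        return "delete"
    return "ok"  # hypothetical label for pairs at an appropriate distance
```

For example, two reads on the same contig and strand but only 100 bases apart would land in the 'delete' category under these assumptions.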
 + 
 +=== trim9.out === 
 + 
 +The trim9.out file is a summary of what happened during mapping.\\
 +Kmers are colorspace kmers.  The kmer lengths are the lengths after\\
 +trimming off nine.  'uniquely mapped' means one for the R3 and one for\\
 +the F3.  Note that some of the summary stats reflect bugs (the 'wrong\\
 +range' line) that have since been fixed.  [Kevin reran the script on\\
 +Friday.]
 + 
 +Compared to the earlier mapping, Kevin has increased the aggressiveness.
 + 
 +The deletions ('​wrong range' line) appear to be from reads taken from\\ 
 +clones with multiple adapters. 
 + 
 +Strand biases (algorithmic) were checked for in the forward and\\ 
 +backward counts of mapped reads. 
 + 
 +Tonight Kevin [reran] the mapping script.  The new output will\\
 +include error rates by position.  Sol requested these be called\\
 +mismatches rather than errors.  Kevin said they are estimates of\\
 +sequencing error.  They are in colorspace, and exclude indels.  (In\\
 +reality they are differences between the reads and the assembly, not\\
 +errors.  But for all practical purposes, when there is a mismatch\\
 +between a read and the assembly, the error is in the read.)
 + 
 +=== trim9.joins === 
 + 
 +The data we'll use is not in trim9.out but in trim9.joins.\\
 +trim9.joins is computed from trim9-cross.rdb (which has reads that\\
 +crossed a contig boundary).
 + 
 +Looking at the first line: How many reads support contig 22 following\\
 +contig 3?  34K.  The contig lengths are there to see if a contig is\\
 +short enough that mate pairs might span it.  E.g. what orderings are\\
 +consistent with the listed contig orders: 2->3->4, or maybe 3 is very\\
 +short so we have a mate pair that jumps over 3.
 + 
 +A minus sign appearing before a contig name means to take the\\ 
 +reverse-complement of that contig. ​ (And contig3 followed by contig22\\ 
 +is the same as -contig22 followed by -contig3.) 
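
The equivalence in the parenthetical can be checked mechanically. A minimal sketch, assuming contigs are written as strings with an optional leading minus sign; the helper names `flip`, `reverse_join` and `canonical` are made up for illustration and are not part of any script mentioned in the lecture.

```python
def flip(contig):
    """Toggle the reverse-complement marker on an oriented contig name."""
    return contig[1:] if contig.startswith("-") else "-" + contig


def reverse_join(first, second):
    """Describe the same adjacency as read from the opposite strand.

    'first followed by second' is the same as
    '-second followed by -first'.
    """
    return flip(second), flip(first)


def canonical(first, second):
    """Pick one representative of the two equivalent descriptions,
    so that counts for both can be pooled under a single key."""
    return min((first, second), reverse_join(first, second))
```

With these helpers, `reverse_join("contig3", "contig22")` gives `("-contig22", "-contig3")`, and both descriptions map to the same `canonical` key.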
 + 
 +The last number on each line is the expected length of the gap\\
 +between the contigs, assuming the mate pairs were the optimal length\\
 +(optimal is the peak of the length distribution).  It is very rough.
 + 
 +The file is sorted from most counts to fewest.  The stuff at the bottom\\
 +of the file is noise.  Only about the first fifty lines are helpful.
 + 
 +Long mate pairs are very helpful for disambiguating, and we wish we had\\
 +them for //H. pylori//.
 + 
 +=== Your task === 
 + 
 +Take the trim9.joins file and try to order and orient the contigs,\\
 +knowing that some of them are duplicated, that there is a virus in\\
 +the mix, and that there should be one large chromosome.
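
One possible starting point for the task is a greedy pass over the joins, best-supported first. This is a rough sketch and not Kevin's posted solution. The column layout of trim9.joins is not described in these notes, so the `(first, second, count)` tuples are an assumed, already-parsed form, and `contig_end` and `greedy_joins` are hypothetical helpers. Because each contig end is used at most once, the sketch does not handle duplicated contigs; those still need to be resolved by hand.

```python
def contig_end(oriented, side):
    """Physical end of a contig touched when leaving it ('out')
    or entering it ('in'), given its orientation."""
    reverse = oriented.startswith("-")
    name = oriented.lstrip("-")
    if side == "out":
        return (name, "L" if reverse else "R")
    return (name, "R" if reverse else "L")


def greedy_joins(joins, min_count=1):
    """Accept joins from most to fewest supporting reads,
    using each physical contig end at most once."""
    used = set()
    accepted = []
    for first, second, count in sorted(joins, key=lambda j: -j[2]):
        if count < min_count:
            break  # everything after this is treated as noise
        a = contig_end(first, "out")
        b = contig_end(second, "in")
        if a == b or a in used or b in used:
            continue  # conflicts with a better-supported join
        used.update((a, b))
        accepted.append((first, second, count))
    return accepted


# Illustrative input: only the 34K count for contig3 -> contig22 comes
# from the lecture; the other joins and counts are made up.
example = [
    ("contig3", "contig22", 34000),
    ("-contig22", "-contig3", 33000),  # same adjacency, other strand
    ("contig3", "contig5", 200),       # conflicts with the first join
    ("contig22", "contig7", 900),
]
chain = greedy_joins(example)
```

Raising `min_count` is one way to ignore the noisy tail of the file. Any join the greedy pass rejects is worth a second look, since a rejected but well-supported join may point at a duplicated contig.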
  
lecture_notes/04-30-2010.1272837861.txt.gz · Last modified: 2010/05/02 22:04 by jstjohn