lecture_notes:04-30-2010

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-30-2010 [2010/05/02 22:01]
jstjohn
lecture_notes:04-30-2010 [2010/05/05 21:05] (current)
jmagasin
Line 23: Line 23:
 2 students per day would be fine.
 If it takes longer, that's fine.
 +
 +Volunteers for Monday:
 +Installing, Running, How does it work in general?
 +Find out which have gotten done and which have not.
 +Mark off which have been done on the list. (apr 5)
 +
 +Shorty maybe should be on the list.
 +The idea is to use the mate-pair or paired-end data
 +to make a group in a cluster and then to assemble that.
 +Let's add Shorty to the list of tools.
 +
 +Maybe we don't have anyone ready for Monday, but
 +you need to be ready to do it in the next two weeks.
 +You can tell me or send me email.
  
 ==== Literature ====
Line 52: Line 66:
 papers on assemblers.
  
-==== Back to Presentations ==== 
- 
-Volunteers for Monday: 
-Installing, Running, How does it work in general? 
-Find out which have gotten done and which have not. 
-Mark off which have been done on the list. (apr 5) 
- 
-Shorty maybe should be on the list. 
-The idea of using the matepaired or paired end data 
-to make a group in a cluster and then to assemble that. 
-Let's add Shorty to the list of tools. 
- 
-Maybe we don't have anyone ready for Monday, 
-you need to be ready to do it in the next two weeks. 
-You can tell me or send me email. 
  
 ==== Genome Browser ====
Line 74: Line 73:
 someone in browser group do a presentation on it.
  
-==== Homework ====
 +
 +==== Homework for Wednesday ====
 Homework assignments, as people have not been contributing
 equally to the wiki.  Those of you who took David Bernick's class
 will have already done it.  A different version of newbler produced
 a different assembly.  Order and orient the contigs.  Try to do it.
 +
 +Do the contig alignments from the .join file.\\
 +(Kevin had already re-done the map-colorspace5.)
  
 ===== map-colorspace5 =====
Line 301: Line 304:
 encouraging.
  
-===== Homework for Wednesday =====
-Do the contig alignments from the .join file.\\
-(Kevin had already re-done the map-colorspace5.)
 +
 +===== Second set of lecture notes =====
 +
 +From Jonathan Magasin. 
 + 
 +Today's lecture covered the files generated by Kevin's map-colorspace5\\
 +script, which will be needed for the homework assignment: ordering\\
 +the Pog contigs.  Also covered: how to roughly estimate banana slug\\
 +genome size.
 + 
 +==== Homework assignment: assemble Pog ==== 
 + 
 +From the newer newbler we have a different set of contigs, thirty-one\\
 +of them.  We also have mate pairs from SOLiD mapped to newbler\\
 +assembly 5.  The assignment is to order and orient the contigs.\\
 +Kevin has posted his solution in the assembly directory\\
 +(assemblies/Pog/map-colorspace5).
 + 
 +=== '​delete'​ and '​invert'​ files === 
 + 
 +The trim9 files are output from Kevin's map-colorspace5 script.  Most\\
 +of them are not for us.  The 'delete' files are for mate pairs where\\
 +both reads mapped but not at the appropriate distance: closer than 350\\
 +bases, or more than 5000 bases apart.  These files can be studied to\\
 +find massive deletions.
 + 
 +The 'invert' files are for mate pairs that were on opposite strands (at\\
 +any distance) and are useful for studying inversions.  However,\\
 +inversions are not so useful because they will not be within a single\\
 +contig.
 + 
 +The 'between contigs' files are for mate pairs whose reads mapped to\\
 +different contigs.  Only cases with unique mappings are in these files.\\
 +fixme: What file(s) is this?
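
The three categories above can be sketched as a small classifier. This is only an illustration of the rules as stated in the notes, not Kevin's script: the function name, the argument layout, and the order in which the checks are applied are assumptions. Only the 350 and 5000 base thresholds come from the lecture.

```python
def classify_mate_pair(contig_a, strand_a, pos_a,
                       contig_b, strand_b, pos_b,
                       min_dist=350, max_dist=5000):
    """Classify one uniquely mapped mate pair (hypothetical helper).

    Returns 'between', 'invert', 'delete', or 'ok'.
    The order of the checks is an assumption, chosen because the
    notes say 'invert' applies at any distance.
    """
    # Reads on different contigs: candidate evidence for joining contigs.
    if contig_a != contig_b:
        return "between"
    # Same contig, opposite strands, at any distance.
    if strand_a != strand_b:
        return "invert"
    # Same contig and strand, but closer than 350 or more than 5000 apart.
    distance = abs(pos_b - pos_a)
    if distance < min_dist or distance > max_dist:
        return "delete"
    return "ok"
```

For example, two same-strand reads 200 bases apart on one contig would land in the 'delete' category, while the same reads 2000 bases apart would be an ordinary pair.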
 + 
 +=== trim9.out === 
 + 
 +The trim9.out file is a summary of what happened during mapping.\\
 +Kmers are colorspace kmers.  The kmer lengths are the lengths after\\
 +trimming off nine.  'uniquely mapped' means one mapping for the R3 and\\
 +one for the F3.  Note that some of the summary stats reflect bugs (the\\
 +'wrong range' line) that have since been fixed.  [Kevin reran the\\
 +script on Friday.]
 + 
 +Compared to the earlier mapping, Kevin has increased the aggressiveness.
 + 
 +The deletions ('​wrong range' line) appear to be from reads taken from\\ 
 +clones with multiple adapters. 
 + 
 +Algorithmic strand biases were checked for by comparing the forward\\
 +and backward counts of mapped reads.
 + 
 +Tonight Kevin [reran] the mapping script.  The new output will\\
 +include error rates by position.  Sol requested these be called\\
 +mismatches rather than errors.  Kevin said they are estimates of\\
 +sequencing error.  They are in colorspace, and exclude indels.  In\\
 +reality they are differences between the reads and assemblies, not\\
 +errors.  But for all practical purposes, when there is a mismatch\\
 +between a read and the assembly, the error is in the read.
 + 
 +=== trim9.joins === 
 + 
 +The data we'll use is not in trim9.out; rather, it is in trim9.joins.\\
 +trim9.joins is computed from trim9-cross.rdb (which has reads that\\
 +crossed a contig boundary).
 + 
 +Looking at the first line: How many reads support contig 22 following\\
 +contig 3?  34K.  The contig lengths are there to show whether a contig\\
 +is short enough that mate pairs might span it.  E.g., which orderings\\
 +are consistent with the listed contig orders: 2->3->4, or maybe 3 is\\
 +very short, so we have a mate pair that jumps over 3?
 + 
 +A minus sign appearing before a contig name means to take the\\ 
 +reverse-complement of that contig. ​ (And contig3 followed by contig22\\ 
 +is the same as -contig22 followed by -contig3.) 
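
The equivalence in the parenthetical above can be made concrete with a short sketch. The string representation ('contig3', '-contig3') and both helper names are assumptions for illustration; the actual layout of trim9.joins is not described in these notes.

```python
def flip(oriented):
    """Reverse-complement an oriented contig name: 'contig3' <-> '-contig3'."""
    return oriented[1:] if oriented.startswith("-") else "-" + oriented


def canonical_join(first, second):
    """Return one fixed spelling for a join and its reverse-complement twin.

    'first followed by second' describes the same adjacency as
    'flip(second) followed by flip(first)', so we pick the smaller
    of the two tuples as the representative.
    """
    forward = (first, second)
    reverse = (flip(second), flip(first))
    return min(forward, reverse)
```

With this, `canonical_join("contig3", "contig22")` and `canonical_join("-contig22", "-contig3")` return the same tuple, so counts for the two spellings of one join could be pooled before ordering the contigs.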
 + 
 +The last number on each line is the expected length of the gap\\
 +between the contigs if the mate pairs were the optimal length\\
 +(optimal is the peak of the length distribution).  It is\\
 +very rough.
 + 
 +The file is sorted most-counts to fewest.  The stuff at the bottom of\\
 +the file is noise.  Only about the first fifty lines are helpful.
 + 
 +Long mate pairs are very helpful for disambiguating, and we wish we had\\
 +them for //H. pylori//.
 + 
 +=== Your task === 
 + 
 +Take the trim9.joins file and try to order and orient the contigs,\\
 +knowing that some of them are duplicated, that there is a virus in\\
 +the mix, and that there should be one large chromosome.
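
One possible starting point for the task is a greedy pass over the joins, strongest support first. This is a sketch under stated assumptions, not Kevin's posted solution: the `(first, second, count)` tuples, the helper names, and every count in the usage note other than the 34K mentioned in the lecture are made up for illustration. Because it lets each contig end be used only once, it cannot place duplicated contigs, so its output would still need checking by hand.

```python
def end_used(oriented, leaving):
    """Return the (contig, end) that a join touches.

    Leaving 'c' uses c's tail; leaving '-c' uses c's head.
    Entering 'c' uses c's head; entering '-c' uses c's tail.
    """
    reverse = oriented.startswith("-")
    name = oriented.lstrip("-")
    if leaving:
        return (name, "head" if reverse else "tail")
    return (name, "tail" if reverse else "head")


def greedy_links(joins):
    """Accept joins in order of decreasing read support.

    joins: iterable of (first, second, count) tuples (hypothetical layout).
    A join is skipped if either contig end it needs is already taken.
    """
    used = set()
    accepted = []
    for first, second, count in sorted(joins, key=lambda j: -j[2]):
        out_end = end_used(first, leaving=True)
        in_end = end_used(second, leaving=False)
        if out_end == in_end or out_end in used or in_end in used:
            continue
        used.update((out_end, in_end))
        accepted.append((first, second, count))
    return accepted
```

Given `[("contig3", "contig22", 34000), ("contig3", "contig5", 120), ("contig22", "-contig7", 9000)]`, the sketch keeps the first and third joins and drops the weakly supported alternative for the tail of contig 3.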
  
lecture_notes/04-30-2010.1272837665.txt.gz · Last modified: 2010/05/02 22:01 by jstjohn