lecture_notes:04-30-2010

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-30-2010 [2010/05/02 06:12]
galt
lecture_notes:04-30-2010 [2010/05/05 20:53]
jmagasin
Line 14: Line 14:
 Should add a mappers section on the tools page. Should add a mappers section on the tools page.
  
-Next week will be student presentations.+Next week will be student presentations. We need volunteers to do presentations. Especially the people who haven'​t yet presented on an assembler or mapper.
  
 Presentations should include the algorithms, Presentations should include the algorithms,
Line 23: Line 23:
 2 students per day would be fine. 2 students per day would be fine.
 If it takes longer, that's fine. If it takes longer, that's fine.
 +
 +Volunteers for Monday:
 +Installing, Running, How does it work in general?
 +Find out which have gotten done and which have not.
 +Mark off which have been done on the list. (apr 5)
 +
 +Shorty maybe should be on the list.
 +The idea of using the matepaired or paired end data
 +to make a group in a cluster and then to assemble that.
 +Let's add Shorty to the list of tools.
 +
 +Maybe we don't have anyone ready for Monday,
 +you need to be ready to do it in the next two weeks.
 +You can tell me or send me email.
  
 ==== Literature ==== ==== Literature ====
Line 52: Line 66:
 papers on assemblers. papers on assemblers.
  
-==== Back to Presentations ==== 
- 
-Volunteers for Monday: 
-Installing, Running, How does it work in general? 
-Find out which have gotten done and which have not. 
-Mark off which have been done on the list. (apr 5) 
- 
-Shorty maybe should be on the list. 
-The idea of using the matepaired or paired end data 
-to make a group in a cluster and then to assemble that. 
-Let's add Shorty to the list of tools. 
- 
-Maybe we don't have anyone ready for Monday, 
-you need to be ready to do it in the next two weeks. 
-You can tell me or send me email. 
  
 ==== Genome Browser ==== ==== Genome Browser ====
Line 74: Line 73:
 someone in browser group do a presentation on it. someone in browser group do a presentation on it.
  
-==== Homework ====+ 
 +==== Homework ​for Wednesday ​====
 Homework assignments,​ as people have not been contributing Homework assignments,​ as people have not been contributing
 equally to the wiki.  Those of you who took David Bernick'​s class equally to the wiki.  Those of you who took David Bernick'​s class
 will have already done it.  Different version of newbler produced will have already done it.  Different version of newbler produced
 different assembly. ​ Order and orient the contigs. ​ Try to do it. different assembly. ​ Order and orient the contigs. ​ Try to do it.
 +
 +Do the contig alignments from the .join file.\\
 +(Kevin had already re-done the map-colorspace5.)
  
 ===== map-colorspace5 ====== ===== map-colorspace5 ======
Line 301: Line 304:
 encouraging. encouraging.
  
-===== Homework ​for Wednesday ​===== + 
-Do the contig alignments ​from the .join file.\\ +===== Second set of lecture notes ===== 
-(Kevin ​had already re-done the map-colorspace5.)+ 
 +From Jonathan Magasin. 
 + 
 +Today'​s lecture covered files generated by Kevin'​s map-colorspace5 
 +script which will be necessary ​for the homework assignment: ordering 
 +the POG contigs. ​ Also covered: How to roughly estimate banana slug 
 +genome size. 
 + 
 +==== Homework assignment: assemble POG ==== 
 + 
 +From the newer newbler we have a different set of contigs, thirty-one 
 +of them.  We also have mate pairs from SOLiD mapped to newbler 
 +assembly 5.  The assignment is to order and orient the contigs.
 +Kevin has posted his solution in the assembly directory 
 +(assemblies/​Pog/​map-colorspace5). 
 + 
 +=== '​delete'​ and '​invert'​ files === 
 + 
 +The trim9 files are output from Kevin'​s map-colorspace5 script. ​ Most 
 +of them are not for us.  The '​delete'​ files are for mate pairs where 
 +both mapped but not at the appropriate distance: closer than 350 
 +bases, or more than 5000 bases apart. ​ These files can be studied to 
 +find massive deletions. 
 + 
 +'​invert'​ files are for mate pairs that were on opposite strands (at 
 +any distance) and are useful for studying inversions. ​ However 
 +inversions are not so useful because they will not be within a single 
 +contig. 
 + 
 +The '​between contigs'​ files are for when the reads mapped to different 
 +contigs. ​ Only cases with unique mappings are in these files. ​ fixme: 
 +What file(s) is this? 
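
The three categories above can be sketched as a small classifier. This is only an illustration, not Kevin's map-colorspace5 code: the function name, the argument layout, the order in which the tests are applied, and the use of mapped start positions to measure distance are all assumptions. Only the 350 and 5000 base limits come from the notes.

```python
# Hypothetical sketch of the 'between contigs' / 'invert' / 'delete' sorting.
# Only the two distance limits are taken from the lecture notes.
MIN_DIST = 350    # pairs closer than this go to the 'delete' files
MAX_DIST = 5000   # pairs farther apart than this go to the 'delete' files


def classify_pair(contig_a, pos_a, strand_a, contig_b, pos_b, strand_b):
    """Classify one uniquely mapped mate pair (assumed precedence of tests)."""
    if contig_a != contig_b:
        return "between"          # reads landed on different contigs
    if strand_a != strand_b:
        return "invert"           # opposite strands, at any distance
    distance = abs(pos_b - pos_a)
    if distance < MIN_DIST or distance > MAX_DIST:
        return "delete"           # both mapped, but at the wrong distance
    return "ok"
```

For example, under these assumptions a pair on the same contig and strand whose reads map 200 bases apart would be classified as 'delete'.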
 + 
 +=== trim9.out === 
 + 
 +The trim9.out file is a summary of what happened during mapping. 
 +Kmers are colorspace kmers. ​ These kmers are the lengths after 
 +trimming off nine.  '​uniquly mapped'​ means one for the R3 and one for 
 +the F3.  Note that some of the summary stats (the 'wrong
 +range' line) reflect bugs that have since been fixed.  [Kevin reran
 +the script on Friday.]
 + 
 +Compared to the earlier mapping, Kevin has increased the aggressiveness.
 + 
 +The deletions ('​wrong range' line) appear to be from reads taken from 
 +clones with multiple adapters. 
 + 
 +Strand biases (algorithmic) were checked for in the forward and 
 +backward counts of mapped reads. 
 + 
 +Tonight Kevin [reran] the mapping script. ​ The new output will 
 +include error rates by position. Sol requested these be called 
 +mismatches rather than errors. ​ Kevin said they are estimates of 
 +sequencing error.  They are in colorspace and exclude indels.
 +Strictly, they are differences between the reads and the assembly,
 +not errors, but for all practical purposes, when a read and the
 +assembly disagree, the error is in the read.
 + 
 +=== trim9.joins === 
 + 
 +The data we'll use is not in trim9.out, rather it is in trim9.joins. 
 +trim9.joins is computed from trim9-cross.rdb (which has reads that 
 +crossed a contig boundary). ​  
 + 
 +Looking at the first line: How many reads support contig 22 following
 +contig 3?  34K.  The contig lengths are listed so we can see whether a
 +contig is short enough that a mate pair might span it.  E.g., which
 +orderings are consistent with the listed joins: 2->3->4, or maybe 3 is
 +very short, so a mate pair jumps over it.
 + 
 +A minus sign appearing before a contig name means to take the 
 +reverse-complement of that contig. ​ (And contig3 followed by contig22 
 +is the same as -contig22 followed by -contig3.) 
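
The equivalence in the previous paragraph means each join has two spellings. A minimal sketch of collapsing them to one, assuming oriented contigs are written as strings such as 'contig3' and '-contig22' (the helper names here are hypothetical, not part of any script from the class):

```python
def flip(contig):
    """Reverse-complement an oriented contig name: 'contig3' <-> '-contig3'."""
    return contig[1:] if contig.startswith("-") else "-" + contig


def canonical_join(first, second):
    """Pick one fixed spelling for a join and its reverse-complement twin."""
    # 'first then second' read on the other strand is 'flip(second) then flip(first)'.
    twin = (flip(second), flip(first))
    return min((first, second), twin)


# Both spellings of the same join collapse to one key.
assert canonical_join("contig3", "contig22") == canonical_join("-contig22", "-contig3")
```

Counting joins under the canonical key keeps the two spellings from being tallied as separate evidence.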
 + 
 +The last number on each line is the expected length of the gap
 +between the contigs, assuming the mate pairs were the optimal length
 +(optimal is the peak of the length distribution).  It is
 +very rough.
 + 
 +The file is sorted from most counts to fewest.  The stuff at the bottom
 +of the file is noise.  Only about the first fifty lines are helpful.
 + 
 +Long mate pairs are very helpful for disambiguating, and we wish we had
 +them for H.pylori.
 + 
 +=== Your task === 
 + 
 +Take the trim9.joins file and try to order and orient the contigs, 
 +knowing that some of them are duplicated, and that there is a virus in 
 +the mix and there should be one large chromosome. 
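
One possible starting point for the task, offered only as a sketch and not as Kevin's posted solution: walk the joins from most-supported to least and accept a join only if neither contig end it touches is already taken. The (count, first, second) tuple layout is an assumption, since these notes do not spell out the columns of trim9.joins, and this greedy rule will mishandle the duplicated contigs, which legitimately need more than one neighbor per end.

```python
def ends(first, second):
    """Return the two contig ends glued together when first is followed by second."""
    # A forward contig is exited at its right end, a reversed one at its left end.
    out_end = (first.lstrip("-"), "L" if first.startswith("-") else "R")
    # A forward contig is entered at its left end, a reversed one at its right end.
    in_end = (second.lstrip("-"), "R" if second.startswith("-") else "L")
    return out_end, in_end


def greedy_links(joins):
    """Accept joins by descending read count, using each contig end at most once.

    joins: iterable of (count, first, second); this layout is hypothetical.
    """
    used = set()
    accepted = []
    for count, first, second in sorted(joins, key=lambda j: -j[0]):
        out_end, in_end = ends(first, second)
        if out_end in used or in_end in used or out_end == in_end:
            continue  # conflicts with a better-supported join
        used.update((out_end, in_end))
        accepted.append((count, first, second))
    return accepted
```

Given the 34K-read join of contig 3 followed by contig 22, a weaker made-up join such as contig 3 followed by contig 5 would be rejected because the right end of contig 3 is already used. Joins rejected this way are worth inspecting by hand, since they may point at the duplicated contigs or the virus.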
  
lecture_notes/04-30-2010.txt · Last modified: 2010/05/05 21:05 by jmagasin