This shows you the differences between two versions of the page.
| Both sides previous revision Previous revision Next revision | Previous revision | ||
|
lecture_notes:04-30-2010 [2010/05/02 22:04] jstjohn |
lecture_notes:04-30-2010 [2010/05/05 21:05] (current) jmagasin |
||
|---|---|---|---|
| Line 72: | Line 72: | ||
| There are different types of browsers. Might be interesting to have | There are different types of browsers. Might be interesting to have | ||
| someone in browser group do a presentation on it. | someone in browser group do a presentation on it. | ||
| + | |||
| + | |||
| + | ==== Homework for Wednesday ==== | ||
| + | Homework assignments, as people have not been contributing | ||
| + | equally to the wiki. Those of you who took David Bernick's class | ||
| + | will have already done it. A different version of newbler produced | ||
| + | a different assembly. Order and orient the contigs. Try to do it. | ||
| + | |||
| + | Do the contig alignments from the .join file.\\ | ||
| + | (Kevin had already re-done the map-colorspace5.) | ||
| ===== map-colorspace5 ===== | ===== map-colorspace5 ===== | ||
| Line 294: | Line 304: | ||
| encouraging. | encouraging. | ||
| - | ===== Homework for Wednesday ===== | ||
| - | Homework assignments, as people have not been contributing | ||
| - | equally to the wiki. Those of you who took David Bernick's class | ||
| - | will have already done it. Different version of newbler produced | ||
| - | different assembly. Order and orient the contigs. Try to do it. | ||
| - | Do the contig alignments from the .join file.\\ | + | ===== Second set of lecture notes ===== |
| - | (Kevin had already re-done the map-colorspace5.) | + | |
| + | From Jonathan Magasin. | ||
| + | |||
| + | Today's lecture covered files generated by Kevin's map-colorspace5\\ | ||
| + | script which will be necessary for the homework assignment: ordering\\ | ||
| + | the Pog contigs. Also covered: How to roughly estimate banana slug\\ | ||
| + | genome size. | ||
| + | |||
| + | ==== Homework assignment: assemble Pog ==== | ||
| + | |||
| + | From the newer newbler we have a different set of contigs, thirty-one\\ | ||
| + | of them. We also have mate pairs from SOLiD mapped to newbler\\ | ||
| + | assembly 5. The assignment is to order and orient the contigs.\\ | ||
| + | Kevin has posted his solution in the assembly directory\\ | ||
| + | (assemblies/Pog/map-colorspace5). | ||
| + | |||
| + | === 'delete' and 'invert' files === | ||
| + | |||
| + | The trim9 files are output from Kevin's map-colorspace5 script. Most\\ | ||
| + | of them are not for us. The 'delete' files are for mate pairs where\\ | ||
| + | both mapped but not at the appropriate distance: closer than 350\\ | ||
| + | bases, or more than 5000 bases apart. These files can be studied to\\ | ||
| + | find massive deletions. | ||
| + | |||
| + | 'invert' files are for mate pairs that were on opposite strands (at\\ | ||
| + | any distance) and are useful for studying inversions. However,\\ | ||
| + | inversions are not so useful because they will not be within a single\\ | ||
| + | contig. | ||
| + | |||
| + | The 'between contigs' files are for when the reads mapped to different\\ | ||
| + | contigs. Only cases with unique mappings are in these files. fixme:\\ | ||
| + | What file(s) is this? | ||
| + | |||
| + | === trim9.out === | ||
| + | |||
| + | The trim9.out file is a summary of what happened during mapping.\\ | ||
| + | Kmers are colorspace kmers. The kmer lengths given are the lengths after\\ | ||
| + | trimming off nine. 'uniquely mapped' means one mapping for the R3 and one for\\ | ||
| + | the F3. Note that some of the summary stats reflect bugs (the 'wrong\\ | ||
| + | range' line) that have since been fixed. [Kevin reran the script on\\ | ||
| + | Friday.] | ||
| + | |||
| + | Compared to the earlier mapping, Kevin has increased the aggressiveness. | ||
| + | |||
| + | The deletions ('wrong range' line) appear to be from reads taken from\\ | ||
| + | clones with multiple adapters. | ||
| + | |||
| + | Strand biases (algorithmic) were checked for in the forward and\\ | ||
| + | backward counts of mapped reads. | ||
| + | |||
| + | Tonight Kevin reran the mapping script. The new output will\\ | ||
| + | include error rates by position. Sol requested these be called\\ | ||
| + | mismatches rather than errors. Kevin said they are estimates of\\ | ||
| + | sequencing error. They are in colorspace, and exclude indels. (In\\ | ||
| + | reality they are differences between the reads and assemblies, not\\ | ||
| + | errors. But for all practical purposes, when there is a mismatch\\ | ||
| + | between a read and the assembly, the error is in the read.) | ||
| + | |||
| + | === trim9.joins === | ||
| + | |||
| + | The data we'll use is not in trim9.out; rather, it is in trim9.joins.\\ | ||
| + | trim9.joins is computed from trim9-cross.rdb (which has reads that\\ | ||
| + | crossed a contig boundary). | ||
| + | |||
| + | Looking at the first line: How many reads support contig 22 following\\ | ||
| + | contig 3? 34K. The contig lengths are there to show whether a contig is\\ | ||
| + | short enough that mate pairs might span it. E.g. which orderings are\\ | ||
| + | consistent with the listed contig orders: 2->3->4, or maybe 3 is very\\ | ||
| + | short, so that we have a mate pair that jumps over 3? | ||
| + | |||
| + | A minus sign appearing before a contig name means to take the\\ | ||
| + | reverse-complement of that contig. (And contig3 followed by contig22\\ | ||
| + | is the same as -contig22 followed by -contig3.) | ||
| + | |||
| + | The last number on each line is the expected length of the gap\\ | ||
| + | between the contigs, assuming the mate pairs were of the optimal\\ | ||
| + | length (optimal is the peak of the length distribution). It is\\ | ||
| + | very rough. | ||
| + | |||
| + | The file is sorted from most counts to fewest. The stuff at the bottom of the\\ | ||
| + | file is noise. Only about the first fifty lines are helpful. | ||
| + | |||
| + | Long mate pairs are very helpful for disambiguating, and we wish we had\\ | ||
| + | them for //H.pylori//. | ||
| + | |||
| + | === Your task === | ||
| + | |||
| + | Take the trim9.joins file and try to order and orient the contigs,\\ | ||
| + | knowing that some of them are duplicated, that there is a virus in\\ | ||
| + | the mix, and that there should be one large chromosome. | ||
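
One possible way to start on the ordering task is sketched below in Python. It is only a rough illustration, not Kevin's method. It assumes each join has already been read into a hypothetical `(left, right, count)` tuple with signed contig names; the real column layout of trim9.joins is not reproduced here, and the counts in the example are made up. The sketch uses the rule from the notes that contig3 followed by contig22 is the same as -contig22 followed by -contig3: both are mapped to the same pair of contig ends (the "head"/"tail" labels are this sketch's own naming), their counts are summed, and joins are then accepted greedily from most counts to fewest, using each contig end at most once.

```python
from collections import Counter


def split_sign(name):
    """Split a signed contig name into (base name, is_reverse_complemented)."""
    if name.startswith("-"):
        return name[1:], True
    return name, False


def leaving_end(name):
    """End of the contig that a join leaves from when the contig comes first."""
    base, reversed_ = split_sign(name)
    return (base, "head" if reversed_ else "tail")


def entering_end(name):
    """End of the contig that a join arrives at when the contig comes second."""
    base, reversed_ = split_sign(name)
    return (base, "tail" if reversed_ else "head")


def adjacency(left, right):
    """Orientation-independent key for 'left followed by right'.

    'A then B' and '-B then -A' touch the same two contig ends,
    so they produce the same key.
    """
    return frozenset([leaving_end(left), entering_end(right)])


def greedy_adjacencies(joins):
    """Sum counts per adjacency, then accept joins from most to fewest counts.

    Assumption of this sketch: each contig end is used at most once.
    Duplicated contigs violate that, so those still need handling by hand.
    """
    totals = Counter()
    for left, right, count in joins:
        totals[adjacency(left, right)] += count

    used_ends = set()
    chosen = []
    for adj, count in sorted(totals.items(), key=lambda item: -item[1]):
        if used_ends.isdisjoint(adj):
            used_ends |= adj
            chosen.append((adj, count))
    return chosen


# Made-up example joins (hypothetical counts, not taken from trim9.joins).
example_joins = [
    ("contig3", "contig22", 100),
    ("-contig22", "-contig3", 50),   # same adjacency as the line above
    ("contig22", "contig5", 80),
    ("contig3", "contig5", 10),      # conflicts with the stronger 3 -> 22 join
]

chosen = greedy_adjacencies(example_joins)
for adj, count in chosen:
    print(sorted(adj), count)
```

The accepted adjacencies can then be walked end to end to read off a candidate order and orientation. Because the greedy step allows each end only one partner, weakly supported joins that are rejected are worth a second look: they may be noise, or they may point at a duplicated contig or a short contig that mate pairs jump over.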