This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
lecture_notes:04-30-2010 [2010/05/02 22:04] jstjohn |
lecture_notes:04-30-2010 [2010/05/05 21:05] (current) jmagasin |
||
---|---|---|---|
Line 72: | Line 72: | ||
There are different types of browsers. Might be interesting to have | There are different types of browsers. Might be interesting to have | ||
someone in browser group do a presentation on it. | someone in browser group do a presentation on it. | ||
+ | |||
+ | |||
+ | ==== Homework for Wednesday ==== | ||
+ | Homework assignments, as people have not been contributing | ||
+ | equally to the wiki. Those of you who took David Bernick's class | ||
+ | will have already done it. Different version of newbler produced | ||
+ | different assembly. Order and orient the contigs. Try to do it. | ||
+ | |||
+ | Do the contig alignments from the .join file.\\ | ||
+ | (Kevin had already re-done the map-colorspace5.) | ||
===== map-colorspace5 ====== | ===== map-colorspace5 ====== | ||
Line 294: | Line 304: | ||
encouraging. | encouraging. | ||
- | ===== Homework for Wednesday ===== | ||
- | Homework assignments, as people have not been contributing | ||
- | equally to the wiki. Those of you who took David Bernick's class | ||
- | will have already done it. Different version of newbler produced | ||
- | different assembly. Order and orient the contigs. Try to do it. | ||
- | Do the contig alignments from the .join file.\\ | + | ===== Second set of lecture notes ===== |
- | (Kevin had already re-done the map-colorspace5.) | + | |
+ | From Jonathan Magasin. | ||
+ | |||
+ | Today's lecture covered files generated by Kevin's map-colorspace5\\ | ||
+ | script which will be necessary for the homework assignment: ordering\\ | ||
+ | the Pog contigs. Also covered: How to roughly estimate banana slug\\ | ||
+ | genome size. | ||
+ | |||
+ | ==== Homework assignment: assemble Pog ==== | ||
+ | |||
+ | From the newer newbler we have a different set of contigs, thirty-one\\ | ||
+ | of them. We also have mate pairs from SOLiD mapped to newbler\\ | ||
+ | assembly 5. The assignment is to order and orient the contigs.\\ | ||
+ | Kevin has posted his solution in the assembly directory\\ | ||
+ | (assemblies/Pog/map-colorspace5). | ||
+ | |||
+ | === 'delete' and 'invert' files === | ||
+ | |||
+ | The trim9 files are output from Kevin's map-colorspace5 script. Most\\ | ||
+ | of them are not for us. The 'delete' files are for mate pairs where\\ | ||
+ | both mapped but not at the appropriate distance: closer than 350\\ | ||
+ | bases, or more than 5000 bases apart. These files can be studied to\\ | ||
+ | find massive deletions. | ||
+ | |||
+ | 'invert' files are for mate pairs that were on opposite strands (at\\ | ||
+ | any distance) and are useful for studying inversions. However,\\ | ||
+ | inversions are not so useful because they will not be within a single\\ | ||
+ | contig. | ||
+ | |||
+ | The 'between contigs' files are for when the reads mapped to different\\ | ||
+ | contigs. Only cases with unique mappings are in these files. fixme:\\ | ||
+ | What file(s) is this? | ||
+ | |||
+ | === trim9.out === | ||
+ | |||
+ | The trim9.out file is a summary of what happened during mapping.\\ | ||
+ | Kmers are colorspace kmers. The kmer lengths are the lengths after\\ | ||
+ | trimming off nine. 'uniquely mapped' means one for the R3 and one for\\ | ||
+ | the F3. Note that some of the summary stats reflect bugs (the 'wrong\\ | ||
+ | range' line) that have since been fixed. [Kevin reran the script on\\ | ||
+ | Friday.] | ||
+ | |||
+ | Compared to the earlier mapping, Kevin has increased the aggressiveness. | ||
+ | |||
+ | The deletions ('wrong range' line) appear to be from reads taken from\\ | ||
+ | clones with multiple adapters. | ||
+ | |||
+ | Strand biases (algorithmic) were checked for in the forward and\\ | ||
+ | backward counts of mapped reads. | ||
+ | |||
+ | Tonight Kevin [reran] the mapping script. The new output will\\ | ||
+ | include error rates by position. Sol requested these be called\\ | ||
+ | mismatches rather than errors. Kevin said they are estimates of\\ | ||
+ | sequencing error. They are in colorspace, and exclude indels. (In\\ | ||
+ | reality they are differences between the reads and assemblies, not\\ | ||
+ | errors. But for all practical purposes, when there is a mismatch\\ | ||
+ | between a read and the assembly, the error is in the read.) | ||
+ | |||
+ | === trim9.joins === | ||
+ | |||
+ | The data we'll use is not in trim9.out; rather, it is in trim9.joins.\\ | ||
+ | trim9.joins is computed from trim9-cross.rdb (which has reads that\\ | ||
+ | crossed a contig boundary). | ||
+ | |||
+ | Looking at the first line: How many reads support contig 22 following\\ | ||
+ | contig 3? 34K. The contig lengths are there to see if a contig is\\ | ||
+ | short enough that mate pairs might span it. E.g. what orderings are\\ | ||
+ | consistent with the listed contig orders: 2->3->4, or maybe 3 is very\\ | ||
+ | short so we have a mate pair that jumps over 3. | ||
+ | |||
+ | A minus sign appearing before a contig name means to take the\\ | ||
+ | reverse-complement of that contig. (And contig3 followed by contig22\\ | ||
+ | is the same as -contig22 followed by -contig3.) | ||
+ | |||
+ | The last number on each line is the expected length of the gap\\ | ||
+ | between the contigs if the mate pairs were the optimal length\\ | ||
+ | (optimal is the peak of the length distribution). It is very\\ | ||
+ | rough. | ||
+ | |||
+ | The file is sorted most-counts to fewest. The stuff at the bottom of the\\ | ||
+ | file is noise. Only about the first fifty lines are helpful. | ||
+ | |||
+ | Long mate pairs are very helpful for disambiguating, and we wish we had\\ | ||
+ | them for //H.pylori//. | ||
+ | |||
+ | === Your task === | ||
+ | |||
+ | Take the trim9.joins file and try to order and orient the contigs,\\ | ||
+ | knowing that some of them are duplicated, and that there is a virus in\\ | ||
+ | the mix and there should be one large chromosome. | ||
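
The sign rule from the trim9.joins notes (contig3 followed by contig22 is the same as -contig22 followed by -contig3) can be sketched in a few lines of Python. This is only an illustration: the function names and the in-memory `(first, second, count)` tuples are assumptions, the counts are made up, and the actual column layout of trim9.joins is not described in these notes, so no file parsing is attempted. Whether the joins file already merges the two spellings of a join is also not stated.

```python
def flip(contig):
    """Reverse-complement a signed contig name: 'contig3' <-> '-contig3'."""
    return contig[1:] if contig.startswith("-") else "-" + contig


def canonical(first, second):
    """Return one fixed spelling of the join first -> second.

    A join A -> B is equivalent to -B -> -A, so we pick the
    lexicographically smaller of the two equivalent spellings.
    """
    return min((first, second), (flip(second), flip(first)))


def merge_joins(joins):
    """Sum the supporting counts of equivalent joins, most-supported first.

    `joins` is an iterable of (first, second, count) tuples; this tuple
    layout is hypothetical, not the real trim9.joins format.
    """
    totals = {}
    for first, second, count in joins:
        key = canonical(first, second)
        totals[key] = totals.get(key, 0) + count
    return sorted(totals.items(), key=lambda item: -item[1])


# Made-up counts, for illustration only.
example = [
    ("contig3", "contig22", 10),
    ("-contig22", "-contig3", 5),
    ("contig2", "contig3", 7),
]
merged = merge_joins(example)
```

Picking a single spelling per join makes it easier to see which adjacencies have the most support before chaining them into an ordering by hand.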