lecture_notes:04-30-2010 [2010/05/05 20:53] jmagasin |
lecture_notes:04-30-2010 [2010/05/05 21:05] (current) jmagasin |
||
---|---|---|---|
Line 309: | Line 309: | ||
From Jonathan Magasin. | From Jonathan Magasin. | ||
- | Today's lecture covered files generated by Kevin's map-colorspace5 | + | Today's lecture covered files generated by Kevin's map-colorspace5\\ |
- | script which will be necessary for the homework assignment: ordering | + | script which will be necessary for the homework assignment: ordering\\ |
- | the POG contigs. Also covered: How to roughly estimate banana slug | + | the Pog contigs. Also covered: How to roughly estimate banana slug\\ |
genome size. | genome size. | ||
- | ==== Homework assignment: assemble POG ==== | + | ==== Homework assignment: assemble Pog ==== |
- | From the newer newbler we have a different set of contigs, thirty-one | + | From the newer newbler we have a different set of contigs, thirty-one\\ |
- | of them. We also have mate pairs from SOLiD mapped to newbler | + | of them. We also have mate pairs from SOLiD mapped to newbler\\ |
- | assembly 5. The assignment is to order and orient the contigs. | + | assembly 5. The assignment is to order and orient the contigs.\\ |
- | Kevin has posted his solution in the assembly directory | + | Kevin has posted his solution in the assembly directory\\ |
(assemblies/Pog/map-colorspace5). | (assemblies/Pog/map-colorspace5). | ||
=== 'delete' and 'invert' files === | === 'delete' and 'invert' files === | ||
- | The trim9 files are output from Kevin's map-colorspace5 script. Most | + | The trim9 files are output from Kevin's map-colorspace5 script. Most\\ |
- | of them are not for us. The 'delete' files are for mate pairs where | + | of them are not for us. The 'delete' files are for mate pairs where\\ |
- | both mapped but not at the appropriate distance: closer than 350 | + | both mapped but not at the appropriate distance: closer than 350\\ |
- | bases, or more than 5000 bases apart. These files can be studied to | + | bases, or more than 5000 bases apart. These files can be studied to\\ |
find massive deletions. | find massive deletions. | ||
- | 'invert' files are for mate pairs that were on opposite strands (at | + | 'invert' files are for mate pairs that were on opposite strands (at\\ |
- | any distance) and are useful for studying inversions. However | + | any distance) and are useful for studying inversions. However\\ |
- | inversions are not so useful because they will not be within a single | + | inversions are not so useful because they will not be within a single\\ |
contig. | contig. | ||
- | The 'between contigs' files are for when the reads mapped to different | + | The 'between contigs' files are for when the reads mapped to different\\ |
- | contigs. Only cases with unique mappings are in these files. fixme: | + | contigs. Only cases with unique mappings are in these files. fixme:\\ |
What file(s) is this? | What file(s) is this? | ||
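A rough sketch of the three categories described above, assuming each uniquely mapped mate pair is given as a contig name, position, and strand for each read. This is not Kevin's map-colorspace5 code: the function name `classify_pair`, its arguments, the order of the checks, and the `"ok"` label are all hypothetical. Only the 350 and 5000 base thresholds come from the notes.

```python
MIN_DIST = 350    # closer than this -> 'delete' (threshold from the notes)
MAX_DIST = 5000   # farther apart than this -> 'delete' (threshold from the notes)

def classify_pair(contig_a, pos_a, strand_a, contig_b, pos_b, strand_b):
    """Hypothetical helper: sort one uniquely mapped mate pair into a category."""
    # Reads on different contigs go to the 'between contigs' files.
    if contig_a != contig_b:
        return "between contigs"
    # Opposite strands, at any distance, go to the 'invert' files.
    if strand_a != strand_b:
        return "invert"
    # Same contig and same strand, but at the wrong distance: 'delete' files.
    dist = abs(pos_b - pos_a)
    if dist < MIN_DIST or dist > MAX_DIST:
        return "delete"
    # Everything else is a normally spaced pair (label is an assumption).
    return "ok"
```

For example, `classify_pair("contig3", 100, "+", "contig3", 9000, "+")` falls in the 'delete' category because the reads are more than 5000 bases apart.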
=== trim9.out === | === trim9.out === | ||
- | The trim9.out file is a summary of what happened during mapping. | + | The trim9.out file is a summary of what happened during mapping.\\ |
- | Kmers are colorspace kmers. These kmers are the lengths after | + | Kmers are colorspace kmers. These kmers are the lengths after\\ |
- | trimming off nine. 'uniquly mapped' means one for the R3 and one for | + | trimming off nine. 'uniquly mapped' means one for the R3 and one for\\ |
- | the F3. Note that some of the summary stats are bugs (the 'wrong | + | the F3. Note that some of the summary stats are bugs (the 'wrong\\ |
- | range' line) that have since been fixed. [Kevin reran the script of | + | range' line) that have since been fixed. [Kevin reran the script of\\ |
Friday.] | Friday.] | ||
Compared to earlier mapping, Kevin has increased the aggressiveness. | Compared to earlier mapping, Kevin has increased the aggressiveness. | ||
- | The deletions ('wrong range' line) appear to be from reads taken from | + | The deletions ('wrong range' line) appear to be from reads taken from\\ |
clones with multiple adapters. | clones with multiple adapters. | ||
- | Strand biases (algorithmic) were checked for in the forward and | + | Strand biases (algorithmic) were checked for in the forward and\\ |
backward counts of mapped reads. | backward counts of mapped reads. | ||
- | Tonight Kevin [reran] the mapping script. The new output will | + | Tonight Kevin [reran] the mapping script. The new output will\\ |
- | include error rates by position. Sol requested these be called | + | include error rates by position. Sol requested these be called\\ |
- | mismatches rather than errors. Kevin said they are estimates of | + | mismatches rather than errors. Kevin said they are estimates of\\ |
- | sequencing error. They are in colorspace, and exclude indels. In | + | sequencing error. They are in colorspace, and exclude indels. In\\ |
- | reality they are differences between the reads and assemblies, not | + | reality they are differences between the reads and assemblies, not\\ |
- | errors. But for all practical purposes when there is a mismatch | + | errors. But for all practical purposes when there is a mismatch\\ |
between a read and the assembly the error is in the read. | between a read and the assembly the error is in the read. | ||
=== trim9.joins === | === trim9.joins === | ||
- | The data we'll use is not in trim9.out, rather it is in trim9.joins. | + | The data we'll use is not in trim9.out, rather it is in trim9.joins.\\ |
- | trim9.joins is computed from trim9-cross.rdb (which has reads that | + | trim9.joins is computed from trim9-cross.rdb (which has reads that\\ |
crossed a contig boundary). | crossed a contig boundary). | ||
- | Looking at the first line: How many reads support contig 22 following | + | Looking at the first line: How many reads support contig 22 following\\ |
- | contig 3? 34K. The contig lengths are there to see if a contig is | + | contig 3? 34K. The contig lengths are there to see if a contig is\\ |
- | short enough that mate pairs might span it. E.g. what orderings are | + | short enough that mate pairs might span it. E.g. what orderings are\\ |
- | consistent with the listed contig orders: 2->3->4, or maybe 3 is very | + | consistent with the listed contig orders: 2->3->4, or maybe 3 is very\\ |
short so we have a mate pair that jumps over 3. | short so we have a mate pair that jumps over 3. | ||
- | A minus sign appearing before a contig name means to take the | + | A minus sign appearing before a contig name means to take the\\ |
- | reverse-complement of that contig. (And contig3 followed by contig22 | + | reverse-complement of that contig. (And contig3 followed by contig22\\ |
is the same as -contig22 followed by -contig3.) | is the same as -contig22 followed by -contig3.) | ||
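The equivalence in the parenthesis can be sketched in a few lines of Python, assuming contigs are written as strings with an optional leading minus sign. The helper names `flip`, `reverse_join`, and `canonical_join` are hypothetical and are not part of the trim9 scripts.

```python
def flip(contig):
    """Toggle the leading minus sign, i.e. take the reverse-complement."""
    return contig[1:] if contig.startswith("-") else "-" + contig

def reverse_join(first, second):
    """The same join read from the other strand: swap the order, flip both."""
    return flip(second), flip(first)

def canonical_join(first, second):
    """Pick one fixed representative so equivalent joins compare equal."""
    return min((first, second), reverse_join(first, second))
```

With these, `reverse_join("contig3", "contig22")` gives `("-contig22", "-contig3")`, and `canonical_join` returns the same pair for both spellings, which may help when merging counts for joins that are listed in both forms.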
- | The last number on each line assumes the mate pairs were the | + | The last number on each line assumes the mate pairs were the\\ |
- | optimal length (optimal is the peak of the length distribution). That | + | optimal length (optimal is the peak of the length distribution). That\\ |
- | number is the expected length of the gap between the contigs. It is | + | number is the expected length of the gap between the contigs. It is\\ |
very rough. | very rough. | ||
- | The file is sorted from most counts to fewest. The lines at the bottom of the | + | The file is sorted from most counts to fewest. The lines at the bottom of the\\ |
file are noise. Only about the first fifty lines are helpful. | file are noise. Only about the first fifty lines are helpful. | ||
- | Long mate pairs are very helpful for disambiguating, and we wish we had | + | Long mate pairs are very helpful for disambiguating, and we wish we had\\ |
- | them for H.pylori. | + | them for //H.pylori//. |
=== Your task === | === Your task === | ||
- | Take the trim9.joins file and try to order and orient the contigs, | + | Take the trim9.joins file and try to order and orient the contigs,\\ |
- | knowing that some of them are duplicated, and that there is a virus in | + | knowing that some of them are duplicated, and that there is a virus in\\ |
the mix and there should be one large chromosome. | the mix and there should be one large chromosome. | ||
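One possible starting point for the task is a greedy pass over the joins, strongest first. This is only a sketch and not Kevin's posted solution. It assumes the joins have already been read into `(count, first, second)` tuples, because the exact column layout of trim9.joins is not described here. The function name `greedy_links` and the `min_count` parameter are hypothetical. It gives each oriented contig at most one successor, so duplicated contigs and the virus will still need to be resolved by hand.

```python
def flip(contig):
    """Toggle the leading minus sign, i.e. take the reverse-complement."""
    return contig[1:] if contig.startswith("-") else "-" + contig

def greedy_links(joins, min_count=1):
    """Accept joins from most-supported to least, one successor per contig end.

    joins: iterable of (count, first, second) tuples (assumed layout).
    Returns a dict mapping each oriented contig to the contig that follows it.
    """
    successor = {}
    predecessor = {}
    for count, first, second in sorted(joins, key=lambda j: j[0], reverse=True):
        if count < min_count:
            break  # everything after this point is treated as noise
        if first in successor or second in predecessor:
            continue  # that end is already used by a stronger join
        # Record the join and its reverse-complement equivalent.
        successor[first] = second
        predecessor[second] = first
        successor[flip(second)] = flip(first)
        predecessor[flip(first)] = flip(second)
    return successor

# Illustrative input: only the ~34K count for contig3 -> contig22 comes
# from the notes; the other rows are made up for the example.
example = [
    (34000, "contig3", "contig22"),
    (5000, "contig22", "-contig7"),
    (100, "contig3", "contig5"),
]
links = greedy_links(example)
```

Here the weak `contig3 -> contig5` join is skipped because contig3 already has a better-supported successor. Following `links` from any contig then yields a candidate ordering to check against the contig lengths and gap estimates.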