lecture_notes:04-30-2010

lecture_notes:04-30-2010 [2010/05/02 22:05]
jstjohn
lecture_notes:04-30-2010 [2010/05/05 21:05] (current)
jmagasin
 Even just getting the right answer within a factor of 10 was
 encouraging.
 +
 +
 +===== Second set of lecture notes =====
 +
 +From Jonathan Magasin.
 +
 +Today's lecture covered files generated by Kevin's map-colorspace5\\
 +script, which will be necessary for the homework assignment: ordering\\
 +the Pog contigs.  Also covered: how to roughly estimate banana slug\\
 +genome size.
 +
 +==== Homework assignment: assemble Pog ====
 +
 +From the newer newbler we have a different set of contigs, thirty-one\\
 +of them.  We also have mate pairs from SOLiD mapped to newbler\\
 +assembly 5.  The assignment is to order and orient the contigs.\\
 +Kevin has posted his solution in the assembly directory\\
 +(assemblies/​Pog/​map-colorspace5).
 +
 +=== '​delete'​ and '​invert'​ files ===
 +
 +The trim9 files are output from Kevin'​s map-colorspace5 script. ​ Most\\
 +of them are not for us.  The '​delete'​ files are for mate pairs where\\
 +both mapped but not at the appropriate distance: closer than 350\\
 +bases, or more than 5000 bases apart. ​ These files can be studied to\\
 +find massive deletions.
 +
 +'​invert'​ files are for mate pairs that were on opposite strands (at\\
 +any distance) and are useful for studying inversions.  However,\\
 +inversions are not so useful because they will not be within a single\\
 +contig.
 +
 +The '​between contigs'​ files are for when the reads mapped to different\\
 +contigs. ​ Only cases with unique mappings are in these files. ​ fixme:\\
 +What file(s) is this?
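
The sorting of mate pairs into these categories can be sketched as below. This is only an illustration, not Kevin's script: the `Hit` record, the `classify_pair` function, and the order in which the checks are applied are assumptions. Only the 350 and 5000 base thresholds and the category names come from the notes.

```python
from typing import NamedTuple

MIN_DIST = 350    # pairs closer than this go to 'delete' (threshold from the notes)
MAX_DIST = 5000   # pairs farther apart than this go to 'delete' (from the notes)


class Hit(NamedTuple):
    """Hypothetical record for one uniquely mapped read of a mate pair."""
    contig: str
    pos: int
    strand: str  # '+' or '-'


def classify_pair(f3: Hit, r3: Hit) -> str:
    """Assign a mate pair to one of the categories described in the notes."""
    if f3.contig != r3.contig:
        return "between contigs"
    # 'invert' applies at any distance, so test the strands before the distance.
    if f3.strand != r3.strand:
        return "invert"
    distance = abs(f3.pos - r3.pos)
    if distance < MIN_DIST or distance > MAX_DIST:
        return "delete"
    return "ok"


print(classify_pair(Hit("contig3", 100, "+"), Hit("contig3", 6000, "+")))
```

Testing the strands before the distance follows the statement that 'invert' pairs are collected at any distance.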
 +
 +=== trim9.out ===
 +
 +The trim9.out file is a summary of what happened during mapping.\\
 +Kmers are colorspace kmers, and the kmer lengths given are those after\\
 +trimming off nine.  'uniquly mapped' means one mapping for the R3 read\\
 +and one for the F3 read.  Note that some of the summary stats (the\\
 +'wrong range' line) reflect bugs that have since been fixed.  [Kevin\\
 +reran the script on Friday.]
 +
 +Compared to earlier mapping, Kevin has increased the aggressiveness.
 +
 +The deletions ('​wrong range' line) appear to be from reads taken from\\
 +clones with multiple adapters.
 +
 +Strand biases (algorithmic) were checked for in the forward and\\
 +backward counts of mapped reads.
 +
 +Tonight Kevin [reran] the mapping script.  The new output will\\
 +include error rates by position.  Sol requested these be called\\
 +mismatches rather than errors.  Kevin said they are estimates of\\
 +sequencing error.  They are in colorspace, and exclude indels.  (In\\
 +reality they are differences between the reads and the assembly, not\\
 +errors.  But for all practical purposes, when there is a mismatch\\
 +between a read and the assembly, the error is in the read.)
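
A per-position mismatch rate of this kind could be computed roughly as follows. This is a minimal sketch, not the mapping script's code: the function name and the input format (pairs of equal-length colorspace strings, read first and assembly second) are assumptions made for illustration, and the sample reads are invented.

```python
from typing import Iterable, List, Tuple


def mismatch_rate_by_position(pairs: Iterable[Tuple[str, str]]) -> List[float]:
    """Fraction of reads that differ from the assembly at each read position.

    Each pair is (read, assembly) in colorspace, already aligned without indels.
    """
    mismatches: List[int] = []
    totals: List[int] = []
    for read, ref in pairs:
        if len(read) != len(ref):
            raise ValueError("read and assembly strings must have equal length")
        # Grow the counters if this read is longer than any seen so far.
        while len(totals) < len(read):
            totals.append(0)
            mismatches.append(0)
        for i, (r, a) in enumerate(zip(read, ref)):
            totals[i] += 1
            if r != a:
                mismatches[i] += 1
    return [m / t for m, t in zip(mismatches, totals)]


# Invented colorspace reads against the same invented assembly segment.
example = [("0123", "0123"), ("0120", "0123"), ("1123", "0123"), ("0121", "0123")]
print(mismatch_rate_by_position(example))
```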
 +
 +=== trim9.joins ===
 +
 +The data we'll use is not in trim9.out; rather, it is in trim9.joins.\\
 +trim9.joins is computed from trim9-cross.rdb (which has reads that\\
 +crossed a contig boundary).  ​
 +
 +Looking at the first line: How many reads support contig 22 following\\
 +contig 3?  34K.  The contig lengths are there to show whether a contig\\
 +is short enough that mate pairs might span it.  E.g. which orderings are\\
 +consistent with the listed joins: 2->3->4, or maybe 3 is very short, so\\
 +a mate pair jumps over 3 and links 2 directly to 4.
 +
 +A minus sign appearing before a contig name means to take the\\
 +reverse-complement of that contig. ​ (And contig3 followed by contig22\\
 +is the same as -contig22 followed by -contig3.)
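
The equivalence of the two ways of writing a join can be made concrete with a small sketch. The function names and the representation of an oriented contig as a string with an optional leading minus sign are assumptions for illustration, not the actual format of trim9.joins.

```python
from typing import Tuple


def flip(contig: str) -> str:
    """Reverse-complement an oriented contig name: 'contig3' <-> '-contig3'."""
    return contig[1:] if contig.startswith("-") else "-" + contig


def reverse_join(first: str, second: str) -> Tuple[str, str]:
    """The same join read from the other strand: swap the order and flip both."""
    return (flip(second), flip(first))


def canonical_join(first: str, second: str) -> Tuple[str, str]:
    """Pick one fixed representative of the two equivalent forms of a join."""
    return min((first, second), reverse_join(first, second))


print(reverse_join("contig3", "contig22"))
```

Using a canonical form lets the read counts for both forms of a join be added together before the contigs are ordered.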
 +
 +The last number on each line is the expected length of the gap between\\
 +the contigs, assuming the mate pairs were the optimal length (optimal\\
 +is the peak of the length distribution).  It is very rough.
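
One plausible way to arrive at such a gap estimate is sketched below. The notes do not say how the script computes the number, so the formula, the function name, and the example values are all assumptions: the optimal mate-pair length minus the parts of the pair that lie inside the two contigs.

```python
def expected_gap(optimal_length: int, tail_in_first: int, head_in_second: int) -> int:
    """Rough gap estimate between two joined contigs (assumed formula).

    optimal_length: peak of the mate-pair length distribution.
    tail_in_first:  distance from the first read to the end of its contig.
    head_in_second: distance from the start of the next contig to the second read.
    """
    return optimal_length - tail_in_first - head_in_second


# Invented numbers: a 2000-base pair with 700 bases in one contig and 900 in the next.
print(expected_gap(2000, 700, 900))
```

Under this assumed formula a negative value would suggest that the two contigs overlap rather than being separated by a gap.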
 +
 +The file is sorted most-counts to fewest.  The stuff at the bottom of the\\
 +file is noise.  Only about the first fifty lines are helpful.
 +
 +Long mate pairs are very helpful for disambiguating, and we wish we had\\
 +them for //H.pylori//.
 +
 +=== Your task ===
 +
 +Take the trim9.joins file and try to order and orient the contigs,\\
 +knowing that some of them are duplicated, and that there is a virus in\\
 +the mix and there should be one large chromosome.
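
A first pass at the task could be a greedy chaining of the best-supported joins, sketched below. This is not Kevin's posted solution. The input format (tuples of oriented contig names and a read count) and all function names are assumptions, and only the 34K count for contig 3 followed by contig 22 comes from the notes; the other counts are invented. The sketch uses each contig end at most once, so it ignores duplicated contigs, does not close circular chromosomes, and does not look at contig lengths or gap estimates.

```python
from typing import Dict, List, Tuple

End = Tuple[str, str]  # (contig name, "head" or "tail")


def parse(oriented: str) -> Tuple[str, bool]:
    """Split '-contig3' into ('contig3', True); True means reverse-complemented."""
    return (oriented[1:], True) if oriented.startswith("-") else (oriented, False)


def out_end(oriented: str) -> End:
    """The contig end at which an oriented contig is left."""
    name, rev = parse(oriented)
    return (name, "head" if rev else "tail")


def in_end(oriented: str) -> End:
    """The contig end at which an oriented contig is entered."""
    name, rev = parse(oriented)
    return (name, "tail" if rev else "head")


def greedy_chains(joins: List[Tuple[str, str, int]]) -> List[List[str]]:
    """Accept joins from most to least supported, using each contig end once."""
    link: Dict[End, End] = {}
    parent: Dict[str, str] = {}  # union-find over contig names, to avoid cycles

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for first, second, _count in sorted(joins, key=lambda j: -j[2]):
        a, b = out_end(first), in_end(second)
        ra, rb = find(a[0]), find(b[0])
        if a in link or b in link or ra == rb:
            continue  # end already used, or the join would close a cycle
        link[a], link[b] = b, a
        parent[ra] = rb

    chains: List[List[str]] = []
    seen = set()
    for name in sorted(parent):
        if name in seen:
            continue
        # Start a walk only from a contig that has a free end.
        if (name, "head") not in link:
            cur = name
        elif (name, "tail") not in link:
            cur = "-" + name
        else:
            continue
        chain = []
        while True:
            chain.append(cur)
            seen.add(parse(cur)[0])
            nxt = link.get(out_end(cur))
            if nxt is None:
                break
            cur = nxt[0] if nxt[1] == "head" else "-" + nxt[0]
        chains.append(chain)
    return chains


joins = [
    ("contig3", "contig22", 34000),   # count from the notes
    ("contig22", "-contig5", 20000),  # invented
    ("contig3", "contig5", 10),       # invented low-count noise
]
print(greedy_chains(joins))
```

The low-count join is rejected because the end of contig 3 is already taken by a better-supported join. In the real data the duplicated contigs would need to be allowed to appear more than once.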
 +
  
lecture_notes/04-30-2010.1272837952.txt.gz · Last modified: 2010/05/02 22:05 by jstjohn