lecture_notes:04-28-2010

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-28-2010 [2010/05/02 05:01]
galt
lecture_notes:04-28-2010 [2010/05/02 05:22]
galt
Line 1: Line 1:
 +John St. John's lecture on EULER-SR and Celera; Michael Cusack's lecture on MIRA
 +
 === Misc Notes: ===
  
Line 12: Line 14:
  
  
-Kevin
-mapped newbler to join the contigs
 +Kevin mapped newbler to join the contigs
 Found a bug in the Python script that maps the SOLiD reads.
 Detected because there were no joining reads.
Line 33: Line 34:
 that can use long reads and mate-pairs.
  
-ran well first time (it ran, at least) \\
-have to run it where you installed it \\
-no makefiles \\
 +**euler-sr-assembly1/**\\
 +Ran on 454 data with the Sanger data concatenated into one file.
 +
 +Have to set up env vars.\\
 +No make install options.\\
 +Things are mixed up.\\
 +You have to run it where you installed it \\
 +${EUSRC}\\
 +
 +It ran well the first time (it ran, at least) \\
 +
 +${EUSRC}/assembly/Assemble.pl pogreads.fasta 25\\
  
 result: \\
Line 41: Line 51:
 are contigs overlapping? \\
 //find out:// \\
-check blat_strict_match (blat alignment to reference genome) \\
 +check contig-blat_strict_match (blat alignment to reference genome) \\
 look for "Q name" (contigs) which match to the same "T start" positions on the reference genome \\
 //answer:// yes, appear to overlap a lot – double coverage because they totally overlap
 +
 +There is one 91k contig.\\
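
The overlap check described above (finding "Q name" entries that land on the same reference positions) can be sketched in Python. This is only an illustration, not the script used in class: it assumes the blat output is in standard tab-separated PSL format, and the function names ''parse_psl'' and ''overlapping_contigs'' are made up for this example.

```python
from collections import defaultdict

def parse_psl(lines):
    """Yield (q_name, t_name, t_start, t_end) from PSL lines, skipping headers."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Alignment rows have 21 columns and start with the match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        yield fields[9], fields[13], int(fields[15]), int(fields[16])

def overlapping_contigs(hits):
    """Return (contig_a, contig_b, overlap_bases) for hits overlapping on a target."""
    by_target = defaultdict(list)
    for q_name, t_name, t_start, t_end in hits:
        by_target[t_name].append((t_start, t_end, q_name))
    pairs = []
    for intervals in by_target.values():
        intervals.sort()
        for i, (s1, e1, q1) in enumerate(intervals):
            for s2, e2, q2 in intervals[i + 1:]:
                if s2 >= e1:
                    break  # sorted by start, so nothing later overlaps either
                if q1 != q2:
                    pairs.append((q1, q2, min(e1, e2) - s2))
    return pairs
```

Usage would be something like ''overlapping_contigs(parse_psl(open(path)))'', where ''path'' points at the PSL file; large overlap values for many contig pairs would match the "double coverage" observation.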
  
 Things to try to improve the run: \\
-- longer k-mers \\
 +- longer k-mers, increasing to 31 should be easy \\
 - increase frequency threshold (help make up for read errors, maybe?) \\
 +- throw out the tiny contigs, reduce your cutoff.
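
Throwing out the tiny contigs can be done after assembly with a few lines of Python. A minimal sketch, assuming the contigs are in plain FASTA; ''read_fasta'', ''drop_short_contigs'' and the ''min_length'' cutoff are hypothetical names for this example, and no particular cutoff value was given in the lecture.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def drop_short_contigs(records, min_length):
    """Keep only the contigs whose sequence has at least min_length bases."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]
```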
 +
 +It does have an option to do some simple quality filtering on the reads\\
 +if quality data such as FASTQ is used?\\
 +-minmult: look at how many things map to this area;\\
 +if fewer than this many things map there, throw it out.\\
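
The idea behind a multiplicity cutoff like -minmult can be illustrated with k-mer counts: anything seen fewer than a threshold number of times is treated as a likely read error and dropped. The sketch below is an assumption about the general technique, not EULER-SR's actual implementation; ''count_kmers'', ''frequent_kmers'' and ''min_mult'' are names invented for this example.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def frequent_kmers(reads, k, min_mult):
    """Return the set of k-mers seen at least min_mult times."""
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n >= min_mult}
```

Raising ''min_mult'' discards more error k-mers but also risks discarding real sequence from low-coverage regions.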
 +
 +Error-correct reads, construct repeat graph,\\
 +simplify repeat graph with mate-reads\\
 +Error correction by threading.\\
 +Tries to make minimal corrections to beginnings of reads,\\
 +uses those to make the k-mers. Later threads the full read length through.
 
 "Error Correction via threading" \\
Line 55: Line 79:
 - perhaps this is where it went wrong? \\
  
 +Mate reads.\\
 +Multiple paths of similar length are hard to disambiguate.\\
 +You can use multiple matepairs and bootstrap analysis.\\
 +Use the paths with the highest probability.\\
 +
 +Pog repeats aside:\\
 +There are several large homologous regions on opposite strands\\
 +in the Pog data that act as a kind of repeat. \\
 +They are at both ends of the area that inverts.\\
 +Inversion happens by homologous matching, then swapping by two strands.\\
 +Like a sloppy integrase.
 +
 +
 +**SOLiD data.**\\
 +Used the regular base-space data in colorspace_input.fa (not double-encoded).\\
 Tried to run on just the SOLiD data… started on Sunday, but still running (Wed) \\
  
lecture_notes/04-28-2010.txt · Last modified: 2010/05/02 16:22 by karplus