lecture_notes:04-28-2010

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-28-2010 [2010/05/01 22:34]
galt
lecture_notes:04-28-2010 [2010/05/02 09:15]
karplus fixed number of contigs for Newbler
Line 8: Line 8:
 use makefiles, not shell scripts!
  
-SOLiD data formats:\\
 +**Sanger quality info**\\
 +Kevin found the location of the Sanger qual info.\\ ​  
 +.as or something like that.\\ 
 +3 different files from 3 different runs.\\ 
 + 
 +**SOLiD data formats**:\\
 .csfasta = colorspace with numbers\\
 .de = changes #s to letters (0123 -> ACGT) but it’s colors not numbers! very confusing.\\
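
A minimal sketch of why the letter form is confusing, assuming a read such as `T0120` (a leading primer base followed by color digits). The function names and the example read are hypothetical; only the 0123 -> ACGT relabeling comes from the notes. The decoding step uses the standard SOLiD color rule, where each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3).

```python
BASES = "ACGT"
# Relabeling from the notes: colors 0,1,2,3 are written as A,C,G,T.
COLOR_TO_LETTER = str.maketrans("0123", "ACGT")


def double_encode(colors):
    """Relabel color digits as letters. The result is still colorspace."""
    return colors.translate(COLOR_TO_LETTER)


def decode_colorspace(primer_base, colors):
    """Convert colors to real bases, starting from the known primer base."""
    code = BASES.index(primer_base)
    decoded = []
    for color in colors:
        code ^= int(color)  # next base = previous base XOR color
        decoded.append(BASES[code])
    return "".join(decoded)


read = "T0120"  # hypothetical example read
print(double_encode(read[1:]))               # ACGA (letters, but still colors)
print(decode_colorspace(read[0], read[1:]))  # TGAA (actual bases)
```

The two outputs differ, which is the point: the relabeled string looks like DNA but should not be read as a base sequence.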
Line 117: Line 122:
  
 Took about 50 minutes for all.\\
-For comparison, Newbler took 18 minutes and about 40 contigs.
 +For comparison, Newbler took 18 minutes and produced 31 non-overlapping contigs.
  
-Just qsub them with no arguments, and it runs everything.\\
 +Just qsub them with no arguments, and it runs everything. ("Them"? "it"? What does this sentence mean? FIXME  --- //[[karplus@soe.ucsc.edu|Kevin Karplus]] 2010/05/02 09:14//)
  
  
 === MIRA ===
 +
 +Mostly used the default settings.
 +
 +mira-assembly1/​
 +
 +Running is easy.
 +Parameters: fasta denovo, and tell it which sequencing instruments the data came from (e.g. 454).
  
 Needs datafile named pog_in.[format].fa \\
-sff_extract script to create .qual files
 +uses sff_extract script to create .fasta and .fasta.qual files \\
 +and also the traceinfo_in.454.xml file.
  
-created 30 contigs >=500 (largest contig 640k) \\
-but... upon mapping to the reference genome, \\
 +Time: 1 hour plus.
 +
 +Created 621 contigs, 30 larger than 500 (largest contig 640k) \\
 +The 500 cutoff is probably too large.\\
 +100 might be more reasonable.\\
 +Total consensus size is good.\\
 +But... upon mapping to the reference genome, ​ \\
 it turns out that while it is making big contigs, it's producing a chimeric assembly, in which the contigs join genomic regions that are not truly adjacent.
-it’s getting bigger contigs because it’s joining them incorrectly! \\
-this is very bad; worse even than a lot of small contigs \\
 +It’s getting bigger contigs because it’s joining them incorrectly! \\
 +This is very bad; worse even than a lot of small contigs \\
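
The contig counts quoted above depend on the length cutoff, so it can help to tabulate them at several cutoffs. A rough sketch, assuming the assembler output is a plain FASTA file of contigs; the function names and the toy input are hypothetical, and `>=` is used for the cutoff to match the earlier ">=500" wording.

```python
def contig_lengths(fasta_lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def count_at_least(lengths, cutoff):
    """Count contigs whose length is at least `cutoff`."""
    return sum(1 for n in lengths if n >= cutoff)


# Toy input standing in for a real contig file (lengths 800, 120, 3).
example = [">c1", "ACGT" * 200, ">c2", "ACGT" * 30, ">c3", "ACG"]
lengths = contig_lengths(example)
for cutoff in (100, 500):
    print(cutoff, count_at_least(lengths, cutoff), "largest:", max(lengths))
```

For a real file, pass an open file handle instead of the list, e.g. `contig_lengths(open(path))`. Note that a high count of long contigs says nothing about correctness: a chimeric assembly scores well on this measure, so the counts only make sense alongside a check against the reference.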
 + 
 +Not DBG.  Should find out more about how it actually works.\\ 
 +Good to know how it works so you know what to do with the parameters. 
 + 
 +Newbler may be able to take fasta+qual file. 
 + 
 +Mira might be worth fussing with on the parameters a bit more if it looks like 
 +it is doing a good job. 
 + 
 +Mira probably can't handle large genomes due to memory. 
 +Mira has a tool to estimate memory required. 
 +For a 3.2G genome it estimates a need for 1.1 TB of RAM.
  
  
lecture_notes/04-28-2010.txt · Last modified: 2010/05/02 09:22 by karplus