This shows you the differences between two versions of the page.
lecture_notes:04-28-2010 [2010/05/02 05:34] galt |
lecture_notes:04-28-2010 [2010/05/02 05:42] galt |
||
---|---|---|---|
Line 8: | Line 8: | ||
use makefiles, not shell scripts! | use makefiles, not shell scripts! | ||
- | SOLiD data formats:\\ | + | **Sanger quality info**\\ |
+ | Kevin found the location of the Sanger qual info.\\ | ||
+ | .as or something like that.\\ | ||
+ | 3 different files from 3 different runs.\\ | ||
+ | |||
+ | **SOLiD data formats**:\\ | ||
.csfasta = colorspace with numbers\\ | .csfasta = colorspace with numbers\\ | ||
.de = changes #s to letters (0123 -> ACGT) but the letters are still colors, not bases! very confusing.\\ | .de = changes #s to letters (0123 -> ACGT) but the letters are still colors, not bases! very confusing.\\ | ||
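The number-to-letter rewrite described for the .de files can be sketched in a few lines of Python. This is only an illustration of the 0123 -> ACGT mapping from the notes: the function name `double_encode` is hypothetical, and the sketch assumes the input holds only the color digits (no leading primer base, no missing-call characters).

```python
# Mapping taken from the notes: 0123 -> ACGT.
COLOR_TO_LETTER = str.maketrans("0123", "ACGT")


def double_encode(colors: str) -> str:
    """Rewrite a string of color calls (digits 0-3) as letters.

    The result looks like DNA, but each letter still stands for a color,
    not a base, so it must not be read as nucleotide sequence.
    """
    # Assumption: anything other than 0-3 is rejected rather than guessed at.
    if set(colors) - set("0123"):
        raise ValueError(f"unexpected color call in {colors!r}")
    return colors.translate(COLOR_TO_LETTER)


print(double_encode("0123"))
print(double_encode("3201"))
```

Keeping the mapping in one table makes it easy to see that nothing is decoded here: the output is the same colorspace read with a different alphabet.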
Line 123: | Line 128: | ||
=== MIRA === | === MIRA === | ||
+ | |||
+ | Mostly used the default settings. | ||
+ | |||
+ | mira-assembly1/ | ||
+ | |||
+ | Running is easy. | ||
+ | Parameters: fasta denovo, tell it which instruments it has (e.g. 454 etc). | ||
Needs datafile named pog_in.[format].fa \\ | Needs datafile named pog_in.[format].fa \\ | ||
- | sff_extract script to create .qual files | + | uses sff_extract script to create .fasta and .fasta.qual files \\ |
+ | and also the traceinfo_in.454.xml file. | ||
- | created 30 contigs >=500 (largest contig 640k) \\ | + | Time: 1 hour plus. |
- | but... upon mapping to the reference genome, \\ | + | |
+ | Created 621 contigs, 30 larger than 500. (largest contig 640k) \\ | ||
+ | The 500 cutoff is probably too large.\\ | ||
+ | 100 might be more reasonable.\\ | ||
+ | Total consensus size is good.\\ | ||
+ | But... upon mapping to the reference genome, \\ | ||
it turns out that while it is making big contigs, it's producing a chimeric assembly, in which the contigs join genomic regions that are not truly adjacent. | it turns out that while it is making big contigs, it's producing a chimeric assembly, in which the contigs join genomic regions that are not truly adjacent. | ||
- | it’s getting bigger contigs because it’s joining them incorrectly! \\ | + | It’s getting bigger contigs because it’s joining them incorrectly! \\ |
- | this is very bad; worse even than a lot of small contigs \\ | + | This is very bad; worse even than a lot of small contigs \\ |
+ | |||
+ | Not a de Bruijn graph (DBG) assembler. Should find out more about how it actually works.\\ | ||
+ | Good to know how it works so you know what to do with the parameters. | ||
+ | |||
+ | Newbler may be able to take fasta+qual file. | ||
+ | |||
+ | Mira might be worth fussing with on the parameters a bit more if it looks like | ||
+ | it is doing a good job. | ||
+ | |||
+ | Mira probably can't handle large genomes due to memory. | ||
+ | Mira has a tool to estimate memory required. | ||
+ | For a 3.2G genome it will need 1.1 TB of RAM. | ||
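The contig counts quoted in the MIRA notes (621 contigs in total, 30 at the 500 cutoff) depend on the length cutoff, so it may help to recompute them for several cutoffs. The following is a rough sketch, not part of MIRA: the helper names `contig_lengths` and `summarize` are hypothetical, the toy sequences are made up, and the input is assumed to be plain FASTA.

```python
import io


def contig_lengths(fasta_lines):
    """Return the length of every sequence in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths, cutoff):
    """Count contigs of at least `cutoff` bases and total their lengths."""
    kept = [n for n in lengths if n >= cutoff]
    return {
        "total": len(lengths),
        "kept": len(kept),
        "kept_bases": sum(kept),
        "largest": max(lengths, default=0),
    }


# Toy input standing in for an assembler's contig FASTA (made-up sequences).
TOY_FASTA = """>contig1
ACGTACGTAC
ACGTA
>contig2
ACG
>contig3
ACGTACG
"""

lengths = contig_lengths(io.StringIO(TOY_FASTA))
for cutoff in (10, 5, 1):
    print(cutoff, summarize(lengths, cutoff))
```

On a real assembly the same two functions could be pointed at an open contig file and run with cutoffs of 500 and 100 to see how many contigs and how many bases each cutoff keeps.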