This shows you the differences between two versions of the page.
archive:computer_resources:assemblies [2010/06/12 02:14] mpcusack |
archive:computer_resources:assemblies [2011/06/02 19:26] eyliaw |
||
---|---|---|---|
Line 103: | Line 103: | ||
* is not high. 21k max scaffold size. Estimated | * is not high. 21k max scaffold size. Estimated | ||
* genome size is around 3G. The 4 steps are | * genome size is around 3G. The 4 steps are | ||
- | * 1. pregraph (3.5 to 4.5 hours for 30 to 60 cpus) | + | - pregraph (3.5 to 4.5 hours for 30 to 60 cpus) |
- | * 2. contig (1.3 hours) | + | - contig (1.3 hours) |
- | * 3. map (0.6 hours with 60cpus) - paired ends | + | - map (0.6 hours with 60cpus) - paired ends |
- | * 4. scaff (1 hour with 60cpus) | + | - scaff (1 hour with 60cpus) |
+ | * barcode-of-life/ An attempt to assemble the mitochondrial genome, with particular emphasis on the gene for mitochondrial cytochrome c oxidase subunit I (CO1), which is used for the "barcode of life". [[http://www.boldsystems.org/|BOLD (barcode of life database)]] | ||
+ | * Started with a search of SOAPdenovo-assembly1/k31/soapSlug.scafSeq for scaffolds that matched examples from other mollusks. | ||
+ | * Looked for 454 reads that extended or joined contigs in the scaffolds.
+ | * Repeated (sometimes using more sensitive searches) until no more credible scaffolds from the SOAPdenovo-assembly1/k31/ assembly or 454 reads were found.
+ | * The 454 coverage of the mitochondrion is so slight as to be nearly useless, so instead we can iterate: | ||
+ | - find all Illumina reads that map to the mitochondrial draft, using BWA | ||
+ | - assemble them using SOAPdenovo. | ||
+ | * It looks like the Illumina reads have about 228x coverage of the mitochondrion, but coverage is patchy, and it seems to be difficult to close the circle (at least with SOAPdenovo). | ||
+ | * I have an almost complete mitochondrial genome, and I'm hoping that a few more iterations or some tricky assembly will close it into a clean circular genome. | ||
+ | * SOAPdenovo-assembly2/ Assembly with new + old Illumina and 454 data. | ||
+ | * SOAPdenovo 1.05 - can handle gzipped fastq files. | ||
+ | * Runs with k = 27, 31, 47, and 63 so far. k = 47 was the best overall; k = 63 produced the longest contig (~14.9 kb).
+ | * Run parameters: | ||
+ | * pregraph: | ||
+ | - k-mer frequency cutoff of 2 (-d 2)
+ | * contig: | ||
+ | - solve tiny repeats on (-R) | ||
+ | * map: | ||
+ | - all default | ||
+ | * scaff: | ||
+ | - intra-scaffold gap closure on (-F) |
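
The run parameters above could be turned into the four step-by-step commands roughly as follows. This is only a sketch: the page gives the `-d 2`, `-R`, and `-F` settings, but the binary name (`SOAPdenovo-63mer`), the config file name (`slug.config`), the cpu count, and the `k<K>/soapSlug` output prefix (borrowed from the assembly1 path) are assumptions, and the `-s`, `-K`, `-o`, `-g`, and `-p` options should be checked against the SOAPdenovo 1.05 manual before use. The code only builds and prints the command lines; it does not run them.

```python
from typing import List

# Hypothetical names: the page does not state the binary or the config file used.
SOAP_BINARY = "SOAPdenovo-63mer"
CONFIG_FILE = "slug.config"


def soapdenovo_steps(k: int, prefix: str, cpus: int = 60) -> List[List[str]]:
    """Build argument lists for the four steps, in the order they are run.

    Only -d 2 (pregraph), -R (contig), and -F (scaff) come from the notes;
    the remaining options are assumed from general SOAPdenovo usage.
    """
    return [
        # pregraph: k-mer size, output prefix, k-mer frequency cutoff of 2
        [SOAP_BINARY, "pregraph", "-s", CONFIG_FILE, "-K", str(k),
         "-o", prefix, "-d", "2", "-p", str(cpus)],
        # contig: solve tiny repeats
        [SOAP_BINARY, "contig", "-g", prefix, "-R"],
        # map: all defaults, so only the required config and graph prefix
        [SOAP_BINARY, "map", "-s", CONFIG_FILE, "-g", prefix],
        # scaff: intra-scaffold gap closure
        [SOAP_BINARY, "scaff", "-g", prefix, "-F"],
    ]


if __name__ == "__main__":
    # One set of commands per k value tried so far.
    for k in (27, 31, 47, 63):
        for cmd in soapdenovo_steps(k, f"k{k}/soapSlug"):
            print(" ".join(cmd))
```

Keeping each step as an argument list makes it straightforward to pass to `subprocess.run` later, or to write the lines into a cluster job script, without worrying about shell quoting.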