archive:computer_resources:assemblies [2010/06/12 02:14] mpcusack
archive:computer_resources:assemblies [2011/06/03 15:40] karplus: [slug/] added more on mitochondrion assembly
    * is not high.  Maximum scaffold length is 21 kb.  Estimated
    * genome size is around 3 Gb.  The 4 steps are:
      - pregraph (3.5 to 4.5 hours with 30 to 60 CPUs)
      - contig (1.3 hours)
      - map, paired ends (0.6 hours with 60 CPUs)
      - scaff (1 hour with 60 CPUs)
  * barcode-of-life/ Attempt to assemble the mitochondrial genome, with particular emphasis on the gene for mitochondrial cytochrome c oxidase subunit I (CO1), which is used for the "barcode of life" ([[http://www.boldsystems.org/|BOLD (barcode of life database)]]).
    * Started with a search of SOAPdenovo-assembly1/k31/soapSlug.scafSeq for scaffolds that matched mitochondrial sequences from other mollusks.
    * Looked for 454 reads that extended or joined the contigs in each scaffold.
    * Repeated (sometimes using more sensitive searches) until no further credible scaffolds from the SOAPdenovo-assembly1/k31/ assembly or 454 reads were found.
    * The 454 coverage of the mitochondrion is so slight as to be nearly useless, so instead we can iterate:
      - find all Illumina reads that map to the mitochondrial draft, using BWA
      - assemble them using SOAPdenovo
    * The Illumina reads appear to have about 228x coverage of the mitochondrion, but coverage is patchy, and closing the circle seems to be difficult (at least with SOAPdenovo).
    * We have an almost complete mitochondrial genome, and I'm hoping that a few more iterations or some tricky assembly will close it into a clean circular genome.
    * It turns out that much of the manual work and iterated searching to assemble the mitochondrion was unnecessary: the SOAPdenovo-assembly2/k63_w_454_contigs/ assembly now has a 14,960-bp contig (not scaffold!) that is an almost-full-length mitochondrion, roughly as good as the best I've managed to assemble so far.  I'll combine it with my efforts and see if I can eke out a few more bases.
  * SOAPdenovo-assembly2/ Assembly with new and old Illumina data plus 454 data.
    * SOAPdenovo 1.05, which can handle gzipped FASTQ files.
    * Runs with k = 27, 31, 47, and 63 so far.  k = 47 gave the best assembly overall; k = 63 produced the longest contig (~14.9 kb).
    * Run parameters:
      - pregraph:
        * low k-mer count cutoff of 2 (-d 2)
      - contig:
        * tiny-repeat resolution on (-R)
      - map:
        * all defaults
      - scaff:
        * intra-scaffold gap closure on (-F)
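
The four SOAPdenovo stages listed above, combined with the SOAPdenovo-assembly2/ run parameters, can be sketched as a small Python helper that builds (but does not run) the stage command lines, plus the total wall-clock time implied by the SOAPdenovo-assembly1/ stage timings. This is a sketch under assumptions: the binary name `SOAPdenovo-63mer`, the config file `soap.config`, the output prefix, and the CPU count are hypothetical placeholders, and the flag spellings (-s, -K, -p, -d, -o, -g, -R, -F) should be checked against the usage message of the installed SOAPdenovo 1.05 binary before use.

```python
from typing import Dict, List, Tuple

# Hypothetical binary name (assumed build supporting k up to 63); check the local install.
SOAP_BIN = "SOAPdenovo-63mer"

# Wall-clock hours per stage as reported for SOAPdenovo-assembly1/ (min, max).
STAGE_HOURS: Dict[str, Tuple[float, float]] = {
    "pregraph": (3.5, 4.5),
    "contig": (1.3, 1.3),
    "map": (0.6, 0.6),
    "scaff": (1.0, 1.0),
}


def soap_commands(config: str, prefix: str, k: int, cpus: int,
                  binary: str = SOAP_BIN) -> List[List[str]]:
    """Build the four stage commands in order, mirroring the assembly2 parameters."""
    p = str(cpus)
    return [
        # pregraph: build the k-mer graph; -d 2 sets the low k-mer count cutoff
        [binary, "pregraph", "-s", config, "-K", str(k), "-p", p, "-d", "2", "-o", prefix],
        # contig: -R turns on tiny-repeat resolution
        [binary, "contig", "-g", prefix, "-R"],
        # map: map the (paired-end) reads back onto the contigs, default settings
        [binary, "map", "-s", config, "-g", prefix, "-p", p],
        # scaff: -F turns on intra-scaffold gap closure
        [binary, "scaff", "-g", prefix, "-F", "-p", p],
    ]


def total_hours() -> Tuple[float, float]:
    """Sum the per-stage (min, max) times, rounded to 0.1 hour."""
    lo = sum(t[0] for t in STAGE_HOURS.values())
    hi = sum(t[1] for t in STAGE_HOURS.values())
    return round(lo, 1), round(hi, 1)


if __name__ == "__main__":
    for cmd in soap_commands("soap.config", "soapSlug", 47, 60):
        print(" ".join(cmd))
    print("assembly1 total: %.1f to %.1f hours" % total_hours())
```

Each command list could be passed to `subprocess.run(cmd, check=True)` to run the stages in sequence. The total (6.4 to 7.4 hours) mixes timings taken with 30 to 60 CPUs, so treat it as a rough figure.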
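
The iterate-with-BWA-then-SOAPdenovo idea for the mitochondrion can be illustrated with a toy, pure-Python stand-in. This is not what was run: exact shared k-mers replace BWA mapping, greedy suffix/prefix extension replaces SOAPdenovo, reads are assumed error-free and in a single orientation (no reverse complements), and only the right end of the draft is extended. The function names, the test sequence, and the `k`, `min_overlap`, and `max_rounds` values are hypothetical.

```python
from typing import List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit(reads: List[str], draft: str, k: int) -> List[str]:
    """Keep reads sharing at least one exact k-mer with the draft.

    Toy stand-in for mapping reads to the draft with BWA.
    """
    draft_kmers = kmer_set(draft, k)
    return [r for r in reads if kmer_set(r, k) & draft_kmers]


def best_extension(draft: str, reads: List[str], min_overlap: int) -> str:
    """Bases to append from the read whose prefix overlaps the draft's end the most.

    Toy stand-in for re-assembling the recruited reads with SOAPdenovo.
    Returns '' when no read overlaps by at least min_overlap and adds new bases.
    """
    best_olen, best_ext = 0, ""
    for read in reads:
        # Longest overlap first; keep at least one base of the read beyond the draft.
        for olen in range(min(len(read) - 1, len(draft)), min_overlap - 1, -1):
            if draft.endswith(read[:olen]):
                if olen > best_olen:
                    best_olen, best_ext = olen, read[olen:]
                break
    return best_ext


def iterate_draft(draft: str, reads: List[str], k: int, min_overlap: int,
                  max_rounds: int = 100) -> str:
    """Alternate recruitment and extension until the draft stops growing.

    max_rounds caps the loop; a circular genome could otherwise be extended forever.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for _ in range(max_rounds):
        pool = recruit(reads, draft, k)
        ext = best_extension(draft, pool, min_overlap)
        if not ext:
            break  # converged: no recruited read extends the draft
        draft += ext
    return draft


if __name__ == "__main__":
    genome = "ATGACCGTTAGCATCGGACTTGCAAGTCCA"  # made-up 30-bp sequence
    reads = [genome[i:i + 12] for i in range(0, len(genome) - 11, 3)]  # error-free 12-bp reads
    print(iterate_draft(genome[:12], reads, k=8, min_overlap=8) == genome)  # True
```

Recruiting against the current draft each round is what lets the draft grow: reads that overlap only the newly added bases are picked up in the next round, just as remapping to an extended draft would pick up new Illumina reads.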
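
The 228x coverage figure can be related to the amount of Illumina sequence that mapped to the mitochondrion through the usual definition of coverage, C = NL/G, for N reads of length L over a genome of length G. As a rough worked example, assuming G is about 14,960 bp (the length of the near-full-length k63 contig, so only an approximation of the true mitochondrial genome size):

```latex
C = \frac{N L}{G}
\quad\Longrightarrow\quad
N L = C\,G \approx 228 \times 14\,960\ \text{bp} = 3\,410\,880\ \text{bp} \approx 3.4\ \text{Mb}
```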
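
Closing the circle, in the simplest case, means noticing that a linear contig from a circular genome ends with a copy of its own beginning, so the duplicated end can be trimmed. The sketch below is a hypothetical helper, not the method used for the slug mitochondrion: it uses the KMP prefix function to find the longest exact suffix/prefix overlap of a contig and trims it. It assumes error-free contig ends; `longest_border`, `close_circle`, and the `min_overlap` threshold are illustrative names and values.

```python
from typing import Optional


def longest_border(s: str) -> int:
    """Length of the longest proper prefix of s that is also a suffix (KMP prefix function)."""
    if not s:
        return 0
    pi = [0] * len(s)
    for i in range(1, len(s)):
        j = pi[i - 1]
        # Fall back through shorter borders until s[i] can extend one.
        while j > 0 and s[i] != s[j]:
            j = pi[j - 1]
        if s[i] == s[j]:
            j += 1
        pi[i] = j
    return pi[-1]


def close_circle(contig: str, min_overlap: int = 50) -> Optional[str]:
    """Trim the redundant end overlap of a contig from a circular genome.

    Returns the trimmed sequence, or None when the exact overlap between the
    contig's end and its start is empty or shorter than min_overlap.
    """
    olen = longest_border(contig)
    if olen == 0 or olen < min_overlap:
        return None
    return contig[:-olen]
```

The helper cannot close a genuine gap: if the two ends do not overlap it returns None. With real reads, a mismatch-tolerant alignment of the contig ends would be needed instead of an exact match.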
archive/computer_resources/assemblies.txt · Last modified: 2015/09/02 16:53 by 92.247.181.31