This shows you the differences between two versions of the page.
archive:computer_resources:assemblies [2011/06/02 19:27] eyliaw [slug/] |
archive:computer_resources:assemblies [2011/06/21 23:27] karplus [slug/] moved mitochondrion assembly information to a new page |
Line 100: | Line 100: | ||
* so ran with filling -R to get 12k maxcontig. | * so ran with filling -R to get 12k maxcontig. | ||
* Then ran the scaffolding steps with 200bp insert size. | * Then ran the scaffolding steps with 200bp insert size. | ||
- | * For all steps, used low default cutoffs since our 10x coverage | + | * For all steps, used low default cutoffs since our 10x coverage is not high. 21k max scaffold size. |
- | * is not high. 21k max scaffold size. Estimated | + | * Estimated genome size is around 3G. |
- | * genome size is around 3G. The 4 steps are | + | * The 4 steps are |
- pregraph (3.5 to 4.5 hours for 30 to 60 cpus) | - pregraph (3.5 to 4.5 hours for 30 to 60 cpus) | ||
- contig (1.3 hours) | - contig (1.3 hours) | ||
- map (0.6 hours with 60 cpus) - paired ends | - map (0.6 hours with 60 cpus) - paired ends | ||
- scaff (1 hour with 60 cpus) | - scaff (1 hour with 60 cpus) | ||
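The four SOAPdenovo steps listed above (pregraph, contig, map, scaff) can be sketched in Python as a driver that runs them in order. The step names, `-R` and `-F` (intra-scaffold gap closure) come from this page. As I understand it, `-R` tells SOAPdenovo to use reads to resolve repeats. The remaining flags (`-s` config file, `-K` k-mer size, `-p` CPU count, `-o`/`-g` graph prefix) are recalled from the SOAPdenovo 1.0x usage message and are an assumption; check them against the help output of the installed binary. The binary name `SOAPdenovo-31mer` and the config file `slug.config` are hypothetical placeholders.

```python
import shlex
import subprocess
from typing import List


def soapdenovo_steps(binary: str, config: str, prefix: str, kmer: int,
                     cpus: int, resolve_repeats: bool = True,
                     fill_gaps: bool = True) -> List[List[str]]:
    """Build the pregraph, contig, map and scaff commands (flags assumed, see text)."""
    pregraph = [binary, "pregraph", "-s", config, "-K", str(kmer),
                "-p", str(cpus), "-o", prefix]
    contig = [binary, "contig", "-g", prefix]
    if resolve_repeats:
        # -R: use reads to resolve repeats in the pregraph and contig steps
        pregraph.append("-R")
        contig.append("-R")
    # map: align the paired-end reads back onto the contigs
    mapping = [binary, "map", "-s", config, "-g", prefix, "-p", str(cpus)]
    scaff = [binary, "scaff", "-g", prefix, "-p", str(cpus)]
    if fill_gaps:
        # -F: intra-scaffold gap closure
        scaff.append("-F")
    return [pregraph, contig, mapping, scaff]


def run_steps(steps: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True aborts on the first failing step, because every
            # step reads the files written by the previous one
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # hypothetical names; the k31/soapSlug prefix mirrors the directory layout on this page
    run_steps(soapdenovo_steps("SOAPdenovo-31mer", "slug.config",
                               "k31/soapSlug", kmer=31, cpus=60))
```

With the default `dry_run=True` the commands are only printed, which makes them easy to paste into a batch script before committing hours of cluster time.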
- | * barcode-of-life/ attempt to assemble the mitochondrial genome, with particular emphasis on the gene for mitochondrial cytochrome c oxidase subunit I protein I (CO1), which is used for the "barcode of life". [[http://www.boldsystems.org/|BOLD (barcode of life database)]] | + | * barcode-of-life/ attempt to assemble the mitochondrial genome, documented on its own page: [[computer_resources:assemblies:mitochondrion]] |
- | * Started with a search of SOAPdenovo-assembly1/k31/soapSlug.scafSeq for scaffolds that matched examples from other mollusks. | + | |
- | * Looked for 454 reads that extended or joined contigs in scaffold | + | |
- | * Repeated (sometimes using more sensitive searches) until no more credible scaffolds from the SOAPdenovo-assembly1/k31/ assembly nor 454 reads were found. | + | |
- | * The 454 coverage of the mitochondrion is so slight as to be nearly useless, so instead we can iterate: | + | |
- | - find all Illumina reads that map to the mitochondrial draft, using BWA | + | |
- | - assemble them using SOAPdenovo. | + | |
- | * It looks like the Illumina reads have about 228x coverage of the mitochondrion, but coverage is patchy, and it seems to be difficult to close the circle (at least with SOAPdenovo). | + | |
- | * I have an almost complete mitochondrial genome, and I'm hoping that a few more iterations or some tricky assembly will close it into a clean circular genome. | + | |
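The iteration described in the older revision (find the Illumina reads that map to the mitochondrial draft with BWA, then reassemble them with SOAPdenovo) can be sketched as the loop below. `map_reads` and `assemble` are hypothetical stand-ins for the BWA and SOAPdenovo runs. The stopping rule is an assumption not stated on this page: stop when a round no longer lengthens the draft. The sketch does not attempt to close the circle.

```python
from typing import Callable, List


def iterate_draft(draft: str,
                  map_reads: Callable[[str], List[str]],
                  assemble: Callable[[List[str]], str],
                  max_rounds: int = 10) -> str:
    """Alternate read recruitment and reassembly until the draft stops growing."""
    for _ in range(max_rounds):
        reads = map_reads(draft)      # hypothetical: reads that map to the current draft
        new_draft = assemble(reads)   # hypothetical: reassemble only those reads
        if len(new_draft) <= len(draft):
            break                     # assumed convergence test: no extension this round
        draft = new_draft
    return draft


if __name__ == "__main__":
    # toy stand-ins: each round "recruits" a read reaching 3 bases past the draft
    genome = "ACGTTGCAATGCCGTA"
    grow = lambda d: [genome[:min(len(genome), len(d) + 3)]]
    longest = lambda reads: max(reads, key=len)
    print(iterate_draft(genome[:5], grow, longest))
```

With the toy stand-ins the draft grows from 5 to 8, 11, 14 and then 16 bases, and the loop stops once a round adds nothing.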
* SOAPdenovo-assembly2/ Assembly with new + old Illumina and 454 data. | * SOAPdenovo-assembly2/ Assembly with new + old Illumina and 454 data. | ||
* SOAPdenovo 1.05 - can handle gzipped fastq files. | * SOAPdenovo 1.05 - can handle gzipped fastq files. | ||
Line 128: | Line 120: | ||
- scaff: | - scaff: | ||
* intra-scaffold gap closure on (-F) | * intra-scaffold gap closure on (-F) | ||
+ | * Statistics for each kmer size assembly (using illumina and 454 data, using both for contig and scaffolding): | ||
+ | * k31: | ||
+ | * 1,298,372 scaffolds from 4,814,226 contigs sum up 632,702,276bp, with average length 487, 0 gaps filled | ||
+ | * 3,611,844 scaffolds&singleton sum up 1,133,413,022bp, with average length 313 | ||
+ | * the longest is 10,340bp,scaffold N50 is 442 bp, scaffold N90 is 148 bp | ||
+ | * k47: | ||
+ | * 871,819 scaffolds from 5,306,463 contigs sum up 530,762,874bp, with average length 608, 0 gaps filled | ||
+ | * 4,203,195 scaffolds&singleton sum up 1,296,678,043bp, with average length 308 | ||
+ | * the longest is 14,750bp,scaffold N50 is 458 bp, scaffold N90 is 140 bp | ||
+ | * k63: | ||
+ | * 270,887 scaffolds from 4,022,505 contigs sum up 139,720,415bp, with average length 515, 0 gaps filled | ||
+ | * 3,710,532 scaffolds&singleton sum up 690,332,560bp, with average length 186 | ||
+ | * the longest is 14,897bp,scaffold N50 is 232 bp, scaffold N90 is 112 bp | ||
+ |
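The statistics above can be reproduced from a list of scaffold lengths with a minimal Python sketch. Sort the lengths in descending order and accumulate them. N50 (or N90) is the length at which the running total first reaches 50% (or 90%) of all bases. The averages in the SOAPdenovo output are consistent with integer division (e.g. 632,702,276 / 1,298,372 ≈ 487.3, reported as 487), so the sketch uses `//`. The function names are illustrative, not part of any SOAPdenovo tool.

```python
from typing import Dict, Iterable, List


def nx(lengths_desc: List[int], total: int, percent: int) -> int:
    """Length of the sequence at which the running sum first reaches percent% of total."""
    running = 0
    for length in lengths_desc:
        running += length
        # integer comparison avoids floating-point rounding on large totals
        if running * 100 >= percent * total:
            return length
    return 0


def assembly_stats(lengths: Iterable[int]) -> Dict[str, int]:
    """Count, sum, integer average, longest, N50 and N90 of sequence lengths."""
    lengths_desc = sorted(lengths, reverse=True)
    total = sum(lengths_desc)
    count = len(lengths_desc)
    return {
        "count": count,
        "sum": total,
        "average": total // count if count else 0,
        "longest": lengths_desc[0] if lengths_desc else 0,
        "N50": nx(lengths_desc, total, 50),
        "N90": nx(lengths_desc, total, 90),
    }


if __name__ == "__main__":
    print(assembly_stats([20, 100, 60, 40, 80]))
```

For the toy lengths the running sums are 100, 180, 240, 280, 300, so N50 is 80 (180 ≥ 150) and N90 is 40 (280 ≥ 270).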