This shows you the differences between two versions of the page.
archive:computer_resources:assemblies [2011/06/03 21:37] eyliaw [slug/] |
archive:computer_resources:assemblies [2011/06/08 15:47] karplus [slug/] added info about abyss iteration and bwa+samtools+bcftools |
||
---|---|---|---|
Line 100: | Line 100: | ||
* so ran with filling -R to get 12k maxcontig. | * so ran with filling -R to get 12k maxcontig. | ||
* Then ran the scaffolding steps with 200bp insert size. | * Then ran the scaffolding steps with 200bp insert size. | ||
- | * For all steps, used low default cutoffs since our 10x coverage | + | * For all steps, used low default cutoffs since our 10x coverage is not high. 21k max scaffold size. |
- | * is not high. 21k max scaffold size. Estimated | + | * Estimated genome size is around 3G. |
- | * genome size is around 3G. The 4 steps are | + | * The 4 steps are |
- pregraph (3.5 to 4.5 hours for 30 to 60 cpus) | - pregraph (3.5 to 4.5 hours for 30 to 60 cpus) | ||
- contig (1.3 hours) | - contig (1.3 hours) | ||
Line 117: | Line 117: | ||
* We have an almost complete mitochondrial genome, and I'm hoping that a few more iterations or some tricky assembly will close it into a clean circular genome. | * We have an almost complete mitochondrial genome, and I'm hoping that a few more iterations or some tricky assembly will close it into a clean circular genome. | ||
* It turns out that a lot of the hard hand work and iterated searching to assemble the mitochondrion was not necessary, as the SOAPdenovo-assembly2/k63_w_454_contigs/ assembly now has a 14960-long contig (not scaffold!) which is an almost-full-length mitochondrion, roughly as good as the best I've managed to assemble so far. I'll combine it with my efforts and see if I can eke out a few more bases. | * It turns out that a lot of the hard hand work and iterated searching to assemble the mitochondrion was not necessary, as the SOAPdenovo-assembly2/k63_w_454_contigs/ assembly now has a 14960-long contig (not scaffold!) which is an almost-full-length mitochondrion, roughly as good as the best I've managed to assemble so far. I'll combine it with my efforts and see if I can eke out a few more bases. | ||
+ | * Iterating mapping reads with BWA and assembling them with SOAPdenovo made some progress, but there was a gap that just wouldn't close. | ||
+ | * Switching to ABySS (version 1.2.7) for the assembly of the reads made a much larger contig (15535 long after pasting an extension suggested by one ABySS assembly onto another). | ||
+ | * Iterating search and ABySS assembly does not lengthen the large contig. Cleaning up and calling the consensus with bwa+samtools+bcftools doesn't change things much either. Coverage varies widely (from 20x to 2300x, with a median of 225x; the peak is roughly 10 times the median), so I suspect that there is a repeat region at the beginning of the current contig that may have about 10 repeats in it. | ||
* SOAPdenovo-assembly2/ Assembly with new + old Illumina and 454 data. | * SOAPdenovo-assembly2/ Assembly with new + old Illumina and 454 data. | ||
* SOAPdenovo 1.05 - can handle gzipped fastq files. | * SOAPdenovo 1.05 - can handle gzipped fastq files. | ||
Line 131: | Line 134: | ||
* Statistics for each kmer size assembly (using illumina and 454 data, using both for contig and scaffolding): | * Statistics for each kmer size assembly (using illumina and 454 data, using both for contig and scaffolding): | ||
* k31: | * k31: | ||
- | 1298372 scaffolds from 4814226 contigs sum up 632702276bp, with average length 487, 0 gaps filled | + | * 1,298,372 scaffolds from 4,814,226 contigs sum up 632,702,276bp, with average length 487, 0 gaps filled |
- | 3611844 scaffolds&singleton sum up 1133413022bp, with average length 313 | + | * 3,611,844 scaffolds&singleton sum up 1,133,413,022bp, with average length 313 |
- | the longest is 10340bp,scaffold N50 is 442 bp, scaffold N90 is 148 bp | + | * the longest is 10,340bp,scaffold N50 is 442 bp, scaffold N90 is 148 bp |
* k47: | * k47: | ||
- | 871819 scaffolds from 5306463 contigs sum up 530762874bp, with average length 608, 0 gaps filled | + | * 871,819 scaffolds from 5,306,463 contigs sum up 530,762,874bp, with average length 608, 0 gaps filled |
- | 4203195 scaffolds&singleton sum up 1296678043bp, with average length 308 | + | * 4,203,195 scaffolds&singleton sum up 1,296,678,043bp, with average length 308 |
- | the longest is 14750bp,scaffold N50 is 458 bp, scaffold N90 is 140 bp | + | * the longest is 14,750bp,scaffold N50 is 458 bp, scaffold N90 is 140 bp |
* k63: | * k63: | ||
- | 270887 scaffolds from 4022505 contigs sum up 139720415bp, with average length 515, 0 gaps filled | + | * 270,887 scaffolds from 4,022,505 contigs sum up 139,720,415bp, with average length 515, 0 gaps filled |
- | 3710532 scaffolds&singleton sum up 690332560bp, with average length 186 | + | * 3,710,532 scaffolds&singleton sum up 690,332,560bp, with average length 186 |
- | the longest is 14897bp,scaffold N50 is 232 bp, scaffold N90 is 112 bp | + | * the longest is 14,897bp,scaffold N50 is 232 bp, scaffold N90 is 112 bp |
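
The scaffold N50 and N90 figures quoted for each kmer size can be read as follows. This is a minimal sketch, assuming SOAPdenovo uses the usual definition (the length L such that sequences of length at least L hold at least that fraction of the total bases). The function name `nxx` and the toy lengths are hypothetical and are not taken from the assemblies above.

```python
def nxx(lengths, fraction):
    """Return the length L such that sequences of length >= L
    together hold at least `fraction` of the total bases.

    fraction=0.5 gives N50, fraction=0.9 gives N90 (standard definition,
    assumed here to match what the assembler reports).
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Toy scaffold lengths (hypothetical), 30 bases in total.
toy_lengths = [10, 8, 6, 4, 2]
n50 = nxx(toy_lengths, 0.5)  # 10 + 8 = 18 >= 15, so N50 is 8
n90 = nxx(toy_lengths, 0.9)  # 10 + 8 + 6 + 4 = 28 >= 27, so N90 is 4
print(n50, n90)
```

Because many short sequences are needed to reach 90% of the bases, N90 is never larger than N50, which matches the pattern in the statistics above (for example 442 bp versus 148 bp for k31).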