This shows you the differences between two versions of the page.
archive:computer_resources:assemblies [2010/04/28 20:11] galt |
archive:computer_resources:assemblies [2010/05/19 20:45] galt Assembled Slug Genome from Illumina Paired with SOAPdenovo |
||
---|---|---|---|
Line 23: | Line 23: | ||
* newbler-assembly4/ starts from the same sff file as newbler-assembly2/ and newbler-assembly3/ but adds the contigs of newbler-partial3/ as extra reads. This did not help, getting 45 contigs and 2,449,287 bases. | * newbler-assembly4/ starts from the same sff file as newbler-assembly2/ and newbler-assembly3/ but adds the contigs of newbler-partial3/ as extra reads. This did not help, getting 45 contigs and 2,449,287 bases. | ||
* newbler-assembly5/ starts from the same sff file as newbler-assembly2,3,4 but adds 45 Sanger reads totalling 44,187 bases from PCR reactions (mainly designed to test contig-join hypotheses). It gets 31 contigs and 2,451,007 bases. | * newbler-assembly5/ starts from the same sff file as newbler-assembly2,3,4 but adds 45 Sanger reads totalling 44,187 bases from PCR reactions (mainly designed to test contig-join hypotheses). It gets 31 contigs and 2,451,007 bases. | ||
- | * map-colorspace5/ maps the SOLiD mate-pair data onto the contigs of newbler-assembly5/ Other than some problems placing contig4 and the ece insertions, we can reconstruct some pretty large chunks of the genome from the mate-pair ends. | + | * map-colorspace5/ maps the SOLiD mate-pair data onto the contigs of newbler-assembly5/ Other than some problems placing contig4 and the ece insertions, we can reconstruct some pretty large chunks of the genome from the mate-pair ends. This directory contains the trim9.joins file, which is needed for doing the **homework** to attempt to reconstruct the genome. |
* euler | * euler | ||
* euler-assembly1/ | * euler-assembly1/ | ||
Line 56: | Line 56: | ||
* The output has 2664 contigs, comprising 443,648 bases (still less than the de novo assembly). | * The output has 2664 contigs, comprising 443,648 bases (still less than the de novo assembly). | ||
* The longest contig is only 1876 bases. | * The longest contig is only 1876 bases. | ||
+ | * SOAPdenovo-assembly1/ First run of SOAPdenovo on Illumina paired-end reads. | ||
+ | * SOAPdenovo requires fastq input files. | ||
+ | * It was used to assemble the Panda genome by BGI. | ||
+ | * Ran on kolossus, which has 1 TB and 64 CPUs. | ||
+ | * Ran with k=31 and k=23. k=31 was better (9k max contig), | ||
+ | * so reran with the fill option -R to get a 12k max contig. | ||
+ | * Then ran the scaffolding steps with a 200 bp insert size. | ||
+ | * For all steps, used low default cutoffs since our 10x coverage is not high. | ||
+ | * Max scaffold size is 21k. | ||
+ | * Estimated genome size is around 3 Gb. | ||
+ | * The 4 steps are: | ||
+ | * 1. pregraph (3.5 to 4.5 hours for 30 to 60 CPUs) | ||
+ | * 2. contig (1.3 hours) | ||
+ | * 3. map (0.6 hours with 60 CPUs) - paired ends | ||
+ | * 4. scaff (1 hour with 60 CPUs) | ||
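
The four SOAPdenovo steps listed above can be sketched as a small Python helper that writes a library config and builds the command lines. This is only an illustration, not the exact commands used for SOAPdenovo-assembly1/: the file names, the output prefix, and `max_rd_len=100` are hypothetical, and the placement of `-R` on the `contig` step and `-p` on `scaff` follows the SOAPdenovo 1.x usage text as best recalled, so check it against the help output of the installed binary. The k-mer size (31), insert size (200 bp), and CPU count (60) come from the notes above.

```python
from typing import List


def soap_config(fq1: str, fq2: str, avg_ins: int = 200, max_rd_len: int = 100) -> str:
    """Return the text of a one-library SOAPdenovo config file.

    fq1/fq2 are the paired fastq files (SOAPdenovo needs fastq input).
    max_rd_len is an assumed value; the notes do not give the read length.
    """
    lines = [
        f"max_rd_len={max_rd_len}",
        "[LIB]",
        f"avg_ins={avg_ins}",   # 200 bp insert size, as in the notes
        "reverse_seq=0",        # forward-reverse paired ends
        "asm_flags=3",          # use this library for contigs and scaffolds
        f"q1={fq1}",
        f"q2={fq2}",
    ]
    return "\n".join(lines) + "\n"


def soap_commands(config_path: str, prefix: str, kmer: int = 31,
                  cpus: int = 60, binary: str = "SOAPdenovo") -> List[List[str]]:
    """Build argument lists for the four steps, in order; nothing is executed."""
    return [
        # 1. pregraph: build the k-mer graph from the reads
        [binary, "pregraph", "-s", config_path, "-K", str(kmer),
         "-o", prefix, "-p", str(cpus)],
        # 2. contig: -R placement here is an assumption, see the lead-in
        [binary, "contig", "-g", prefix, "-R"],
        # 3. map: map the paired ends back onto the contigs
        [binary, "map", "-s", config_path, "-g", prefix, "-p", str(cpus)],
        # 4. scaff: build scaffolds from the paired-end links
        [binary, "scaff", "-g", prefix, "-p", str(cpus)],
    ]


if __name__ == "__main__":
    print(soap_config("reads_1.fq", "reads_2.fq"), end="")
    for cmd in soap_commands("slug.config", "slug_k31"):
        print(" ".join(cmd))
```

To actually run the steps, each argument list could be passed to `subprocess.run(cmd, check=True)` in order, after writing the config text to `slug.config`.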