archive:computer_resources:assemblies

Differences

This shows you the differences between two versions of the page.

archive:computer_resources:assemblies [2010/06/06 22:16]
galt adding Ray, ABySS, and PCAP on Pog from README
archive:computer_resources:assemblies [2015/09/02 16:43]
ceisenhart ↷ Page moved from computer_resources:assemblies to archive:computer_resources:assemblies
Line 19: Line 19:
     * newbler-assembly2/ is a second de novo assembly using Newbler, starting from the cleaned reads of newbler-clean1/sff_cleaned/no_Hyp.sff  It gets 42 contigs and 2,449,409 bases.
     * newbler-assembly3/ starts from the same sff file as newbler-assembly2/ but raises the expected coverage to 60 (close to actual coverage).  It gets 41 contigs and 2,449,426 bases, still more than the old version of Newbler got after similar cleaning.  The contigs have been mapped to the finished genome (using megablast, blastn, blat, and pluck-scripts/find-dna-differences). All the contigs map cleanly to the finished genome. If contigs map to more than one place, find-dna-differences may (incorrectly) report it as not mapping.
-    * map-colorspace3/ uses the [[bioinformatic_tools:pluck-scripts|pluck-scripts]] script map-colorspace to map the SOLiD mate-pair reads onto the contigs of the newbler-assembly3/ run.  The intent is to find what contigs join to what other ones.  The numbering starts with 3, not 1, so that the map-colorspace directories correspond to the newbler-assembly directories that they are mapping onto.
+    * map-colorspace3/ uses the [[archive:bioinformatic_tools:pluck-scripts|pluck-scripts]] script map-colorspace to map the SOLiD mate-pair reads onto the contigs of the newbler-assembly3/ run.  The intent is to find what contigs join to what other ones.  The numbering starts with 3, not 1, so that the map-colorspace directories correspond to the newbler-assembly directories that they are mapping onto.
     * newbler-partial3/ assembled the partially-assembled reads of newbler-assembly3/ to see if any extended or connected contigs.  Seven of the 131 new contigs could be used to extend newbler-assembly3/ contigs, but none spanned 2 contigs.
     * newbler-assembly4/ starts from the same sff file as newbler-assembly2/ and newbler-assembly3/ but adds the contigs of newbler-partial3/ as extra reads.  This did not help, getting 45 contigs and 2,449,287 bases.
Line 29: Line 29:
     * euler-sr-assembly1/
   * mira
-    * mira-assembly1/
+    * [[computer_resources:assemblies:mira:pog:mira-assembly1|]]
+    * [[computer_resources:assemblies:mira:pog:mira-assembly2|]]
   * velvet
     * velvet-assembly1/ Assembling Pog 454 long reads with velvet.  After very poor results with default settings, eventually started to get good results by getting the expected coverage (60) and cutoff (13) correct.  It took a long time to try different parameter settings. Also, using the long reads as both short and long reads gave substantially better results.  Because these were long reads, we could set k up to 31.  Also tried with a specially compiled version of velvet that could use k > 31, but cannot report any improvement yet.  Given that the average read is 370b, it should have been able to support longer k-values.  Best results so far:\\
Line 69: Line 70:
                         N count: mean 0.0 sd 0.2
                         U count: mean 11443.6 sd 65849.3
-                        Using Kevin's makefile, the blat alignments showed large contigs that looked basically correct.
+                        Using Kevin's makefile, the blat alignments showed large contigs that looked basically correct, except for contig 8.
                         However many of them overlapped, unlike the Newbler output.  This may have been due to a
                         difference in the way Newbler and PCAP tried to handle the mixed population in the sample where
Line 99: Line 100:
     * so ran with filling -R to get 12k maxcontig.
     * Then ran the scaffolding steps with 200bp insert size.
-    * For all steps, used low default cutoffs since our 10x coverage
-    * is not high.  21k max scaffold size.  Estimated
-    * genome size is around 3G.  The 4 steps are
-    * 1. pregraph (3.5 to 4.5 hours for 30 to 60 cpus)
-    * 2. contig (1.3 hours)
-    * 3. map (0.6 hours with 60cpus) - paired ends
-    * 4. scaff (1 hour with 60cpus)
+    * For all steps, used low default cutoffs since our 10x coverage is not high.  21k max scaffold size.
+    * Estimated genome size is around 3G.
+    * The 4 steps are
+      - pregraph (3.5 to 4.5 hours for 30 to 60 cpus)
+      - contig (1.3 hours)
+      - map (0.6 hours with 60cpus) - paired ends
+      - scaff (1 hour with 60cpus)
+  * barcode-of-life/ attempt to assemble the mitochondrial genome, documented on its own page: [[computer_resources:assemblies:mitochondrion]]
+  * SOAPdenovo-assembly2/ Assembly with new + old Illumina and 454 data.
+    * SOAPdenovo 1.05 - can handle gzipped fastq files.
+    * Runs with k27, 31, 47, and 63 so far.  47 was the best overall.  63 got the longest contig (~14.9kb).
+    * Run parameters:
+      - pregraph:
+        * lowest count size of 2 (-d 2)
+      - contig:
+        * solve tiny repeats on (-R)
+      - map:
+        * all default
+      - scaff:
+        * intra-scaffold gap closure on (-F)
+    * Statistics for each kmer size assembly (using illumina and 454 data, using both for contig and scaffolding):
+      * k31:
+         * 1,298,372 scaffolds from 4,814,226 contigs sum up 632,702,276bp, with average length 487, 0 gaps filled
+         * 3,611,844 scaffolds&singleton sum up 1,133,413,022bp, with average length 313
+         * the longest is 10,340bp, scaffold N50 is 442 bp, scaffold N90 is 148 bp
+      * k47:
+         * 871,819 scaffolds from 5,306,463 contigs sum up 530,762,874bp, with average length 608, 0 gaps filled
+         * 4,203,195 scaffolds&singleton sum up 1,296,678,043bp, with average length 308
+         * the longest is 14,750bp, scaffold N50 is 458 bp, scaffold N90 is 140 bp
+      * k63:
+         * 270,887 scaffolds from 4,022,505 contigs sum up 139,720,415bp, with average length 515, 0 gaps filled
+         * 3,710,532 scaffolds&singleton sum up 690,332,560bp, with average length 186
+         * the longest is 14,897bp, scaffold N50 is 232 bp, scaffold N90 is 112 bp
  
archive/computer_resources/assemblies.txt · Last modified: 2015/09/02 16:53 by 92.247.181.31