archive:computer_resources:assemblies

Differences

This shows you the differences between two versions of the page.

archive:computer_resources:assemblies [2011/06/08 05:59]
karplus [slug/] fixed bullets in SOAPdenovo assembly 1
archive:computer_resources:assemblies [2015/09/02 16:43]
ceisenhart ↷ Page moved from computer_resources:assemblies to archive:computer_resources:assemblies
Line 19:
     * newbler-assembly2/ is a second de novo assembly using Newbler, starting from the cleaned reads of newbler-clean1/sff_cleaned/no_Hyp.sff  It gets 42 contigs and 2,449,409 bases.
     * newbler-assembly3/ starts from the same sff file as newbler-assembly2/ but raises the expected coverage to 60 (close to actual coverage).  It gets 41 contigs and 2,449,426 bases, still more than the old version of Newbler got after similar cleaning.  The contigs have been mapped to the finished genome (using megablast, blastn, blat, and pluck-scripts/find-dna-differences). All the contigs map cleanly to the finished genome. If contigs map to more than one place, find-dna-differences may (incorrectly) report it as not mapping.
-    * map-colorspace3/ uses the [[bioinformatic_tools:pluck-scripts|pluck-scripts]] script map-colorspace to map the SOLiD mate-pair reads onto the contigs of the newbler-assembly3/ run.  The intent is to find what contigs join to what other ones.  The numbering starts with 3, not 1, so that the map-colorspace directories correspond to the newbler-assembly directories that they are mapping onto.
+    * map-colorspace3/ uses the [[archive:bioinformatic_tools:pluck-scripts|pluck-scripts]] script map-colorspace to map the SOLiD mate-pair reads onto the contigs of the newbler-assembly3/ run.  The intent is to find what contigs join to what other ones.  The numbering starts with 3, not 1, so that the map-colorspace directories correspond to the newbler-assembly directories that they are mapping onto.
     * newbler-partial3/ assembled the partially-assembled reads of newbler-assembly3/ to see if any extended or connected contigs.  Seven of the 131 new contigs could be used to extend newbler-assembly3/ contigs, but none spanned 2 contigs.
     * newbler-assembly4/ starts from the same sff file as newbler-assembly2/ and newbler-assembly3/ but adds the contigs of newbler-partial3/ as extra reads.  This did not help, getting 45 contigs and 2,449,287 bases.
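The assemblies in this hunk are compared by contig count and total bases (for example, 42 contigs and 2,449,409 bases). As a minimal sketch of how such a summary could be recomputed from a contig FASTA file: the function name `contig_stats` and the command-line usage are illustrative assumptions, not part of the original pipeline, and the page does not say which tool produced the reported numbers.

```python
from typing import Iterable, Tuple


def contig_stats(fasta_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number_of_contigs, total_bases) for FASTA-formatted lines.

    Hypothetical helper for illustration: each header line (starting
    with '>') counts as one contig, and every other non-blank line
    contributes its length to the base total.
    """
    contigs = 0
    bases = 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            contigs += 1
        else:
            bases += len(line)
    return contigs, bases


if __name__ == "__main__":
    import sys

    # Usage (path is an assumption): python contig_stats.py contigs.fna
    with open(sys.argv[1]) as handle:
        n_contigs, n_bases = contig_stats(handle)
    print(f"{n_contigs} contigs, {n_bases:,} bases")
```

Running this over the contig file from each assembly directory would allow a like-for-like comparison of the runs.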
Line 107:
       - map (0.6 hours with 60cpus) - paired ends
       - scaff (1 hour with 60cpus)
-  * barcode-of-life/ attempt to assemble the mitochondrial genome, with particular emphasis on the gene for mitochondrial cytochrome c oxidase subunit I protein I (CO1), which is used for the "barcode of life". [[http://www.boldsystems.org/|BOLD (barcode of life database)]]
-      * Started with a search of SOAPdenovo-assembly1/k31/soapSlug.scafSeq for scaffolds that matched examples from other mollusks.
-      * Looked for 454 reads that extended or joined contigs in scaffold
-      * Repeated (sometimes using more sensitive searches) until no more credible scaffolds from the SOAPdenovo-assembly1/k31/ assembly nor 454 reads were found.
-      * The 454 coverage of the mitochondrion is so slight as to be nearly useless, so instead we can iterate:
-        - find all Illumina reads that map to the mitochondrial draft, using BWA
-        - assemble them using SOAPdenovo.
-      * It looks like the Illumina reads have about 228x coverage of the mitochondrion, but coverage is patchy, and it seems to be difficult to close the circle (at least with SOAPdenovo).
-      * We have an almost complete mitochondrial genome, and I'm hoping that a few more iterations or some tricky assembly will close it into a clean circular genome.
-      * It turns out that a lot of the hard hand work and iterated searching to assemble the mitochondrion was not necessary, as the SOAPdenovo-assembly2/k63_w_454_contigs/ assembly now has a 14960-long contig (not scaffold!) which is an almost-full-length mitochondrion, roughly as good as the best I've managed to assemble so far.  I'll combine it with my efforts and see if I can eke out a few more bases.
-      * Iterating mapping reads with BWA and assembling them with SOAPdenovo made some progress, but there was a gap that just wouldn't close.
-      * Switching to abyss (version 1.2.7) for the assembly of the reads made a much larger contig (15535-long after pasting on a suggestion from one abyss assembly onto another).
+  * barcode-of-life/ attempt to assemble the mitochondrial genome, documented on its own page: [[computer_resources:assemblies:mitochondrion]]
   * SOAPdenovo-assembly2/ Assembly with new + old Illumina and 454 data.
     * SOAPdenovo 1.05 - can handle gzipped fastq files.
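The barcode-of-life notes in this hunk describe an iteration: find the Illumina reads that map to the current mitochondrial draft with BWA, assemble them with SOAPdenovo, and repeat. The loop structure can be sketched as below. This is only an outline under stated assumptions: `map_reads` and `assemble` are hypothetical callables standing in for the BWA and SOAPdenovo steps (no real command lines or flags are shown), and the stopping rule, "stop when the draft no longer grows", is an assumption, since the notes only say the search was repeated until nothing more credible was found.

```python
from typing import Callable, List


def iterate_assembly(
    draft: str,
    map_reads: Callable[[str], List[str]],
    assemble: Callable[[List[str]], str],
    max_rounds: int = 10,
) -> str:
    """Repeat map -> assemble until the draft stops growing.

    map_reads(draft) is assumed to return the reads that map to the
    draft (the BWA step); assemble(reads) is assumed to return a new
    draft sequence (the SOAPdenovo step). Both are placeholders.
    """
    for _ in range(max_rounds):
        recruited = map_reads(draft)
        candidate = assemble(recruited)
        if len(candidate) <= len(draft):
            break  # no extension this round, so stop iterating
        draft = candidate
    return draft


# Toy stand-ins: each round extends the draft by one base, up to length 5.
def toy_map(draft: str) -> List[str]:
    return [draft]


def toy_assemble(reads: List[str]) -> str:
    seq = reads[0]
    return seq + "A" if len(seq) < 5 else seq


print(iterate_assembly("AC", toy_map, toy_assemble))
```

Passing the two steps in as callables keeps the loop independent of any particular mapper or assembler, which fits the notes: the same iteration made more progress once abyss replaced SOAPdenovo for the assembly step.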
archive/computer_resources/assemblies.txt · Last modified: 2015/09/02 16:53 by 92.247.181.31