Differences

This shows you the differences between two versions of the page.

--- archive:computer_resources:assemblies [2010/04/22 07:49]
galt Added Galt's work on velvet and SOAPdenovo
+++ archive:computer_resources:assemblies [2010/04/22 12:29]
karplus Added T-shirt k-mer suggestion.
@@ Line 33: / Line 33: @@
       * Final graph has 3602 nodes and n50 of 4851, max 94854, total 1767903, using 28785664/61262410 reads
   * SOAPdenovo
-    * SOAPdenovo-assembly1/ Assembling Pog 454 long reads with SOAPdenovo.  After being simply unable to get any version of the program to read a FASTA file despite documentation examples, I finally found a utility sff2fastq that made it possible to run SOAPdenovo on Pog 454 fastq.  I have not had time to optimize parameters yet.
+    * SOAPdenovo-assembly1/ Assembling Pog 454 long reads with SOAPdenovo.  After being simply unable to get any version of the program to read a FASTA file despite documentation examples, I finally found a utility sff2fastq that made it possible to run SOAPdenovo on Pog 454 fastq.  I have not had time to optimize parameters yet.  The largest contig made with default params was just 4k.
-      * The largest contig made with default params was just 4k.
 ===== slug/ =====
-  * newbler-assembly1/ first attempt at de novo assembly using Newbler, using all the reads from 454_run1 and 454_run2.  This assembly of 499,873 reads including 138,351,643 bases produced only 2,910,773 bases assembled into 8,963 contigs.  From this low assembly number, I estimate the coverage to be about 0.043x and the genome size to be about 3.2E9 basepairs. (See the README file for the calculation.)
+  * newbler-assembly1/ first attempt at de novo assembly using Newbler, using all the reads from 454_run1 and 454_run2.  This assembly of 499,873 reads including 138,351,643 bases produced only 2,910,773 bases assembled into 8,963 contigs.  From this low assembly number, I estimate the coverage to be about 0.043x and the genome size to be about 3.2E9 basepairs. (See the README file for the calculation.)  Much of the assembly is low-complexity regions (repetitions of short repeats (GA)*, (TA)*, (TTC)*, (AC)*, (TAG)*, (CGAA)*, (TATC)*, (CAA)*, ... ).  The most common 14-mer that is not a repeat of a short k-mer is TAGTTTACAGCTTG (so that is what we should put on the T-shirt).
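
The two figures quoted in the newbler-assembly1/ entry can be sanity-checked with a few lines of Python. This is a minimal sketch, not the calculation in the README (which is not reproduced on this page). It only confirms that 138,351,643 read bases over a 3.2E9 bp genome is consistent with the quoted 0.043x coverage, and it shows one plausible way to find the most common 14-mer that is not a repeat of a short k-mer. The function names, the `max_unit=6` cutoff for what counts as a "short" repeat unit, and the choice to count each strand separately (no reverse-complement merging) are all assumptions made for illustration.

```python
from collections import Counter


def coverage(total_read_bases, genome_size):
    """Average depth of coverage: read bases divided by genome size."""
    return total_read_bases / genome_size


def is_short_repeat(kmer, max_unit=6):
    """True if kmer is a tandem repeat of a unit of length <= max_unit.

    A k-mer has period p when every base equals the base p positions
    earlier, so (GA)*, (TTC)*, (CGAA)* and so on are all caught, even
    when the final copy of the unit is truncated.
    max_unit=6 is an assumed cutoff, not a value from the wiki page.
    """
    for unit in range(1, max_unit + 1):
        if all(kmer[i] == kmer[i - unit] for i in range(unit, len(kmer))):
            return True
    return False


def most_common_nonrepeat_kmer(seqs, k=14, max_unit=6):
    """Return (kmer, count) for the most frequent k-mer that is not a
    short tandem repeat, or None if no such k-mer exists.

    Assumption: strands are counted separately; k-mers containing
    anything other than A, C, G, T (for example N) are skipped.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    for kmer, n in counts.most_common():
        if not is_short_repeat(kmer, max_unit):
            return kmer, n
    return None
```

With the numbers from the entry, `coverage(138_351_643, 3.2e9)` evaluates to roughly 0.043, matching the quoted estimate. `most_common_nonrepeat_kmer` would be fed the contig or read sequences as plain strings; how those are loaded from the assembly output is left out here.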

Banana Slug Genomics
