This shows you the differences between two versions of the page.
archive:computer_resources:assemblies [2010/04/22 07:49] galt Added Galt's work on velvet and SOAPdenovo |
archive:computer_resources:assemblies [2010/04/22 12:29] karplus Added T-shirt k-mer suggestion. |
||
---|---|---|---|
Line 33: | Line 33: | ||
* Final graph has 3602 nodes and n50 of 4851, max 94854, total 1767903, using 28785664/61262410 reads | * Final graph has 3602 nodes and n50 of 4851, max 94854, total 1767903, using 28785664/61262410 reads | ||
* SOAPdenovo | * SOAPdenovo | ||
- | * SOAPdenovo-assembly1/ Assembling Pog 454 long reads with SOAPdenovo. After being simply unable to get any version of the program to read a FASTA file despite documentation examples, I finally found a utility sff2fastq that made it possible to run SOAPdenovo on Pog 454 fastq. I have not had time to optimize parameters yet. | + | * SOAPdenovo-assembly1/ Assembling Pog 454 long reads with SOAPdenovo. After being simply unable to get any version of the program to read a FASTA file despite documentation examples, I finally found a utility sff2fastq that made it possible to run SOAPdenovo on Pog 454 fastq. I have not had time to optimize parameters yet. The largest contig made with default params was just 4k. |
- | * The largest contig made with default params was just 4k. | + | |
===== slug/ ===== | ===== slug/ ===== | ||
- | * newbler-assembly1/ first attempt at de novo assembly using Newbler, using all the reads from 454_run1 and 454_run2. This assembly of 499,873 reads including 138,351,643 bases produced only 2,910,773 bases assembled into 8,963 contigs. From this low assembly number, I estimate the coverage to be about 0.043x and the genome size to be about 3.2E9 basepairs. (See the README file for the calculation.) | + | * newbler-assembly1/ first attempt at de novo assembly using Newbler, using all the reads from 454_run1 and 454_run2. This assembly of 499,873 reads including 138,351,643 bases produced only 2,910,773 bases assembled into 8,963 contigs. From this low assembly number, I estimate the coverage to be about 0.043x and the genome size to be about 3.2E9 basepairs. (See the README file for the calculation.) Much of the assembly is low-complexity regions (repetitions of short repeats (GA)*, (TA)*, (TTC)*, (AC)*, (TAG)*, (CGAA)*, (TATC)*, (CAA)*, ... ). The most common 14-mer that is not a repeat of a short k-mer is TAGTTTACAGCTTG (so that is what we should put on the T-shirt). |
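
The two numeric claims in the newbler-assembly1 entry can be sanity-checked with a short script. This is a minimal sketch, not the calculation from the README, which is not reproduced on this page. It assumes that coverage is defined as total read bases divided by genome size, and that "a repeat of a short k-mer" means a 14-mer that is periodic with a period of at most 4 bases. The function names and the `max_period` cutoff are hypothetical choices made for illustration. Reverse complements are not merged.

```python
from collections import Counter


def implied_genome_size(total_read_bases, coverage):
    """Genome size implied by: coverage = total read bases / genome size."""
    return total_read_bases / coverage


def is_short_repeat(kmer, max_period=4):
    """True if kmer is a repetition of a unit no longer than max_period.

    A string has period p when every base equals the base p positions
    earlier, e.g. GAGAGA has period 2. max_period=4 is an assumed cutoff.
    """
    return any(
        all(kmer[i] == kmer[i - p] for i in range(p, len(kmer)))
        for p in range(1, max_period + 1)
    )


def most_common_nonrepeat_kmer(sequences, k=14, max_period=4):
    """Return (kmer, count) for the most frequent k-mer that is not a
    short repeat, or None if no such k-mer occurs."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if not is_short_repeat(kmer, max_period):
                counts[kmer] += 1
    return counts.most_common(1)[0] if counts else None


if __name__ == "__main__":
    # Read-base total and coverage estimate quoted in the entry above.
    print(f"{implied_genome_size(138_351_643, 0.043):.2e}")

    # Toy sequences, invented for illustration only.
    toy = ["GAGAGAGAGAGAGAGAGA", "TAGTTTACAGCTTG", "CCTAGTTTACAGCTTGAA"]
    print(most_common_nonrepeat_kmer(toy))
```

With the figures quoted in the entry, the first function gives a value close to the stated 3.2E9 basepairs, so the two estimates are at least consistent with each other under the assumed definition of coverage.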