This shows you the differences between two versions of the page.
archive:computer_resources:assemblies [2010/04/22 13:43] karplus moved velvet-assembly1 to test |
archive:computer_resources:assemblies [2010/04/22 20:27] karplus added largest contig size for newbler-assembly1 |
||
---|---|---|---|
Line 42: | Line 42: | ||
===== slug/ ===== | ===== slug/ ===== | ||
- | * newbler-assembly1/ first attempt at de novo assembly using Newbler, using all the reads from 454_run1 and 454_run2. This assembly of 499,873 reads including 138,351,643 bases produced only 2,910,773 bases assembled into 8,963 contigs. From this low assembly number, I estimate the coverage to be about 0.043x and the genome size to be about 3.2E9 basepairs. (See the README file for the calculation.) Much of the assembly is low-complexity regions (repetitions of short repeats (AT)*, (AAG)*, (AG)*, (AC)*, (AGT)*, (AGAT)*, (ACAT)*, (AAC)*, (AACG)*, ... ). The most common 14-mer that is not a repeat of a short k-mer is TAGTTTACAGCTTG (so that is what we should put on the T-shirt). | + | * newbler-assembly1/ first attempt at de novo assembly using Newbler, using all the reads from 454_run1 and 454_run2. |
+ | * This assembly of 499,873 reads including 138,351,643 bases produced only 2,910,773 bases assembled into 8,963 contigs. | ||
+ | * The longest contig is 5783 bases. | ||
+ | * From the total number of bases in the assembly, I estimate the coverage to be about 0.043x and the genome size to be about 3.2E9 basepairs. (See the README file for the calculation.) | ||
+ | * Much of the assembly consists of low-complexity regions (repetitions of short motifs such as (AT)*, (AAG)*, (AG)*, (AC)*, (AGT)*, (AGAT)*, (ACAT)*, (AAC)*, (AACG)*, ...). | ||
+ | * The most common 14-mer that is not a repeat of a short k-mer is TAGTTTACAGCTTG (so that is what we should put on the T-shirt). | ||
+ | * newbler-mapping1-lottia/ tries to do a reference-based assembly with the //Lottia gigantea// genome as a reference. | ||
+ | * The reference has 4475 contigs with 359,512,207 bases. | ||
+ | * The output has 183 contigs with 29,389 bases. | ||
+ | * The longest contig is only 644 bases---way too small to be of much use. | ||
+ | * newbler-mapping2-seahare/ tries to do a reference-based assembly with the //Aplysia californica// genome as a reference. | ||
+ | * The sea hare reference has 8767 contigs, comprising 715,806,041 bases. | ||
+ | * The output has 2664 contigs, comprising 443,648 bases (still less than the de novo assembly). | ||
+ | * The longest contig is only 1876 bases. |
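
The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is not the README calculation (which is not reproduced on this page); it only assumes the standard definition coverage = total read bases / genome size, and checks that the quoted 0.043x and 3.2E9 estimates are consistent with each other. It also computes, for the two reference-based runs, how small the output is relative to the reference. All function and variable names are illustrative and do not come from the original notes.

```python
# Consistency check on the numbers quoted in the notes above.
# Assumption: coverage = total read bases / genome size (standard definition).
# This is NOT the README's method for deriving the 0.043x estimate.

READS = 499_873            # reads from 454_run1 + 454_run2
READ_BASES = 138_351_643   # total bases in those reads
COVERAGE = 0.043           # coverage estimate quoted in the notes
GENOME_SIZE = 3.2e9        # genome size estimate quoted in the notes


def implied_genome_size(read_bases, coverage):
    """Genome size implied by a coverage estimate."""
    return read_bases / coverage


def implied_coverage(read_bases, genome_size):
    """Coverage implied by a genome size estimate."""
    return read_bases / genome_size


# name: (output contigs, output bases, longest contig, reference bases or None)
ASSEMBLIES = {
    "newbler-assembly1": (8_963, 2_910_773, 5_783, None),
    "newbler-mapping1-lottia": (183, 29_389, 644, 359_512_207),
    "newbler-mapping2-seahare": (2_664, 443_648, 1_876, 715_806_041),
}


def mean_contig_length(name):
    """Average contig length (bases per contig) for one assembly."""
    contigs, bases, _longest, _ref = ASSEMBLIES[name]
    return bases / contigs


def output_to_reference_ratio(name):
    """Output bases divided by reference bases, or None for de novo runs."""
    _contigs, bases, _longest, ref_bases = ASSEMBLIES[name]
    return None if ref_bases is None else bases / ref_bases


mean_read_length = READ_BASES / READS

if __name__ == "__main__":
    print(f"mean read length: {mean_read_length:.1f} bases")
    print(f"genome size implied by {COVERAGE}x: "
          f"{implied_genome_size(READ_BASES, COVERAGE):.3g} bases")
    print(f"coverage implied by {GENOME_SIZE:.2g} bases: "
          f"{implied_coverage(READ_BASES, GENOME_SIZE):.4f}x")
    for name in ASSEMBLIES:
        ratio = output_to_reference_ratio(name)
        ratio_text = "n/a (de novo)" if ratio is None else f"{ratio:.2e}"
        print(f"{name}: mean contig {mean_contig_length(name):.0f} bases, "
              f"output/reference = {ratio_text}")
```

Under that definition the two quoted estimates agree to within rounding, and both mapping runs yield output that is a very small fraction of their reference, in line with the remark that the sea hare mapping is still smaller than the de novo assembly.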