====== assemblies/ ======

This directory has a subdirectory for each organism.

===== Pog/ =====

//Pyrobaculum oguniense// assemblies

  * Newbler (plus map-colorspace)
    * newbler-assembly1/ is an attempt to do a de novo assembly using the 454 tools (Newbler) version 2.3, starting with the entire set of reads (including any contaminants). This resulted in 43 contigs and 2,449,932 bases.
    * newbler-clean1/ does not create an assembly; instead it is an attempt to remove contaminant reads from the Pog 454 data, by removing reads that map to //Helicobacter pylori// genomes. The results are in newbler-clean1/sff_cleaned/no_Hyp.sff
    * newbler-assembly2/ is a second de novo assembly using Newbler, starting from the cleaned reads of newbler-clean1/sff_cleaned/no_Hyp.sff. It gets 42 contigs and 2,449,409 bases.
    * newbler-assembly3/ starts from the same sff file as newbler-assembly2/ but raises the expected coverage to 60 (close to actual coverage). It gets 41 contigs and 2,449,426 bases, still more than the old version of Newbler got after similar cleaning. The contigs have been mapped to the finished genome (using megablast, blastn, blat, and pluck-scripts/find-dna-differences). All the contigs map cleanly to the finished genome. If contigs map to more than one place, find-dna-differences may (incorrectly) report it as not mapping.
    * map-colorspace3/ uses the [[bioinformatic_tools:pluck-scripts|pluck-scripts]] script map-colorspace to map the SOLiD mate-pair reads onto the contigs of the newbler-assembly3/ run. The intent is to find what contigs join to what other ones. The numbering starts with 3, not 1, so that the map-colorspace directories correspond to the newbler-assembly directories that they are mapping onto.
    * newbler-partial3/ assembled the partially-assembled reads of newbler-assembly3/ to see if any extended or connected contigs. Seven of the 131 new contigs could be used to extend newbler-assembly3/ contigs, but none spanned 2 contigs.
    * newbler-assembly4/ starts from the same sff file as newbler-assembly2/ and newbler-assembly3/ but adds the contigs of newbler-partial3/ as extra reads. This did not help, getting 45 contigs and 2,449,287 bases.
    * newbler-assembly5/ starts from the same sff file as newbler-assembly2,3,4 but adds 45 Sanger reads totalling 44,187 bases from PCR reactions (mainly designed to test contig-join hypotheses). It gets 31 contigs and 2,451,007 bases.
    * map-colorspace5/ maps the SOLiD mate-pair data onto the contigs of newbler-assembly5/. Other than some problems placing contig4 and the ece insertions, we can reconstruct some pretty large chunks of the genome from the mate-pair ends.
  * euler
    * euler-assembly1/
  * euler-sr
    * euler-sr-assembly1/
  * mira
    * mira-assembly1/

===== slug/ =====

  * newbler-assembly1/ first attempt at de novo assembly using Newbler, using all the reads from 454_run1 and 454_run2. This assembly of 499,873 reads including 138,351,643 bases produced only 2,910,773 bases assembled into 8,963 contigs. From this low assembly number, I estimate the coverage to be about 0.043x and the genome size to be about 3.2E9 basepairs. (See the README file for the calculation.)
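The coverage and genome-size figures quoted for slug/ newbler-assembly1/ can be checked against each other with the usual relation coverage = total read bases / genome size. The sketch below is only that consistency check, using the numbers from the entry above. It is not the calculation in the README, which is not reproduced here and may derive the genome size differently. The variable and function names are made up for illustration.

```python
# Figures quoted in the slug/ newbler-assembly1/ entry.
num_reads = 499_873
total_read_bases = 138_351_643
estimated_genome_size = 3.2e9  # basepairs, the estimate stated in the entry


def coverage(read_bases, genome_size):
    """Average depth of coverage: sequenced bases per genome base."""
    return read_bases / genome_size


# Coverage implied by the quoted read total and genome-size estimate.
cov = coverage(total_read_bases, estimated_genome_size)

# Mean read length, derived only from the two read totals above.
mean_read_length = total_read_bases / num_reads

print(f"implied coverage: {cov:.3f}x")
print(f"mean read length: {mean_read_length:.0f} bases")
```

The implied coverage rounds to the 0.043x quoted in the entry, so the two estimates are at least mutually consistent.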

archive/computer_resources/assemblies.1271901980.txt.gz · Last modified: 2010/04/22 02:06 by karplus