This shows you the differences between two versions of the page.
contributors:team_5_page [2015/05/27 16:38] ceisenhart [Post assembly scaffolding] |
contributors:team_5_page [2015/08/04 08:33] 66.249.79.241 ↷ Links adapted because of a move operation |
||
---|---|---|---|
Line 10: | Line 10: | ||
Discovar //de novo// is a next-generation sequencing assembly program developed by the Broad Institute and released in late 2014. It is designed for 250 bp Illumina reads with PCR duplicates and adaptor sequences removed. The following page contains the manual as provided by the Broad Institute (http://www.broadinstitute.org/software/discovar/blog/): | Discovar //de novo// is a next-generation sequencing assembly program developed by the Broad Institute and released in late 2014. It is designed for 250 bp Illumina reads with PCR duplicates and adaptor sequences removed. The following page contains the manual as provided by the Broad Institute (http://www.broadinstitute.org/software/discovar/blog/): | ||
- | [[Discovar //de novo// manual:Discovar de novo manual]]. | + | [[team_5_page:discovar_de_novo_manual]]. |
Line 55: | Line 55: | ||
|[[team_5_page:50PerRun | 50% data]]| (Post Skewer and FastUniq) MiSeq data SW019_S1_L001, HiSeq data SW018_S1_L007, HiSeq data SW019_S2_L008 | | |[[team_5_page:50PerRun | 50% data]]| (Post Skewer and FastUniq) MiSeq data SW019_S1_L001, HiSeq data SW018_S1_L007, HiSeq data SW019_S2_L008 | | ||
|[[team_5_page:50PerRunUCSF | 50% data UCSF]]| (Post Skewer and FastUniq) UCSF SW018 and SW019 data | | |[[team_5_page:50PerRunUCSF | 50% data UCSF]]| (Post Skewer and FastUniq) UCSF SW018 and SW019 data | | ||
- | |[[team_5_page:FullRun1 | Full data run 1]] | (Post Skewer and FastUniq) 100% of the MiSeq SW019, UCSF SW019, and BS-tag datasets | | + | |[[team_5_page:FullRun1 | Full data run 1]] | (Post Skewer and FastUniq) 100% of the MiSeq SW019, UCSF SW019, and 50 % BS-tag datasets | |
+ | |[[team_5_page:KollosusFullRun | Kolossus full run]] | (Post Skewer and FastUniq) 100% of the MiSeq SW019, UCSF SW019, UCSF SW018, BS-tag, BS-MK datasets | | ||
The logs are very large, so the important statistics have been gathered and are compared below. | The logs are very large, so the important statistics have been gathered and are compared below. | ||
Note that MPL1 stands for the mean length of the first read in a pair up to its first error. | Note that MPL1 stands for the mean length of the first read in a pair up to its first error. | ||
- | | | 1% run | 5% run | 10% run | 50% run | 50% UCSF run | FullRun1 | | + | | | 1% run | 5% run | 10% run | 50% run | 50% UCSF run | FullRun1 | Kolossus full run | |
- | | Total runtime | 1.75 hours| 1.53 hours| 2.4 hours| 8.53 hours| 14.9 hours | 24.2 hours| | + | | Total runtime | 1.75 hours| 1.53 hours| 2.4 hours| 8.53 hours| 14.9 hours | 24.2 hours| 103 hours | |
- | | Peak memory use | 43.92 GB | 78.10 GB| 151.05 GB| 220.11 GB | 184.09 GB| 246.03 GB| | + | | Peak memory use | 43.92 GB | 78.10 GB| 151.05 GB| 220.11 GB | 184.09 GB| 246.03 GB| 583.25 GB | |
- | | Bases in 1kb+ scaffolds| 75,233 | 592,685 | 1,476,875 | 101,397,871 | 1,528,625,509 | 1,849,167,875 | | + | | Bases in 1kb+ scaffolds| 75,233 | 592,685 | 1,476,875 | 101,397,871 | 1,528,625,509 | 1,849,167,875 | 1,885,373,341| |
- | | Bases in 10kb+ scaffolds| 10,572 | 11,088 | 168,543 | 151,417 | 137,959,107 | 972,798,485 | | + | | Bases in 10kb+ scaffolds| 10,572 | 11,088 | 168,543 | 151,417 | 137,959,107 | 972,798,485 | 1,106,140,476 | |
- | | MPL1 | 2 | 2 | 3 | 7 | 156 | 169 | | + | | MPL1 | 2 | 2 | 3 | 7 | 156 | 169 | 169 | |
- | | Contig N50 | 2,622 | 2,067 | 2,563 | 1,489 | 3,979 | 9,513 | | + | | Contig N50 | 2,622 | 2,067 | 2,563 | 1,489 | 3,979 | 9,513 | 10,427 | |
- | | Scaffold N50 | 2,622 | 2,067 | 2,563 | 1,489 | 3,979 | 10,634 | | + | | Scaffold N50 | 2,622 | 2,067 | 2,563 | 1,489 | 3,979 | 10,634 | 12,549 | |
- | | Coverage | | | | | 16x | 47x | | + | | Coverage | | | | | 16x | 47x | 80x | |
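The N50 and "bases in 1kb+/10kb+ scaffolds" rows above can be recomputed from any assembly FASTA. The following is a minimal sketch, not the method Discovar //de novo// uses to produce its own log statistics; the function names (''scaffold_lengths'', ''n50'', ''bases_in_scaffolds_at_least'') and the toy records are hypothetical and exist only for illustration.

```python
from typing import Iterable, Iterator, List


def scaffold_lengths(fasta_lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths: List[int]) -> int:
    """Return the length L such that sequences of length >= L
    together contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def bases_in_scaffolds_at_least(lengths: List[int], min_len: int) -> int:
    """Sum the bases held in sequences of at least min_len bases."""
    return sum(length for length in lengths if length >= min_len)


# Toy, made-up records (hypothetical data, not taken from any run above).
toy_fasta = [">s1", "ACGTACGTAC", ">s2", "ACG", "TAC", ">s3", "ACGT"]
toy_lengths = list(scaffold_lengths(toy_fasta))
toy_n50 = n50(toy_lengths)
toy_bases_5plus = bases_in_scaffolds_at_least(toy_lengths, 5)
```

To apply the sketch to a real assembly, pass an open file handle to ''scaffold_lengths'' in place of the toy list, then use thresholds of 1,000 and 10,000 to mirror the 1kb+ and 10kb+ rows. Reading line by line keeps memory use low even for multi-gigabase FASTA files.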
====Fasta assemblies==== | ====Fasta assemblies==== | ||
Line 85: | Line 86: | ||
|50%run| 137 M | 137,695,736 | 273,653 | 1,006 | 19,658 | 1,489 | | 151,417 | | |50%run| 137 M | 137,695,736 | 273,653 | 1,006 | 19,658 | 1,489 | | 151,417 | | ||
|UCSF50%run | 1.9 G | 1,839,371,352 | 1,126,557 | 1,632| 55,757 | 3,979 | 80,721 | 137,959,107 | | |UCSF50%run | 1.9 G | 1,839,371,352 | 1,126,557 | 1,632| 55,757 | 3,979 | 80,721 | 137,959,107 | | ||
- | |firstFullRun | 2.2G | 2,245,788,654 | 1,450,447 | 1,548 | 153,999 | 10,634 | 118,545 | 972,798,485 | | + | |firstFullRun | 2.2G | 2,245,788,654 | 1,450,447 | 1,548 | 153,999 | 10,634 | 118,545 | 972,798,485 | |
+ | |Kolossus full run | 2.4G | 2,395,797,282 | 1,843,153 | 1,299 | 129,831 | 12,549 | 113,978 | 1,106,140,476 | | ||
The absolute path to our latest assembly in .fasta format is: | The absolute path to our latest assembly in .fasta format is: | ||
- | /campusdata/BME235/S15_assemblies/DiscovarDeNovo/firstFullRun/discovarDeNovoAssembly.fasta | + | /campusdata/BME235/S15_assemblies/DiscovarDeNovo/KolossusAssembly/discovarDeNovoKolossusAssembly.fasta |
Looking at the 10% run, the majority of scaffolds generated are quite short (<1kb). | Looking at the 10% run, the majority of scaffolds generated are quite short (<1kb). | ||
Line 119: | Line 121: | ||
See the instructions for setting up the hub here: | See the instructions for setting up the hub here: | ||
- | [[banana_slug_genome_browser |Banana slug browser ]] | + | [[post-assembly_analysis:banana_slug_genome_browser|Banana slug browser ]] |