====== SOAPdenovo2 Update ======

====Second pass at adapter trimming====

  * Trimming was performed using Skewer on the initial datasets SW019_S1/S2 and SW018_S1, i.e. all the read sets that were sequenced at UCSC.
  * Used FastQC to confirm adapter presence in the reads, primarily in the SW018 reads, where the overrepresented sequences matched the adapter sequence.
  * Confirmed effective adapter trimming with another FastQC run.
  * Compared the Skewer output to SeqPrep output (without merging) and found them nearly identical.

====Second pass Musket error correction====

  * Used the second-pass adapter-trimmed data and a k-mer size of 31.
  * Observed no major differences in the k-mer spectra compared to the previous Musket run.

====Assembly runs====

===Runtime stats===

  * Did two runs, both with the adapter-trimmed, error-corrected reads described above. The first run mis-specified the orientation of one library (reads were specified as <-- --> instead of --> <--); the second run corrected this error in the config file.
  * Each run was split into four steps: sparse pregraph construction, contig generation, mapping, and scaffold generation.
  * Sparse pregraph: 37.8 CPU hours and 60 GB max virtual memory for 1st run || 35.4 CPU hours and 60 GB max virtual memory for 2nd run
  * Contig generation: 0.23 CPU hours and 5.8 GB max virtual memory for 1st run || 0.24 CPU hours and 5.8 GB max virtual memory for 2nd run
  * Mapping: 21.9 CPU hours and 66 GB max virtual memory for 1st run || 14.8 CPU hours and 78 GB max virtual memory for 2nd run
  * Scaffold generation: 92.7 CPU hours and 20 GB max virtual memory for 1st run || Still in progress for 2nd run

===Results from run 1===

==Contigs:==

  * Total contig sequence size : 2,051,251,797
  * Contig count : 3,854,379
  * Mean length : 532
  * Longest sequence : 22,512
  * N50 : 1,425 ; count : 389,550
  * length > 1K : 583,671 (15.14%)
  * length > 10K : 891 (0.02%)

==Scaffolds:==

  * Total assembly size (including ‘N’s) : 2,064,665,199
  * Total assembly size (without ‘N’s) : 1,974,393,478
  * Scaffold count : 2,030,303
  * Mean length : 1,016
  * Longest sequence : 60,333
  * N50 : 5,554 ; count : 105,217
  * length > 1K : 381,668 (18.80%)
  * length > 10K : 35,884 (1.77%)
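
For readers unfamiliar with the N50 figures above, the sketch below shows one common way to compute an N50 and its accompanying sequence count from a list of sequence lengths. It assumes the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the total assembly size, together with the number of sequences needed to get there. The function name ''n50'' and the toy lengths are hypothetical, and it is an assumption that the statistics reported here follow exactly this convention.

```python
def n50(lengths):
    """Return (N50, count) for a list of sequence lengths.

    N50 is the length of the sequence at which the running total of
    lengths, taken from longest to shortest, first reaches at least
    half of the total. count is how many sequences that took.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")

    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2

    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example (made-up lengths, not taken from the assembly):
# total = 54, half = 27, and 10 + 9 + 8 = 27 reaches it.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # (8, 3)
```

Read this way, the contig line "N50 : 1,425 ; count : 389,550" would mean that the 389,550 longest contigs, each at least 1,425 long, together account for at least half of the total contig sequence.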