--------------------------------------------------------------------------------
Wed Jun 03 12:55:08 2015 run on kolossus, pid=15858 [Jun 2 2015 19:08:18 R52488 ]
DiscovarDeNovo \
READS=sample:MK::/scratch/ceisenhart/bams/BS-MK_seqprep_dupRemo \
ved.bam+sample:tag::/scratch/ceisenhart/bams/BS_tag_noAdap_noDu \
p.bam+sample:Mi19::/scratch/ceisenhart/bams/SW019_MiSeq_adapter \
Trimmed_dupRemoved.bam+sample:UC18::/scratch/ceisenhart/bams/UC \
SF_SW018_noAdap_noDup.bam+sample:UC19::/scratch/ceisenhart/bams \
/UCSF_SW019_noAdap_noDup.bam NUM_THREADS=16 MEMORY_CHECK=True \
OUT_DIR=/hive/users/ceisenhart/06.03.2015Assembly \
MAX_MEM_GB=750
--------------------------------------------------------------------------------

SYSTEM INFO
- OS: Linux :: 2.6.32-504.3.3.el6.x86_64 :: #1 SMP Wed Dec 17 01:55:02 UTC 2014
- node name: kolossus
- hardware type: x86_64
- cache size: 24576 KB
- cpu MHz: 2260.913
- cpu model name: Intel(R) Xeon(R) CPU X7560 @ 2.27GHz
- physical memory: 1009.71 GB

MEMORY CHECK (typically takes several minutes; could cause machine to become sluggish or result in this job being killed)
- Apparently able to allocate 100% of nominally available memory.
- Can access at least 750 GB.

Wed Jun 03 12:57:05 2015: finding input files
Wed Jun 03 12:57:05 2015: reading 5 files (which may take a while)
Wed Jun 03 12:57:05 2015: processing /scratch/ceisenhart/bams/BS-MK_seqprep_dupRemoved.bam.
Wed Jun 03 12:57:05 2015: memory in use = 0.01 GB, peak = 0.01 GB
Wed Jun 03 14:01:01 2015: there are 249,569,918 reads of mean length 250
Wed Jun 03 14:01:01 2015: memory in use = 111.46 GB, peak = 109.00 GB
Wed Jun 03 14:01:19 2015: reads sorted
Wed Jun 03 14:01:19 2015: memory in use = 112.40 GB, peak = 110.68 GB
Wed Jun 03 14:04:02 2015: data stashed in output structures
Wed Jun 03 14:04:02 2015: memory in use = 68.22 GB, peak = 120.57 GB
Wed Jun 03 14:04:02 2015: processing /scratch/ceisenhart/bams/BS_tag_noAdap_noDup.bam.
Wed Jun 03 14:04:02 2015: memory in use = 68.22 GB, peak = 120.57 GB
Wed Jun 03 14:36:30 2015: there are 135,779,818 reads of mean length 249
Wed Jun 03 14:36:30 2015: memory in use = 106.51 GB, peak = 120.57 GB
Wed Jun 03 14:36:40 2015: reads sorted
Wed Jun 03 14:36:40 2015: memory in use = 107.02 GB, peak = 120.57 GB
Wed Jun 03 14:38:13 2015: data stashed in output structures
Wed Jun 03 14:38:13 2015: memory in use = 100.67 GB, peak = 120.57 GB
Wed Jun 03 14:38:13 2015: processing /scratch/ceisenhart/bams/SW019_MiSeq_adapterTrimmed_dupRemoved.bam.
Wed Jun 03 14:38:13 2015: memory in use = 100.67 GB, peak = 120.57 GB
Wed Jun 03 14:55:49 2015: there are 54,509,730 reads of mean length 300
Wed Jun 03 14:55:49 2015: memory in use = 117.09 GB, peak = 120.57 GB
Wed Jun 03 14:55:53 2015: reads sorted
Wed Jun 03 14:55:53 2015: memory in use = 117.29 GB, peak = 120.57 GB
Wed Jun 03 14:56:43 2015: data stashed in output structures
Wed Jun 03 14:56:43 2015: memory in use = 115.79 GB, peak = 120.98 GB
Wed Jun 03 14:56:43 2015: processing /scratch/ceisenhart/bams/UCSF_SW018_noAdap_noDup.bam.
Wed Jun 03 14:56:43 2015: memory in use = 115.79 GB, peak = 120.98 GB
Wed Jun 03 15:21:40 2015: there are 113,862,306 reads of mean length 232
Wed Jun 03 15:21:40 2015: memory in use = 155.64 GB, peak = 152.54 GB
Wed Jun 03 15:21:48 2015: reads sorted
Wed Jun 03 15:21:48 2015: memory in use = 156.06 GB, peak = 152.83 GB
Wed Jun 03 15:23:10 2015: data stashed in output structures
Wed Jun 03 15:23:10 2015: memory in use = 148.62 GB, peak = 163.18 GB
Wed Jun 03 15:23:10 2015: processing /scratch/ceisenhart/bams/UCSF_SW019_noAdap_noDup.bam.
Wed Jun 03 15:23:10 2015: memory in use = 148.62 GB, peak = 163.18 GB
Wed Jun 03 15:59:42 2015: there are 150,162,130 reads of mean length 249
Wed Jun 03 15:59:42 2015: memory in use = 206.37 GB, peak = 202.74 GB
Wed Jun 03 15:59:53 2015: reads sorted
Wed Jun 03 15:59:53 2015: memory in use = 206.93 GB, peak = 202.74 GB
Wed Jun 03 16:01:41 2015: data stashed in output structures
Wed Jun 03 16:01:41 2015: memory in use = 198.18 GB, peak = 222.28 GB

INPUT FILES:
[1,type=frag,sample=MK,lib=1,frac=1] /scratch/ceisenhart/bams/BS-MK_seqprep_dupRemoved.bam
[2,type=frag,sample=tag,lib=1,frac=1] /scratch/ceisenhart/bams/BS_tag_noAdap_noDup.bam
[3,type=frag,sample=Mi19,lib=1,frac=1] /scratch/ceisenhart/bams/SW019_MiSeq_adapterTrimmed_dupRemoved.bam
[4,type=frag,sample=UC18,lib=1,frac=1] /scratch/ceisenhart/bams/UCSF_SW018_noAdap_noDup.bam
[5,type=frag,sample=UC19,lib=1,frac=1] /scratch/ceisenhart/bams/UCSF_SW019_noAdap_noDup.bam

Wed Jun 03 16:01:41 2015: found 5 samples
Wed Jun 03 16:01:41 2015: starts = 0,249569918,385349736,439859466,553721772
Wed Jun 03 16:11:44 2015: using 703,883,902 reads
Wed Jun 03 16:11:44 2015: data extraction complete, peak mem = 222.28 GB
3.24 hours used extracting reads
Wed Jun 03 16:25:48 2015: see total physical memory of 1,084,169,011,200 bytes
Wed Jun 03 16:25:48 2015: see user-imposed limit on memory of 805,306,368,000 bytes
Wed Jun 03 16:25:48 2015: 4.56 bytes per read base, assuming max memory available
We need 7 passes.
Expect 87893563 keys per batch.
Provide 98925162 keys per batch.
We need 9 passes.
Expect 68361660 keys per batch.
Provide 85294165 keys per batch.
Wed Jun 03 23:16:11 2015: back from buildReadQGraph
Wed Jun 03 23:16:11 2015: memory in use = 249.86 GB, peak = 583.25 GB
checksum_60 = 1500661769504106748
Wed Jun 03 23:21:36 2015: constructing places
Wed Jun 03 23:23:46 2015: sorting places
Wed Jun 03 23:39:21 2015: building all
Thu Jun 04 00:00:20 2015: calling LongReadsToPaths
Thu Jun 04 00:56:43 2015: writing
Thu Jun 04 01:00:50 2015: translating paths
Thu Jun 04 01:12:34 2015: final stage of path translation
Thu Jun 04 02:43:59 2015: writing paths
2.67 minutes used reloading assembly
Thu Jun 04 03:13:06 2015: start walking
memory in use = 217,802,301,440
Thu Jun 04 03:40:40 2015: start walking
memory in use = 206,994,972,672
1.04 hours used cleaning 200-mer graph
11.9 hours used in ReadQGrapher
4.05e-06 seconds used reloading reads
checksum_200 = 1499603233170572930
1 peak mem usage = 583.25 GB
2.56 minutes used loading stuff
2 peak mem usage = 583.25 GB
launching gap assemblies, mem usage = 189,699,788,800
Thu Jun 04 04:25:49 2015: finding unsatisfieds
Thu Jun 04 04:26:44 2015: creating multiplicity map
Thu Jun 04 04:26:53 2015: economizing links
Thu Jun 04 04:26:54 2015: forming neighborhoods
Thu Jun 04 04:28:24 2015: forming initial clusters
Thu Jun 04 04:29:57 2015: start sort
5.34 seconds used sorting
Thu Jun 04 04:30:04 2015: merging clusters
xs.size( ) = 21565962
45.5 minutes used merging
xs.size( ) = 1545255
Thu Jun 04 05:16:33 2015: start overlap-based merging
Thu Jun 04 05:17:51 2015: start overlap-based merging
LR.size( ) = 1327468
LR.size( ) = 665564
Thu Jun 04 05:40:49 2015: now processing 665564 blobs
Thu Jun 04 05:40:49 2015: memory in use = 204.69 GB, peak = 583.25 GB
.......... .......... .......... .......... .......... .......... .......... .......... .......... ..........
3.34 days spent in local assemblies, memory in use = 228.88 GB, peak = 583.25 GB
Sun Jun 07 13:44:52 2015: patch reserving space
Sun Jun 07 13:44:52 2015: memory in use = 228.89 GB
35.4 seconds used patching, peak mem usage = 583.25 GB
new_stuff.size( ) = 7741492
Sun Jun 07 13:50:24 2015: building hb2
3.58 minutes used in new stuff 1 test
memory in use now = 211,686,506,496
Warning: HashSet initial size too small.
Sun Jun 07 14:37:56 2015: back from buildBigKHBVFromReads
47.6 minutes used in new stuff 2 test
peak mem usage = 583.25 GB
8 minutes used in new stuff 5
Sun Jun 07 14:51:06 2015: finding interesting reads
Sun Jun 07 14:51:06 2015: memory in use = 179.70 GB, peak = 583.25 GB
Sun Jun 07 15:19:25 2015: building dictionary
Sun Jun 07 15:19:25 2015: memory in use = 179.87 GB, peak = 583.25 GB
Sun Jun 07 15:21:35 2015: reducing
Sun Jun 07 15:21:35 2015: memory in use = 382.82 GB, peak = 583.25 GB
We need 1 passes.
Expect 19835364 keys per batch.
Provide 49588410 keys per batch.
Sun Jun 07 15:26:00 2015: kmerizing
Sun Jun 07 15:26:00 2015: memory in use = 403.09 GB, peak = 583.25 GB
We need 1 passes.
Expect 53444203 keys per batch.
Provide 133610506 keys per batch.
Sun Jun 07 15:31:52 2015: cleaning
Sun Jun 07 15:31:52 2015: memory in use = 403.09 GB, peak = 583.25 GB
Sun Jun 07 15:33:51 2015: finding uniquely aligning edges
Sun Jun 07 15:33:51 2015: memory in use = 403.10 GB, peak = 583.25 GB
1.88 hours used in new phase
hb.N( ) = 63606048, hb.EdgeObjectCount( ) = 48291486
107401 paths improved by rerouting
Sum(invalid) = 1610553, npids = 351941951
22051 edges tamped down
Sun Jun 07 17:00:54 2015: checking involution
Sun Jun 07 17:00:54 2015: done
WARNING: 75298 suspicious read-paths.
Sum(invalid) = 296088, npids = 351941951
11865 edges tamped down
Sun Jun 07 17:34:43 2015: making paths index for pull apart
Sun Jun 07 17:37:11 2015: pulling apart repeats
4.85 seconds used separating paths 1
2.98 minutes used in fixing mToLeft, mToRight, and mEdgeToPathIds
Sun Jun 07 17:44:25 2015: there were 65104 repeats pulled apart.
Sun Jun 07 17:44:25 2015: there were 534710 read paths removed during separation.
Sun Jun 07 17:46:20 2015: improving paths
Sun Jun 07 18:10:36 2015: done
10902131 paths extended
Sun Jun 07 18:26:51 2015: start degloop
Sun Jun 07 18:26:51 2015: creating path index
Sun Jun 07 18:29:43 2015: starting loop
Sun Jun 07 18:34:34 2015: degloop complete
Sun Jun 07 18:40:51 2015: unwinding three-edge plasmids
Sun Jun 07 18:40:59 2015: removing small components
Sun Jun 07 18:50:10 2015: writing a.fin files
Sun Jun 07 19:09:01 2015: determining candidates
Sun Jun 07 19:09:22 2015: determining candidates
Sun Jun 07 19:09:39 2015: determining candidates
Sun Jun 07 19:09:58 2015: determining candidates
Sun Jun 07 19:10:14 2015: determining candidates
CN fraction good = 0.57
Sun Jun 07 19:16:36 2015: deleting 4 gaps and adding 57 gaps to force symmetry
Sun Jun 07 19:20:17 2015: done making gaps, time used = 6 minutes
Sun Jun 07 19:32:19 2015: determining candidates
Sun Jun 07 19:32:35 2015: determining candidates
Sun Jun 07 19:32:47 2015: determining candidates
Sun Jun 07 19:33:01 2015: determining candidates
Sun Jun 07 19:33:14 2015: determining candidates
20.6 seconds using setting up final fasta
3.34 minutes using printing final fasta

====================================================================================
assembly has 5580699 edges of mean length 747.2
contig line N50: 10,427
scaffold line N50: 12,549
total bases in 1 kb+ scaffolds: 1,885,373,341
total bases in 10 kb+ scaffolds: 1,106,140,476
There are 703,883,902 reads of mean length 250.9 and mean base quality 31.4.
MPL1 = mean length of first read in pair up to first error = 169 (normal range is 175-225 for 250 base reads)
Estimated chimera rate in read pairs (including mismapping) = 0.08%.
genomic read coverage, using 1 kb+ scaffolds for genome size estimate: 93.7
run started Wed Jun 03 12:55:08 2015, completed Sun Jun 07 19:50:13 2015
peak mem usage = 583.25 GB, total time = 103 hours
final checksum = 638735405986598733
DiscovarDeNovo READS=sample:MK::/scratch/ceisenhart/bams/BS-MK_seqprep_dupRemoved.bam+sample:tag::/scratch/ceisenhart/bams/BS_tag_noAdap_noDup.bam+sample:Mi19::/scratch/ceisenhart/bams/SW019_MiSeq_adapterTrimmed_dupRemoved.bam+sample:UC18::/scratch/ceisenhart/bams/UCSF_SW018_noAdap_noDup.bam+sample:UC19::/scratch/ceisenhart/bams/UCSF_SW019_noAdap_noDup.bam NUM_THREADS=16 MEMORY_CHECK=True OUT_DIR=/hive/users/ceisenhart/06.03.2015Assembly MAX_MEM_GB=750
Sun Jun 07 19:50:29 2015: done