Note that I originally started this process on campusrocks-1-0.local with two cores allocated per available compute node on campusrocks. The process was eventually killed, either by a cluster admin or by something else. I then decided to re-run the program from campusrocks-0-6.local due to its larger amount of available RAM. The assembly picked up where it left off, and the step that had previously been killed finished within a fairly short period of time.

After completely crashing campusrocks-0-6.local with my process, I realized that the makefile was taking the -j j=2 arguments, probably ignoring the j=2 part, and parallelizing as much as possible at each step (on each core). On my head node I was running 8 huge processes simultaneously, which probably led to node 6 going down. I almost did the same to node 1-20 before I realized what was going on and stopped the script. I have reissued the makefile with the following command, which doesn't try to pump more parallelization out of the head node:
+ | |||
+ | <code> | ||
+ | /campus/BME235/programs/abyss_tmp/bin/abyss-pe mpirun="/opt/openmpi/bin/mpirun -machinefile machines -x PATH=/campus/BME235/bin/programs/abyss_tmp/bin:$PATH" np=23 n=8 k=28 name=slugAbyss lib='lane1 lane2 lane3 lane5 lane6 lane7 lane8' lane1='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_2_all_qseq.fastq' lane2='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_2_all_qseq.fastq' lane3='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_2_all_qseq.fastq' lane5='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_2_all_qseq.fastq' lane6='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_2_all_qseq.fastq' lane7='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_2_all_qseq.fastq' lane8='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_2_all_qseq.fastq' | ||
+ | </code> | ||
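
For reference, this matches how GNU make parses those arguments: a bare -j imposes no job limit at all, while j=2 merely defines a make variable named j and has no effect on parallelism. A minimal illustration (generic invocations, not commands from this run):

<code>
make -j      # no argument: run as many jobs in parallel as possible (what took the node down)
make -j2     # limit make to two parallel jobs
make j=2     # only defines a make variable named "j"; does not limit parallelism
</code>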
+ | |||
+ | Also because the makefile crashed, it didn't get a chance to clean up the output from the previous step. I had to manually delete the lane-x-3.hist files (which were all of size 0 anyway). After doing this the makefile was able to pick up where it left off and re-generate the lane-x-3.hist files. | ||
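
A sketch of that cleanup, assuming the empty histogram files all match a lane*-3.hist pattern in the working directory:

<code>
# remove the zero-size histogram files left behind by the crashed step,
# then rerun the same abyss-pe command so make regenerates them
find . -maxdepth 1 -name 'lane*-3.hist' -size 0 -delete
</code>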
+ | |||
+ | There is some error where the laneX.hist files are empty... | ||
+ | ====Attempt 4==== | ||
+ | I have access to kolossus which has 1.1tb of ram. I will now run the program on kolossus to see if it will assemble there... | ||
+ | |||
+ | Step1: | ||
+ | Install ABySS on kolossus. Following the exactly same process as listed above except with --prefix=/scratch/jstjohn on kolossus. The installation was straightforward and went without a hitch. | ||
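
For the record, a minimal sketch of that build, assuming the same ABySS source tree and MPI setup as the earlier install (the --with-mpi path here is a guess, not taken from the actual session):

<code>
# configure, build, and install ABySS into /scratch/jstjohn
./configure --prefix=/scratch/jstjohn --with-mpi=/opt/openmpi
make
make install
</code>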
+ | |||
+ | Binaries and libraries are located here: | ||
+ | /scratch/jstjohn/bin | ||
+ | /scratch/jstjohn/lib | ||
+ | |||
+ | Step2: | ||
+ | Galt has already coppied the banana slug illumina reads to /scratch/galt/bananaSlug, I added the 454 fastq reads to that folder as well. | ||
+ | |||
+ | Step3: | ||
+ | from screen on kolossus execute the following command: | ||
<code>
set path = ( /scratch/jstjohn/bin $path )
abyss-pe -j j=4 k=35 n=2 mpirun="/scratch/jstjohn/bin/mpirun -machinefile machinefile -x PATH=/scratch/jstjohn/bin:$PATH" np=30 lib='lib1' lib1='/scratch/galt/bananaSlug/slug_1.fastq /scratch/galt/bananaSlug/slug_2.fastq' se='/scratch/galt/bananaSlug/GAZ7HUX02.fastq /scratch/galt/bananaSlug/GAZ7HUX03.fastq /scratch/galt/bananaSlug/GAZ7HUX04.fastq /scratch/galt/bananaSlug/GCLL8Y406.fastq' name=slugAbyss3
</code>
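
The -machinefile argument points mpirun at a plain-text file listing the hosts (and slot counts) MPI may use. Since this run stays on kolossus with np=30, the file presumably looks something like the following; its contents are an assumption, not copied from the real file:

<code>
# hypothetical Open MPI machinefile: one host with 30 slots to match np=30
kolossus slots=30
</code>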
+ | |||
+ | |||
+ | Note that this run combines both the illumina runs and the 454 data for banana slug. I am also experimenting with a k=35 since Galt had better luck with a kmer size of 31 using SOAPdenovo than a kmer size of 23, perhaps the trend continues into larger kmers. If this doesn't work for whatever reason, I will also try shorter and longer kmers. | ||
+ | |||
+ | We combined all fastq files into two large files representing the two read pairs. Each of these files is approximately 50GB and contain roughly 20GB of reads. Even on kolossus I am getting some out of disk space errors in the following step: | ||
+ | |||
+ | <code> | ||
+ | KAligner -j4 -k35 /scratch/galt/bananaSlug/slug_1.fastq /scratch/galt/bananaSlug/slug_2.fastq slugAbyss3-3.fa \ | ||
+ | |ParseAligns -k35 -h lib1-3.hist \ | ||
+ | |sort -nk2,2 \ | ||
+ | |gzip >lib1-3.pair.gz | ||
+ | </code> | ||
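
The concatenation mentioned above was along these lines; a sketch only, since the exact per-lane filenames are assumptions based on the earlier lane paths:

<code>
# concatenate the per-lane first and second mates into one file each
cat s_*_1_all_qseq.fastq > slug_1.fastq
cat s_*_2_all_qseq.fastq > slug_2.fastq
</code>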
+ | |||
+ | Near the height I have observed this is eating up about 50G of ram, but the issue appears to be in available space for the sort algorithm in kolossus's /tmp/ directory. I am trying this again so I can copy down the error and send it to cluster-admin because kolossus should have around 400GB free of local HD space on top of its 1.1TB of ram. (kolossus has more ram than HD space: 1.1TB of ram vs 750GB hd) | ||
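
One possible workaround, untested here: GNU sort accepts -T (--temporary-directory) and also honors TMPDIR, so its spill files could be pointed at scratch space instead of the small /tmp. The /scratch/jstjohn/tmp directory below is hypothetical:

<code>
# rerun the failing step with sort spilling to scratch instead of /tmp
mkdir -p /scratch/jstjohn/tmp
KAligner -j4 -k35 /scratch/galt/bananaSlug/slug_1.fastq /scratch/galt/bananaSlug/slug_2.fastq slugAbyss3-3.fa \
|ParseAligns -k35 -h lib1-3.hist \
|sort -T /scratch/jstjohn/tmp -nk2,2 \
|gzip >lib1-3.pair.gz
</code>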
+ | |||
+ | |||
===== References =====
<refnotes>notes-separator: none</refnotes>
~~REFNOTES cite~~