archive:bioinformatic_tools:abyss

archive:bioinformatic_tools:abyss [2010/05/19 17:40]
jstjohn
archive:bioinformatic_tools:abyss [2015/07/28 06:23] (current)
ceisenhart ↷ Page moved from bioinformatic_tools:abyss to archive:bioinformatic_tools:abyss
</code>

=== Yet Another Install (1.2.7) ===

On 7 Jun 2011, Kevin Karplus tried installing abyss-1.2.7 from /campusdata/BME235/programs/abyss-1.2.7/ using

<code>
./configure --prefix=/campusdata/BME235 \
        CPPFLAGS='-I/opt/openmpi/include -I/campusdata/BME235/include' \
        LDFLAGS=-L/campusdata/BME235/lib \
        CC=gcc44 CXX=g++44 \
        --with-mpi=/opt/openmpi
</code>

Before this configure could work, sparsehash-1.10 was installed from /campusdata/BME235/programs/google/sparsehash-1.10/.
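Because configure only worked once sparsehash was installed, it can help to check for the headers before re-running it. The following is a minimal sketch, not part of the original install notes: the function name have_sparsehash is hypothetical, and it assumes (from sparsehash 1.x conventions, not from these notes) that the headers are installed under include/google/ beneath the prefix and that google/sparse_hash_map is the header ABySS's configure looks for.

<code bash>
# Hypothetical pre-check: succeed if the sparsehash header is visible under PREFIX.
# Assumption: sparsehash 1.x installs it as PREFIX/include/google/sparse_hash_map.
have_sparsehash() {
  local prefix=$1
  [ -f "$prefix/include/google/sparse_hash_map" ]
}

# Example: the prefix used in the configure line above.
if have_sparsehash /campusdata/BME235; then
  echo "sparsehash found"
else
  echo "sparsehash missing: install it before running configure" >&2
fi
</code>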
  
==== Websites ====
[[http://www.bcgsc.ca/platform/bioinfo/software/abyss|ABySS]] \\
[[http://www.open-mpi.org|OpenMPI]] \\
[[http://code.google.com/p/google-sparsehash|Google sparsehash]] \\
[[http://www.boost.org/|Boost]]
==== Sources with Binaries and Documentation ====
[[http://www.bcgsc.ca/platform/bioinfo/software/abyss/releases|ABySS]] \\
[[http://www.open-mpi.org/software/ompi|OpenMPI]] \\
[[http://code.google.com/p/google-sparsehash/downloads/list|Google sparsehash]] \\
[[http://www.boost.org/users/history/version_1_57_0.html|Boost]]
===== Slug Assembly =====
  
  
<code>
/campus/BME235/programs/abyss_tmp/bin/abyss-pe \
  mpirun="/opt/openmpi/bin/mpirun -machinefile machines -x PATH=/campus/BME235/bin/programs/abyss_tmp/bin:$PATH" \
  np=60 n=8 k=28 name=slugAbyss \
  lib='lane1 lane2 lane3 lane5 lane6 lane7 lane8' \
  lane1='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_2_all_qseq.fastq' \
  lane2='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_2_all_qseq.fastq' \
  lane3='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_2_all_qseq.fastq' \
  lane5='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_2_all_qseq.fastq' \
  lane6='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_2_all_qseq.fastq' \
  lane7='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_2_all_qseq.fastq' \
  lane8='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_2_all_qseq.fastq'
</code>
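The seven laneN arguments in this command differ only in the lane number, so they could be generated instead of typed. This is a hypothetical helper (abyss_lane_args is not part of ABySS) that assumes every lane's paired files follow the s_N_1_all_qseq.fastq / s_N_2_all_qseq.fastq naming used above. It relies on make accepting a command-line assignment such as lane1=fileA fileB as one argument, which is what the quoted form in the original command produces anyway.

<code bash>
# Hypothetical helper: print the lib= and laneN= arguments for abyss-pe, one per
# line. Assumes each lane N has DIR/s_N_1_all_qseq.fastq and DIR/s_N_2_all_qseq.fastq.
abyss_lane_args() {
  local dir=$1
  shift
  local libs="" n
  for n in "$@"; do
    libs="${libs:+$libs }lane$n"   # space-separated list: lane1 lane2 ...
  done
  printf '%s\n' "lib=$libs"
  for n in "$@"; do
    printf '%s\n' "lane$n=$dir/s_${n}_1_all_qseq.fastq $dir/s_${n}_2_all_qseq.fastq"
  done
}

# Usage sketch (bash 4+, not run here):
#   reads=/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads
#   mapfile -t lane_args < <(abyss_lane_args "$reads" 1 2 3 5 6 7 8)
#   abyss-pe np=60 n=8 k=28 name=slugAbyss "${lane_args[@]}"
</code>

Collecting the output into an array keeps each laneN=... assignment as a single word, so no extra quoting is needed when it is passed on.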
  
Also, because the makefile crashed, it didn't get a chance to clean up the output from the previous step. I had to manually delete the lane-x-3.hist files (which were all of size 0 anyway). After doing this, the makefile was able to pick up where it left off and regenerate the lane-x-3.hist files.
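make decides what to rebuild from whether a target file exists and how its timestamp compares with its prerequisites, not from its contents, so a zero-byte hist file left behind by a crash can look finished. A small sketch of the manual cleanup described above, assuming the files match *-3.hist in the assembly directory; the function name is hypothetical, and -maxdepth, -empty and -delete are supported by GNU and BSD find but are not strictly POSIX.

<code bash>
# Hypothetical cleanup: delete zero-byte *-3.hist files in DIR (default: current
# directory) so make sees those targets as missing and regenerates them.
remove_empty_hists() {
  local dir=${1:-.}
  find "$dir" -maxdepth 1 -type f -name '*-3.hist' -empty -print -delete
}
</code>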
  
The node campusrocks-0-6.local is back up, so I am restarting this task. At its peak, the KAligner step (where it crashed previously when the -j option was enabled) requires quite a lot of RAM; I am hoping that the 30GB available on this node is sufficient.
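Before restarting a memory-hungry step such as KAligner on a particular node, it is easy to confirm how much physical RAM the node has. A hypothetical sketch for Linux that reads MemTotal (reported in kB) from /proc/meminfo; note that this is the node's total RAM, not what is currently free, so other jobs on a shared node can leave less than this available. The 30 GiB threshold simply mirrors the figure above.

<code bash>
# Hypothetical check: print total physical RAM in whole GiB, read from a
# /proc/meminfo-style file (MemTotal is given in kB, i.e. 1024-byte units).
total_ram_gib() {
  local meminfo=${1:-/proc/meminfo}
  awk '/^MemTotal:/ { print int($2 / 1048576) }' "$meminfo"
}

# Example: warn if this node has less than 30 GiB of RAM.
if [ -r /proc/meminfo ] && [ "$(total_ram_gib)" -lt 30 ]; then
  echo "warning: less than 30 GiB of RAM on $(hostname)" >&2
fi
</code>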
====Attempt 4====
I have access to kolossus, which has 1.1TB of RAM. I will now run the program on kolossus to see whether it will assemble there...
  
  
Note that this run combines both the Illumina runs and the 454 data for banana slug. I am also experimenting with k=35: Galt had better luck with a k-mer size of 31 than with 23 using SOAPdenovo, so perhaps the trend continues to larger k-mers. If this doesn't work for whatever reason, I will also try shorter and longer k-mers.
  
We combined all fastq files into two large files, one for each read of the pairs. Each of these files is approximately 50GB and contains roughly 20GB of reads. Even on kolossus, I am getting out-of-disk-space errors in the following step:
Near the peak, I have observed this step using about 50GB of RAM, but the issue appears to be the space available to the sort algorithm in kolossus's /tmp/ directory. I am trying this again so I can copy down the error and send it to cluster-admin, because kolossus should have around 400GB of free local disk space on top of its 1.1TB of RAM. (kolossus has more RAM than disk space: 1.1TB of RAM vs. a 750GB hard disk.)
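Since the failure seems to come from sort's temporary files rather than from RAM, it is worth checking the free space on the temporary filesystem before a long run. A hypothetical helper (free_gib is a made-up name) built on POSIX df -P -k output, where the fourth column of the data line is the available space in 1024-byte blocks:

<code bash>
# Hypothetical check: print the free space, in whole GiB, on the filesystem
# that holds DIR. With -P each filesystem is on one line; column 4 = available KiB.
free_gib() {
  df -P -k "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Example: how much room sort's default temp directory has.
echo "/tmp: $(free_gib /tmp) GiB free"
</code>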
  
To get around sort running out of space in its temporary directory, I used sort's -T option, which lets you supply your own temporary directory. Since there is plenty of room left on the hive, I issued the following command to generate the files myself. The nice thing is that, since abyss-pe is a makefile, once I have done this I can simply restart the assembler; it will see the files I have manually generated and move on to the next step.
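<code>
# Align both read files against slugAbyss3-3.fa (4 threads, k=35), write the
# fragment-size histogram to lib1-3.hist, sort numerically on column 2 with
# temporary files on /hive instead of /tmp, and compress the result.
KAligner -j4 -k35 /scratch/galt/bananaSlug/slug_1.fastq /scratch/galt/bananaSlug/slug_2.fastq slugAbyss3-3.fa \
                |ParseAligns -k35 -h lib1-3.hist \
                |sort -T /hive/users/jstjohn/slugAssembly/tmp -nk2,2 \
                |gzip >lib1-3.pair.gz
</code>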
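To make the sort stage of that pipeline concrete, here is a tiny run on made-up two-column input. -nk2,2 sorts numerically on the second whitespace-separated field only, and -T sends any temporary files to the given directory (a throwaway mktemp directory stands in for the /hive path). GNU sort only writes temporary files when the input does not fit in its memory buffer, so with input this small -T has no visible effect.

<code bash>
# Throwaway directory standing in for /hive/users/jstjohn/slugAssembly/tmp.
sort_tmp=$(mktemp -d)

# Made-up "name value" lines; sort numerically on field 2 only.
printf 'read7 120\nread3 15\nread9 300\n' | sort -T "$sort_tmp" -nk2,2
# prints:
# read3 15
# read7 120
# read9 300
</code>

With GNU sort, setting TMPDIR in the environment has the same effect when -T is not given.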

This step takes a long time. It finally finished after approximately 24 hours, but when I tried to restart the makefile I accidentally re-executed the previous KAligner command and overwrote the lib1-3.pair.gz file... For now I am going to let the Ray assembly finish on kolossus, and then I will re-run this step. Since campusrocks-0-6.local is back online, I am also retrying this stage in "Attempt 3" above. Because the run on campusrocks is split across the 7 lanes rather than being one large run, it may work even with limited RAM. One of the selling points of ABySS is that it is supposed to run on "commodity hardware", so we will see whether it lives up to that claim.
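One way to avoid clobbering a day-long result by re-running a command by accident is to check for the output file first. This is a hypothetical guard (run_if_missing is not an existing tool). bash's noclobber option (set -C) would also stop the final > from overwriting the file, but the earlier pipeline stages would still start and do their work, so checking up front avoids the wasted computation.

<code bash>
# Hypothetical guard: run COMMAND only if OUTPUT does not exist yet.
# Usage: run_if_missing OUTPUT COMMAND [ARGS...]
run_if_missing() {
  local out=$1
  shift
  if [ -e "$out" ]; then
    echo "skipping: $out already exists" >&2
    return 0
  fi
  "$@"
}

# Usage sketch (not run here; the KAligner pipeline is abbreviated):
#   run_if_missing lib1-3.pair.gz bash -c 'KAligner -j4 -k35 ... | gzip >lib1-3.pair.gz'
</code>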
  
  
archive/bioinformatic_tools/abyss.1274290818.txt.gz · Last modified: 2010/05/19 17:40 by jstjohn