  
Note that ABySS is also the assembler recommended by Illumina for large genomes. {{:bioinformatic_tools:abyss_technote_illumina.pdf|Illumina Technote Paper}}
==== Installing ====
  
make install
</code>

=== Alternate Install ===

Attempt installing against campusdata's openmpi, which is already configured to work with SGE. To force inclusion of the correct mpi.h I specify the openmpi include path explicitly via CPPFLAGS:

<code>
cd /campusdata/BME235/programs/abyss_tmp/abyss-1.1.2
./configure --prefix=/campusdata/BME235/programs/abyss_tmp/ CPPFLAGS='-I/opt/openmpi/include -I/campusdata/BME235/include' --with-mpi=/opt/openmpi
</code>

Next I qlogin onto a node and run make install in parallel:
<code>
qlogin
make -j8 install
</code>

The installation crashed because a warning was treated as an error (-Werror was enabled). I modified configure.ac so that -Werror is no longer passed:
<code>
#AC_SUBST(AM_CXXFLAGS, '-Wall -Wextra -Werror')
AC_SUBST(AM_CXXFLAGS, '-Wall -Wextra')
</code>

Next I run autoreconf and autoconf to regenerate the configure script from the modified configure.ac:
<code>
/campus/BME235/bin/autoconf/bin/autoreconf
/campus/BME235/bin/autoconf/bin/autoconf
</code>

Finally I re-do the configure and install:
<code>
./configure --prefix=/campusdata/BME235/programs/abyss_tmp/ CPPFLAGS='-I/opt/openmpi/include -I/campusdata/BME235/include' --with-mpi=/opt/openmpi
make -j8 install
cd ../
fixmode . &
</code>

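As a quick sanity check (these commands are not from the original build notes, just a suggested verification using the prefix above), you can list the installed binaries and confirm the new abyss-pe is the one found on the PATH:
<code>
# confirm the MPI-enabled binaries landed in the install prefix
ls /campusdata/BME235/programs/abyss_tmp/bin

# put the new install first on the PATH and check which abyss-pe is picked up
export PATH=/campusdata/BME235/programs/abyss_tmp/bin:$PATH
which abyss-pe
</code>
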
  
==== Websites ====
  
===== Slug Assembly =====

==== Attempt 1 ====
In the directory:
  /campus/BME235/assemblies/slug/ABySS-assembly1
  
Unfortunately this command crashes. The error states that LD_LIBRARY_PATH might need to be set to point to the shared MPI libraries. It would probably also be best to use our own build of "mpirun" once we get it compiled with SGE support.
==== Attempt 2 ====
I modified the script to use the other, parallel version of ABySS that I installed as described above. I attempted this in the same directory, since the last attempt was entirely unsuccessful:
<code>
/campus/BME235/assemblies/slug/ABySS-assembly1/run1_abyss_mpi.sh:
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#$ -l mem_free=15g
#
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib:/campus/BME235/lib
# make the new MPI version of abyss take priority
export PATH=/campus/BME235/programs/abyss_tmp/bin:$PATH

/opt/openmpi/bin/mpirun -np $NSLOTS abyss-pe -j j=2 np=$NSLOTS n=8 k=25 name=slugAbyss \
  lib='lane1 lane2 lane3 lane5 lane6 lane7 lane8' \
  lane1='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_2_all_qseq.fastq' \
  lane2='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_2_all_qseq.fastq' \
  lane3='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_2_all_qseq.fastq' \
  lane5='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_2_all_qseq.fastq' \
  lane6='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_2_all_qseq.fastq' \
  lane7='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_2_all_qseq.fastq' \
  lane8='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_2_all_qseq.fastq'
</code>

And I run the script using the following qsub command:
<code>
qsub -pe orte 40 run1_abyss_mpi.sh
</code>
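
While the job is pending or running, its state can be checked with the standard SGE commands (these are not from the original notes, just general SGE usage); because of the -cwd and -j y directives, the combined stdout/stderr ends up in a run1_abyss_mpi.sh.o<jobid> file in the working directory:
<code>
qstat -u $USER                 # list my pending/running jobs
qstat -f                       # full queue/host view
tail -f run1_abyss_mpi.sh.o*   # follow the job's combined output
</code>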

This attempt failed as well.

Out of curiosity I decided to follow the example for running an MPI job through qsub on Sun Grid Engine, as documented on the campusrocks page. The test and its results are in the following directory on campusrocks:
  /campus/jastjohn/test
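
I don't reproduce the campusrocks example here, but a minimal SGE/OpenMPI test submission is roughly the sketch below (the script name and contents are my reconstruction, not the documented example):
<code>
mpi_test.sh (hypothetical):
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
# launch one trivial process per granted slot, just to exercise MPI startup
/opt/openmpi/bin/mpirun -np $NSLOTS hostname
</code>
submitted with something like:
  qsub -pe orte 4 mpi_test.sh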

I followed the example exactly and got the following error, which is exactly the same as the one I get when trying to run ABySS:
<code>
error: error: ending connection before all data received
error: 
error reading job context from "qlogin_starter"
--------------------------------------------------------------------------
A daemon (pid 9204) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        campusrocks-0-7.local - daemon did not report back when launched
        campusrocks-0-14.local - daemon did not report back when launched
</code>

I am now submitting an IT request to have someone look into this.

==== Attempt 3 ====
For this attempt I manually set up an openmpi environment and executed the program in parallel over it.

There are three main steps to setting up an openmpi environment and running ABySS over it:
  - Set up an ssh key so you can log into other campusrocks nodes over ssh without typing a password.
  - Create a "machine file" listing every node you want this openmpi run to use.
  - Choose a head node from your machine file (probably campusrocks-0-6.local due to its large available memory) and issue the abyss-pe command to get the job going.

=== SSH key ===
Setting up the ssh key was the hardest part for me to get right, so I won't go into detail here; other people in the class are more confident with this setup and can fill this section in. A rough outline of the usual approach is sketched below.
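
As a rough sketch (standard OpenSSH commands; this assumes home directories are shared across the campusrocks nodes, which I haven't verified):
<code>
# generate a key pair (accept the defaults; an empty passphrase lets MPI start up unattended)
ssh-keygen -t rsa

# authorize the new public key; with a shared home directory this covers every node
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# test: this should log in and run hostname without prompting for a password
ssh campusrocks-1-0.local hostname
</code>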

=== Machine File ===
An example machine file might look like this:
<code>
campusrocks-0-6.local
campusrocks-1-0.local
campusrocks-1-0.local
#campusrocks-1-15.local
campusrocks-1-15.local
</code>

The snippet above illustrates several key points about an openmpi machine file. Each entry corresponds to one core on the named node. For the head node I list only a single core, so that core can take full advantage of the node's available memory. I tell openmpi to use two cores on 'campusrocks-1-0.local' by listing that node twice. Finally, one of the 'campusrocks-1-15.local' entries is commented out, which means that line is skipped; commenting out lines is a quick way to enable or disable nodes. You can therefore list every core on every node in the machine file and comment out the resources that are already in use or that you don't want to use. For high-memory applications like this it is probably best to use only one core per node, since the memory will likely be used up entirely anyway.

=== Running abyss-pe ===
abyss-pe is a Makefile that drives the ABySS pipeline. This is great because if your assembly crashes you can simply re-issue the same command and it will pick up where it left off.

First I went to the campusrocks [[http://campusrocks.soe.ucsc.edu/ganglia/|ganglia web page]] to check which nodes were free, and I modified my machine file accordingly, allocating one core on each relatively free node I wanted to use. The machine file for this run is stored under the name "machines" in the base directory of this assembly.

Next, after ssh'ing into my chosen openmpi head node (campusrocks-0-6.local), I added the appropriate abyss bin directory to the head of my PATH. As of this writing the abyss installation in BME235/bin is still the non-mpi version. The mpi-enabled version of abyss lives here:
  /campus/BME235/programs/abyss_tmp/bin/
and I modified my PATH with the following command:
  export PATH=/campus/BME235/programs/abyss_tmp/bin:$PATH

Alternatively I could simply have added this line to my '.profile', but for now this is sufficient, especially because we are working on getting the parallel version installed into BME235/bin as a more permanent solution.
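
If you do want the change to persist across logins, appending the same export to '.profile' is enough (same path as above):
<code>
echo 'export PATH=/campus/BME235/programs/abyss_tmp/bin:$PATH' >> ~/.profile
</code>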

Finally I start screen (this process will take several days to run, and it is nice to be able to check back on its progress) and issue the command from inside screen. There is a [[http://www.kuro5hin.org/story/2004/3/9/16838/14935|nice screen tutorial]] that I use to remind myself of the basic screen commands and usage.

<code>
screen

/campus/BME235/programs/abyss_tmp/bin/abyss-pe -j j=2 \
  mpirun="/opt/openmpi/bin/mpirun -machinefile machines -x PATH=/campus/BME235/programs/abyss_tmp/bin:$PATH" \
  np=60 n=8 k=28 name=slugAbyss \
  lib='lane1 lane2 lane3 lane5 lane6 lane7 lane8' \
  lane1='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_2_all_qseq.fastq' \
  lane2='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_2_all_qseq.fastq' \
  lane3='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_2_all_qseq.fastq' \
  lane5='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_2_all_qseq.fastq' \
  lane6='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_2_all_qseq.fastq' \
  lane7='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_2_all_qseq.fastq' \
  lane8='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_2_all_qseq.fastq'
</code>
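
The only screen commands really needed for this workflow are detach and reattach, for example:
<code>
# inside screen: detach with Ctrl-a d, leaving the assembly running

screen -ls    # list detached sessions
screen -r     # reattach to the running session
</code>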

Note that I originally started this run on campusrocks-1-0.local with two cores allocated. That process was eventually killed, either by a cluster admin or by something else. I then re-ran the program from campusrocks-0-6.local because of its larger amount of available RAM. The assembly picked up where it left off, and the step that had previously been killed finished within a fairly short time.
  
===== References =====
<refnotes>notes-separator: none</refnotes>
~~REFNOTES cite~~