contributors:team_4_page

Differences

This shows you the differences between two versions of the page.

contributors:team_4_page [2015/05/20 18:49]
sihussai
contributors:team_4_page [2015/07/18 20:52] (current)
92.247.181.31 ↷ Links adapted because of a move operation
Line 39: Line 39:
 % --enable-dependency-tracking\ % --enable-dependency-tracking\
 % --with-boost=/​campusdata/​BME235/​include/​boost\ % --with-boost=/​campusdata/​BME235/​include/​boost\
-% --with-mpi=/campusdata/BME235/include
-% CC=gcc-4.9.2 CXX=g++-4.9.2
-% CPPFLAGS=-I/campusdata/BME235/include/sparsehash
+% --with-mpi=/opt/openmpi
+% CC=gcc CXX=g++\
+% CPPFLAGS=-I/campusdata/BME235/include/
 </​code>​ </​code>​
 Then ABySS can be installed via the makefile Then ABySS can be installed via the makefile
Line 48: Line 48:
 % make install % make install
 </​code>​ </​code>​
 +==== Notes on Installation ====
 +  * The //--enable-maxk// configure option defaults to 64. Set the maximum k-mer size to the smallest value that accommodates your target k-mers, so as to minimize ABySS's memory footprint.
 +  * Ensure that the directory given in CPPFLAGS holds the Google sparsehash includes. Check the ABySS compilation output for a sparsehash warning to be sure.
 +  * The Boost libraries do not need to be compiled before installing.
 +  * Ensure that openmpi was built with the --with-sge option (for SGE) so that tight integration is available.
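
As an illustration of the first note, the sketch below picks a value for //--enable-maxk//. It assumes, as the ABySS README states, that the value must be a multiple of 32; the helper name maxk_for is hypothetical and not part of ABySS.

```shell
# Hypothetical helper: smallest multiple of 32 that is >= the largest
# k-mer size you plan to assemble with (assumes maxk must be a multiple of 32).
maxk_for() {
    k=$1
    echo $(( (k + 31) / 32 * 32 ))
}

# Example: the largest target k is 41, so configure with --enable-maxk=64.
echo "--enable-maxk=$(maxk_for 41)"
```

The resulting option would be appended to the configure command shown earlier; a target k of 32 or less would allow --enable-maxk=32 and a smaller memory footprint.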
  
 =====ABySS parameters===== =====ABySS parameters=====
Line 91: Line 96:
 #$ -S /bin/bash #$ -S /bin/bash
 #$ -V #$ -V
-#$ -l mem_free=15g 
 ABYSSRUNDIR=/​campusdata/​BME235/​bin ABYSSRUNDIR=/​campusdata/​BME235/​bin
 export PATH=$PATH:/​opt/​openmpi/​bin:/​campusdata/​BME235/​bin/​ export PATH=$PATH:/​opt/​openmpi/​bin:/​campusdata/​BME235/​bin/​
Line 99: Line 103:
 </​code>​ </​code>​
 Note that the parallel version of ABySS requires two things in particular: ​ Note that the parallel version of ABySS requires two things in particular: ​
-  * **(1)** The use of a parallel environment which can be selected using a qsub option.
-  * **(2)** The //np// option of abyss-pe. The number of processes here must reflect the number included in the parallel environment option.
-The parallel environment option in the script above:
+  * The use of a [[archive:parallel_environment]] (PE) which can be selected using a qsub option.
+  * The //np// option of abyss-pe. The number of processes here must reflect the number included in the parallel environment option.
+The PE option in the script above:
 <​code>​ <​code>​
 #$ -pe mpi 10 #$ -pe mpi 10
 </​code>​ </​code>​
-The //mpi// designates the choice of a parallel environment that is installed on the system and the 10 indicates the number of processes over which to run the job. To see which PE's are installed on the system, use the command:
+The //mpi// designates the choice of a PE that is installed on the system and the 10 indicates the number of processes over which to run the job. To see which PE's are installed on the system, use the command:
 <​code>​ <​code>​
 qconf -spl qconf -spl
 </​code>​ </​code>​
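
One way to keep the //np// value in step with the slot count of the PE option is to read it from NSLOTS, which SGE sets inside a running job to the number of slots granted. The following is only a dry-run sketch: abyss_cmd is a hypothetical helper, and the k, name and in values are placeholders rather than settings from this page.

```shell
# Hypothetical dry-run helper: prints the abyss-pe command instead of running it.
# NSLOTS is set by SGE to the number of slots granted by "#$ -pe <name> <n>".
abyss_cmd() {
    # $1 = k-mer size, $2 = assembly name, $3 = input read files (placeholders)
    echo "abyss-pe np=${NSLOTS:?NSLOTS not set} k=$1 name=$2 in='$3'"
}

# Outside a job NSLOTS is unset, so set it by hand to try the helper.
NSLOTS=10
abyss_cmd 41 asm 'reads_1.fq reads_2.fq'
```

Inside a job script, np then follows whatever number is given to the PE option, so the two values cannot drift apart.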
 +Selecting the proper PE is critical to ensure the success of a parallelized ABySS job.  By using the command:
 +<​code>​
 +qconf -sp PE_NAME
 +</​code>​
 +you can inspect the settings for an individual PE.  For example, using the command //qconf -sp mpi// will report the following:
 +<​code>​
 +pe_name ​           mpi
 +slots              9999
 +user_lists ​        NONE
 +xuser_lists ​       NONE
 +start_proc_args ​   /​opt/​gridengine/​mpi/​startmpi.sh $pe_hostfile
 +stop_proc_args ​    /​opt/​gridengine/​mpi/​stopmpi.sh
 +allocation_rule ​   $fill_up
 +control_slaves ​    FALSE
 +job_is_first_task ​ TRUE
 +urgency_slots ​     min
 +accounting_summary TRUE
 +</​code>​
 +When using ABySS with openmpi, three settings in particular should be noted:
 +  * slots: the maximum number of slots that can be requested with the PE
 +  * allocation_rule: the scheduling rule used to allocate slots to nodes
 +  * control_slaves: whether SGE controls the job's slave tasks; TRUE indicates tight integration
 +All of the PEs installed on the campus cluster had slots set to 9999 (virtually limitless), so that setting was not a concern. The allocation_rule and control_slaves settings, however, were critical. We first tried //mpi//, but its control_slaves setting of FALSE rules out tight integration; because openmpi can achieve tight integration with the cluster's SGE, we switched to //orte//. That PE proved inadequate because its allocation_rule of $pe_slots requires all slots requested by the job to be placed on a single node. This was fatal, since a run with the full libraries needed more memory than the largest node possesses (~256 GB). We then switched to //mpich//, which offers both tight integration and an allocation_rule of $fill_up; with a sufficiently large number of slots, this ensures that a job uses multiple nodes. Its limitation is that, depending on the traffic on the cluster, we would have to request more slots than needed to ensure the use of multiple nodes. To address this, we had the //denovo// PE created and added to the queue. This PE offers both tight integration and an allocation_rule of $round_robin, which distributes the requested slots evenly across the available nodes. Thus, even a relatively small number of processes makes use of the memory of multiple nodes.
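
The two checks described above can be sketched as a small filter over //qconf -sp// output. The function names pe_field and check_pe are hypothetical, and the pass/fail criteria simply restate this page's experience (control_slaves must be TRUE, allocation_rule must not be $pe_slots); they are not an official SGE or ABySS test.

```shell
# Print the value of one field from "qconf -sp"-style output read on stdin.
pe_field() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Classify a PE description (on stdin) for a multi-node ABySS run:
# it needs tight integration (control_slaves TRUE) and an allocation rule
# other than $pe_slots, which would confine every slot to a single node.
check_pe() {
    desc=$(cat)
    rule=$(printf '%s\n' "$desc" | pe_field allocation_rule)
    slaves=$(printf '%s\n' "$desc" | pe_field control_slaves)
    if [ "$slaves" != "TRUE" ]; then
        echo "loose integration (control_slaves=$slaves)"
    elif [ "$rule" = '$pe_slots' ]; then
        echo "single node only (allocation_rule=$rule)"
    else
        echo "ok (allocation_rule=$rule)"
    fi
}
```

On a cluster this would be used as, for example, //qconf -sp denovo | check_pe//. Note that $fill_up also passes the check, even though it may need extra slots before a job actually spans several nodes.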
  
 abyss-pe is a driver script implemented as a Makefile. Any option of make may be used with abyss-pe. Particularly useful options are: abyss-pe is a driver script implemented as a Makefile. Any option of make may be used with abyss-pe. Particularly useful options are:
contributors/team_4_page.1432147740.txt.gz · Last modified: 2015/05/20 18:49 by sihussai