Page history: created 2010/04/07 19:54 by galt; revised 2010/05/11 04:51 by jstjohn.
====== ABySS ======
===== Overview =====
ABySS[(cite:abyss>Jared T. Simpson, Kim Wong, Shaun D. Jackman, Jacqueline E. Schein, Steven J.M. Jones, and İnanç Birol. ABySS: A parallel assembler for short read sequence data. //Genome Res.// June 2009 19: 1117-1123; Published in Advance February 27, 2009, doi:[[http://dx.doi.org/10.1101/gr.089532.108|10.1101/gr.089532.108]].)] stands for **A**ssembly **By** **S**hort **S**equences.

ABySS is a //de novo//, parallel, paired-end, short read DNA sequence assembler. \\
The single-processor version can assemble genomes of up to 100 Mbases.[(cite:website>[[http://www.bcgsc.ca/platform/bioinfo/software/abyss]])] \\
The parallel version uses MPI and can assemble larger genomes.[(cite:website)] \\
It has been used to assemble a transcriptome from the tumor tissue of a patient with follicular lymphoma.[(cite:Biroletal>Inanç Birol, Shaun D. Jackman, Cydney B. Nielsen, Jenny Q. Qian, Richard Varhol, Greg Stazyk, Ryan D. Morin, Yongjun Zhao, Martin Hirst, Jacqueline E. Schein, Doug E. Horsman, Joseph M. Connors, Randy D. Gascoyne, Marco A. Marra, and Steven J. M. Jones. De novo transcriptome assembly with ABySS. //Bioinformatics// 25: 2872-2877. Advance Access published on November 1, 2009, doi:[[http://dx.doi.org/10.1093/bioinformatics/btp367|10.1093/bioinformatics/btp367]].)]

ABySS can use k-mer sizes greater than 31.

Illumina also recommends ABySS as the assembler for large genomes. {{:bioinformatic_tools:abyss_technote_illumina.pdf|Illumina Technote Paper}}
==== Installing ====

Download and unpack the source tarballs:

<code>
cd /campusdata/BME235/programs
# Download ABySS and its two dependencies, OpenMPI and Google sparsehash
wget http://www.bcgsc.ca/downloads/abyss/abyss-1.1.2.tar.gz
wget http://www.open-mpi.org/software/ompi/v1.4/downloads/openmpi-1.4.1.tar.gz
wget http://google-sparsehash.googlecode.com/files/sparsehash-1.7.tar.gz
# Unpack each tarball into its own directory
tar xfz abyss-1.1.2.tar.gz
tar xfz openmpi-1.4.1.tar.gz
tar xfz sparsehash-1.7.tar.gz
# Keep each tarball next to the sources it came from
mv abyss-1.1.2.tar.gz abyss-1.1.2/
mv openmpi-1.4.1.tar.gz openmpi-1.4.1/
mv sparsehash-1.7.tar.gz sparsehash-1.7/
</code>
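
The tar and mv steps above repeat the same pattern for each package. As a minimal sketch, assuming each NAME.tar.gz unpacks into a directory called NAME (true for these three tarballs) and that it is run from the directory holding the tarballs, a small helper can do both steps; the name unpack_and_stash is made up for this page:

<code bash>
#!/usr/bin/env bash
# Hypothetical helper (not part of ABySS or its dependencies): unpack
# NAME.tar.gz in the current directory, then move the tarball into the
# NAME/ directory it created. Assumes the archive unpacks into NAME/.
unpack_and_stash() {
  local tarball="$1"
  local dir="${tarball%.tar.gz}"   # e.g. abyss-1.1.2.tar.gz -> abyss-1.1.2
  tar xzf "$tarball" || return 1
  if [ ! -d "$dir" ]; then
    echo "unpack_and_stash: expected directory '$dir' was not created" >&2
    return 1
  fi
  mv "$tarball" "$dir/"
}

# Usage, from /campusdata/BME235/programs:
#   for t in abyss-1.1.2.tar.gz openmpi-1.4.1.tar.gz sparsehash-1.7.tar.gz; do
#     unpack_and_stash "$t"
#   done
</code>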

First, OpenMPI and Google sparsehash, which ABySS depends on, need to be compiled and installed:

<code>
# OpenMPI, installed under /campusdata/BME235
cd /campusdata/BME235/programs/openmpi-1.4.1
./configure --prefix=/campusdata/BME235
make
make install
# Google sparsehash, installed under the same prefix
cd /campusdata/BME235/programs/sparsehash-1.7
./configure --prefix=/campusdata/BME235
make
make install
</code>
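
Each package above goes through the same configure / make / make install sequence. A sketch of a helper (hypothetical name build_install) that chains the steps with && so that a failed configure or build stops before make install runs; it assumes a standard autotools source tree with a ./configure script:

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: configure, build and install one autotools source tree
# into PREFIX. The && chain stops at the first failing step, so "make install"
# never runs after a failed configure or build. The subshell keeps the cd local.
build_install() {
  local srcdir="$1" prefix="$2"
  (
    cd "$srcdir" &&
      ./configure --prefix="$prefix" &&
      make &&
      make install
  )
}

# Usage (same directories and prefix as above):
#   build_install /campusdata/BME235/programs/openmpi-1.4.1 /campusdata/BME235 &&
#     build_install /campusdata/BME235/programs/sparsehash-1.7 /campusdata/BME235
</code>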

Next, a [[http://code.google.com/p/google-sparsehash/issues/detail?id=55|patch]] needs to be applied so that ABySS compiles correctly against Google sparsehash 1.7. This issue should be fixed in the next release of Google sparsehash.

<code>
cd /campusdata/BME235/include/google/sparsehash
# Quote the URL: an unquoted & would send wget to the background and cut the URL short.
# -O saves the download under the file name that patch reads below.
wget -O deallocate.diff 'http://google-sparsehash.googlecode.com/issues/attachment?aid=-5666329961626930947&name=deallocate.diff'
patch < deallocate.diff
</code>

Now ABySS can be compiled with OpenMPI and Google sparsehash support:

<code>
cd /campusdata/BME235/programs/abyss-1.1.2
# CPPFLAGS tells the compiler where the sparsehash headers were installed
./configure --prefix=/campusdata/BME235 CPPFLAGS=-I/campusdata/BME235/include
make
make install
</code>
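
After make install, the programs should be in the bin directory under the install prefix, which for --prefix=/campusdata/BME235 is /campusdata/BME235/bin by autotools default. A minimal check (hypothetical function missing_cmds) that reports which expected programs are not found on PATH; the list ABYSS, ABYSS-P, abyss-pe and mpirun is an assumption about what a build with MPI support should provide:

<code bash>
#!/usr/bin/env bash
# Hypothetical post-install check (not part of ABySS): print each named
# command that cannot be found on PATH. Returns 1 if any are missing, else 0.
missing_cmds() {
  local status=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}

# Usage, assuming the default bindir for --prefix=/campusdata/BME235:
#   export PATH=/campusdata/BME235/bin:$PATH
#   missing_cmds ABYSS ABYSS-P abyss-pe mpirun || echo "some programs are missing"
</code>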

=== Alternate Install ===

I attempted installing against campusdata's OpenMPI, which is already configured to work with SGE. To force inclusion of the correct mpi.h file, I specify the include paths explicitly in CPPFLAGS:

<code>
cd /campusdata/BME235/programs/abyss_tmp/abyss-1.1.2
./configure --prefix=/campusdata/BME235/programs/abyss_tmp/ CPPFLAGS='-I/opt/openmpi/include -I/campusdata/BME235/include' --with-mpi=/opt/openmpi
</code>
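
GCC searches directories given with -I in left-to-right order and uses the first matching header, which is why listing /opt/openmpi/include first selects that mpi.h rather than the one installed under /campusdata/BME235/include by the OpenMPI build above. The hypothetical first_header function below mimics that lookup so the choice can be checked before configuring. It is a simplification: it ignores the compiler's built-in system directories and the extra directory searched for #include "..." forms.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: given a header name and a list of -I directories in
# command-line order, print the path of the first copy found, mimicking the
# left-to-right -I search. Returns 1 if no directory contains the header.
first_header() {
  local header="$1"
  shift
  local dir
  for dir in "$@"; do
    if [ -f "$dir/$header" ]; then
      echo "$dir/$header"
      return 0
    fi
  done
  return 1
}

# Usage, with the CPPFLAGS order used above:
#   first_header mpi.h /opt/openmpi/include /campusdata/BME235/include
</code>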

Next, I qlogin into a compute node and run make install in parallel:
<code>
qlogin
make -j8 install
</code>

The installation failed on a compiler warning because -Werror was enabled. I modified configure.ac so that -Werror is no longer used:
<code>
#AC_SUBST(AM_CXXFLAGS, '-Wall -Wextra -Werror')
AC_SUBST(AM_CXXFLAGS, '-Wall -Wextra')
</code>
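
The same configure.ac change can be made non-interactively with sed. A minimal sketch (drop_werror is a hypothetical name), assuming -Werror only needs to be removed from the line that sets AM_CXXFLAGS. Unlike the manual edit above, it changes that line in place instead of keeping a commented-out copy, and it leaves the original file as configure.ac.bak. The -i.bak form is accepted by both GNU and BSD sed.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: on every line mentioning AM_CXXFLAGS, delete " -Werror".
# The unmodified file is kept as FILE.bak.
drop_werror() {
  local file="${1:-configure.ac}"
  sed -i.bak '/AM_CXXFLAGS/s/ -Werror//' "$file"
}

# Usage, from the ABySS source directory:
#   drop_werror configure.ac
</code>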

Next, I regenerate the build system so that it picks up the modified configure.ac:
<code>
/campus/BME235/bin/autoconf/bin/autoreconf
/campus/BME235/bin/autoconf/bin/autoconf
</code>

Finally, I rerun configure and install:
<code>
./configure --prefix=/campusdata/BME235/programs/abyss_tmp/ CPPFLAGS='-I/opt/openmpi/include -I/campusdata/BME235/include' --with-mpi=/opt/openmpi
make -j8 install
cd ../
fixmode . &
</code>

==== Websites ====
[[http://www.bcgsc.ca/platform/bioinfo/software/abyss|ABySS]] \\
[[http://www.open-mpi.org|OpenMPI]] \\
[[http://code.google.com/p/google-sparsehash|Google sparsehash]]

==== Sources with Binaries and Documentation ====
[[http://www.bcgsc.ca/platform/bioinfo/software/abyss/releases|ABySS]] \\
[[http://www.open-mpi.org/software/ompi|OpenMPI]] \\
[[http://code.google.com/p/google-sparsehash/downloads/list|Google sparsehash]]

===== Slug Assembly =====

==== Attempt 1 ====
In the directory:
  /campus/BME235/assemblies/slug/ABySS-assembly1
I started the assembly process in parallel MPI mode. Note that the ABySS binaries were built against OpenMPI 1.4, but I am using mpirun from OpenMPI 1.3. If there are problems, I will re-run the assembly once OpenMPI 1.4 has been reinstalled with SGE support.

Here are the contents of the script I use to run everything:

  #!/bin/bash
  #
  #$ -cwd
  #$ -j y
  #$ -S /bin/bash
  #$ -V
  #$ -l mem_free=15g
  #
  /opt/openmpi/bin/mpirun -np $NSLOTS abyss-pe -j j=2 np=$NSLOTS n=8 k=25 name=slugAbyss \
    lib='lane1 lane2 lane3 lane5 lane6 lane7 lane8' \
    lane1='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_2_all_qseq.fastq' \
    lane2='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_2_all_qseq.fastq' \
    lane3='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_2_all_qseq.fastq' \
    lane5='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_2_all_qseq.fastq' \
    lane6='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_2_all_qseq.fastq' \
    lane7='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_2_all_qseq.fastq' \
    lane8='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_2_all_qseq.fastq'
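
The seven lane arguments in the script above all follow one naming pattern, which makes the command long and easy to mistype. A sketch of building them in a loop instead, assuming every lane N has paired files s_N_1_all_qseq.fastq and s_N_2_all_qseq.fastq in the CeleraReads directory used here; build_lane_args and LANE_ARGS are hypothetical names:

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: fill the array LANE_ARGS with the lib=... and laneN=...
# arguments used in the abyss-pe commands on this page. Assumes lane N's
# paired reads are s_N_1_all_qseq.fastq and s_N_2_all_qseq.fastq in READ_DIR.
READ_DIR=/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads

build_lane_args() {
  local libs="" n
  local -a lanes=()
  for n in "$@"; do
    libs="${libs:+$libs }lane$n"
    # One array element per lane; the space between the two files stays inside
    # the element, just like lane1='file1 file2' on the command line.
    lanes+=("lane$n=$READ_DIR/s_${n}_1_all_qseq.fastq $READ_DIR/s_${n}_2_all_qseq.fastq")
  done
  LANE_ARGS=("lib=$libs" "${lanes[@]}")
}

# Usage inside the job script:
#   build_lane_args 1 2 3 5 6 7 8
#   abyss-pe ... n=8 k=25 name=slugAbyss "${LANE_ARGS[@]}"
</code>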

Unfortunately, the Attempt 1 run crashes. The error message says that LD_LIBRARY_PATH might need to be set to point to the shared MPI libraries. It would also probably be best to use our own build of mpirun once it has been compiled with SGE support.
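
One way to act on that hint is to put the MPI library directories on LD_LIBRARY_PATH in the job script. Writing it as $LD_LIBRARY_PATH:/some/dir leaves an empty first entry when the variable was unset, and glibc's dynamic loader treats an empty entry as the current directory. A minimal sketch of a safer helper (hypothetical name path_prepend) that prepends a directory, so it takes precedence, and skips directories that are already present:

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: prepend DIR to the colon-separated variable named VAR
# (LD_LIBRARY_PATH, PATH, ...) and export it. Produces no empty entry when VAR
# is unset or empty, and does nothing if DIR is already listed.
path_prepend() {
  local var="$1" dir="$2"
  local current="${!var:-}"
  case ":$current:" in
    *":$dir:"*) ;;  # already present: leave the variable unchanged
    *) printf -v "$var" '%s' "$dir${current:+:$current}" ;;
  esac
  export "$var"
}

# Usage in a job script (directories from the Attempt 2 script on this page):
#   path_prepend LD_LIBRARY_PATH /campus/BME235/lib
#   path_prepend LD_LIBRARY_PATH /opt/openmpi/lib
</code>
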
==== Attempt 2 ====
I modified the script to use the other parallel build of ABySS (see Alternate Install above) and ran it in the same directory, since the previous attempt was entirely unsuccessful. Contents of /campus/BME235/assemblies/slug/ABySS-assembly1/run1_abyss_mpi.sh:
<code>
#!/bin/bash

#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#$ -l mem_free=15g

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib:/campus/BME235/lib
# Give the new MPI build of ABySS priority on PATH
export PATH=/campus/BME235/programs/abyss_tmp/bin:$PATH
/opt/openmpi/bin/mpirun -np $NSLOTS abyss-pe -j j=2 np=$NSLOTS n=8 k=25 name=slugAbyss \
  lib='lane1 lane2 lane3 lane5 lane6 lane7 lane8' \
  lane1='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_2_all_qseq.fastq' \
  lane2='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_2_all_qseq.fastq' \
  lane3='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_2_all_qseq.fastq' \
  lane5='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_2_all_qseq.fastq' \
  lane6='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_2_all_qseq.fastq' \
  lane7='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_2_all_qseq.fastq' \
  lane8='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_2_all_qseq.fastq'
</code>
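
Note that the script starts abyss-pe itself under mpirun -np $NSLOTS, which launches $NSLOTS separate copies of the abyss-pe driver. The ABySS README describes a different pattern, in which abyss-pe is run once with np=N and starts the parallel assembler through mpirun itself. A hedged sketch of that variant for this job, assuming it applies to version 1.1.2, that SGE sets NSLOTS from qsub -pe, and that the intended mpirun is first on PATH; run_slug_assembly is a hypothetical name, and the lane arguments are passed through unchanged:

<code bash>
#!/usr/bin/env bash
# Hypothetical sketch: run abyss-pe once and let it start its own MPI job via
# np=..., instead of wrapping abyss-pe in mpirun. Assumes SGE has set NSLOTS
# (the slot count from "qsub -pe orte N") and that the intended mpirun is first
# on PATH. Lane arguments (lib=..., laneN=...) are passed through as "$@".
run_slug_assembly() {
  case "${NSLOTS:-}" in
    '' | *[!0-9]* | 0)
      echo "run_slug_assembly: NSLOTS is not a positive integer: '${NSLOTS:-}'" >&2
      return 1
      ;;
  esac
  abyss-pe j=2 np="$NSLOTS" n=8 k=25 name=slugAbyss "$@"
}

# Usage at the end of the job script:
#   run_slug_assembly lib='lane1 lane2 lane3 lane5 lane6 lane7 lane8' lane1='...' ...
</code>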

I submit run1_abyss_mpi.sh with the following qsub command:
<code>
qsub -pe orte 40 run1_abyss_mpi.sh
</code>

This attempt also failed.

Out of curiosity, I followed the campusrocks page's example of running an MPI job through qsub on Sun Grid Engine.
The test and its results are in the following directory on campusrocks:
  /campus/BME235/test

I followed the example exactly and got the following error, which is exactly the same as the one I get when running ABySS:
<code>
error: error: ending connection before all data received
error: 
error reading job context from "qlogin_starter"
--------------------------------------------------------------------------
A daemon (pid 9204) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        campusrocks-0-7.local - daemon did not report back when launched
        campusrocks-0-14.local - daemon did not report back when launched
</code>
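
When filing a support request, it helps to list exactly which nodes failed. A small sketch (hypothetical name failed_nodes) that pulls the host names out of a log in the format above; the SGE output file name in the usage comment is illustrative only:

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: print, once each and in log order, the hosts that mpirun
# reported with "daemon did not report back when launched".
failed_nodes() {
  awk '/daemon did not report back when launched/ && !seen[$1]++ { print $1 }' "$1"
}

# Usage (SGE writes the merged output of a -j y job to JOBNAME.oJOBID):
#   failed_nodes run1_abyss_mpi.sh.o12345
</code>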

I am now submitting an IT request to have someone look into this.

===== References =====
<refnotes>notes-separator: none</refnotes>
~~REFNOTES cite~~
archive/bioinformatic_tools/abyss.txt · Last modified: 2015/07/28 06:23 by ceisenhart