archive:bioinformatic_tools:abyss

archive:bioinformatic_tools:abyss [2010/04/07 19:54]
galt created
archive:bioinformatic_tools:abyss [2010/05/11 03:56]
jstjohn
Line 1: Line 1:
-Genome Res. 2009 Jun;19(6):1117-23. Epub 2009 Feb 27. +====== ABySS ======
-ABySS: a parallel assembler for short read sequence data. +===== Overview =====
 +ABySS[(cite:abyss>Jared T. Simpson, Kim Wong, Shaun D. Jackman, Jacqueline E. Schein, Steven J.M. Jones, and İnanç Birol. ABySS: a parallel assembler for short read sequence data. //Genome Res.// June 2009 19: 1117-1123; Published in Advance February 27, 2009, doi:[[http://dx.doi.org/10.1101/gr.089532.108|10.1101/gr.089532.108]].
 +)] stands for **A**ssembly **By** **S**hort **S**equences.
  
-Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. +ABySS is a //de novo//, parallel, paired-end, short read DNA sequence assembler. \\
 +The single processor version can assemble genomes of up to 100 Mbases.[(cite:​website>​[[http://​www.bcgsc.ca/​platform/​bioinfo/​software/​abyss]])]\\ 
 +The parallel version uses MPI and can assemble larger genomes.[(cite:​website)] \\ 
 +It was used to assemble a transcriptome from the tumor tissue of a patient with follicular lymphoma.[(cite:Biroletal>İnanç Birol, Shaun D. Jackman, Cydney B. Nielsen, Jenny Q. Qian, Richard Varhol, Greg Stazyk, Ryan D. Morin, Yongjun Zhao, Martin Hirst, Jacqueline E. Schein, Doug E. Horsman, Joseph M. Connors, Randy D. Gascoyne, Marco A. Marra, and Steven J. M. Jones. De novo transcriptome assembly with ABySS. //Bioinformatics// 25: 2872-2877. Advance Access published on November 1, 2009, doi:[[http://dx.doi.org/10.1093/bioinformatics/btp367|10.1093/bioinformatics/btp367]].)]
  
-Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 4E6, Canada. +ABySS can use k-mer sizes larger than 31.
 + 
 + 
 +Note that Illumina also recommends ABySS as the assembler for large genomes. {{:bioinformatic_tools:abyss_technote_illumina.pdf|Illumina Technote Paper}}
 +==== Installing ==== 
 + 
 +Get the appropriate source files to be compiled: 
 + 
 +<​code>​ 
 +cd /​campusdata/​BME235/​programs 
 +wget http://​www.bcgsc.ca/​downloads/​abyss/​abyss-1.1.2.tar.gz 
 +wget http://​www.open-mpi.org/​software/​ompi/​v1.4/​downloads/​openmpi-1.4.1.tar.gz 
 +wget http://​google-sparsehash.googlecode.com/​files/​sparsehash-1.7.tar.gz 
 +tar xfz abyss-1.1.2.tar.gz 
 +tar xfz openmpi-1.4.1.tar.gz 
 +tar xfz sparsehash-1.7.tar.gz 
 +mv abyss-1.1.2.tar.gz abyss-1.1.2/​ 
 +mv openmpi-1.4.1.tar.gz openmpi-1.4.1/​ 
 +mv sparsehash-1.7.tar.gz sparsehash-1.7/​ 
 +</​code>​ 
 + 
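The unpack-and-move steps above repeat the same pattern for each tarball, so they can be sketched as a small helper. This is only an illustration: the function name ''unpack_and_stash'' is hypothetical, and it assumes each archive is named NAME.tar.gz, unpacks into a directory called NAME, and is run from the directory that holds the tarballs.

```shell
# Unpack a source tarball, then move the tarball into the directory it created.
# Assumption: NAME.tar.gz unpacks into NAME/ in the current directory.
unpack_and_stash() {
    local tarball=$1
    local dir=${tarball%.tar.gz}   # strip the suffix to get the directory name
    tar xfz "$tarball" || return 1
    if [ ! -d "$dir" ]; then
        echo "expected directory $dir was not created" >&2
        return 1
    fi
    mv "$tarball" "$dir/"
}
```

For the three downloads above, it could be called as ''for t in abyss-1.1.2.tar.gz openmpi-1.4.1.tar.gz sparsehash-1.7.tar.gz; do unpack_and_stash "$t"; done''.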
 +First, OpenMPI and Google sparsehash need to be compiled and installed for ABySS.
 + 
 +<​code>​ 
 +cd /​campusdata/​BME235/​programs/​openmpi-1.4.1 
 +./configure --prefix=/​campusdata/​BME235 
 +make 
 +make install 
 +cd /​campusdata/​BME235/​programs/​sparsehash-1.7 
 +./configure --prefix=/​campusdata/​BME235 
 +make 
 +make install 
 +</​code>​ 
 + 
 +Next, a [[http://code.google.com/p/google-sparsehash/issues/detail?id=55|patch]] needs to be applied so that ABySS compiles properly with support for Google sparsehash 1.7. This issue will be fixed in the next release of Google sparsehash.
 + 
 +<​code>​ 
 +cd /​campusdata/​BME235/​include/​google/​sparsehash 
 +wget -O deallocate.diff 'http://google-sparsehash.googlecode.com/issues/attachment?aid=-5666329961626930947&name=deallocate.diff'
 +patch < deallocate.diff 
 +</​code>​ 
 + 
 +Now ABySS can be compiled with OpenMPI and Google sparsehash support. 
 + 
 +<​code>​ 
 +cd /​campusdata/​BME235/​programs/​abyss-1.1.2 
 +./configure --prefix=/​campusdata/​BME235 CPPFLAGS=-I/​campusdata/​BME235/​include 
 +make 
 +make install 
 +</​code>​ 
 + 
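After ''make install'' it can be useful to confirm that the expected programs actually landed under the prefix. The following is a minimal sketch, not part of the original procedure: the function name ''check_installed'' is hypothetical, and the program names passed to it are whatever you expect your build to provide.

```shell
# Report which of the named programs are present under PREFIX/bin.
# Returns 0 only if every one of them exists and is executable.
check_installed() {
    local prefix=$1
    shift
    local prog
    local missing=0
    for prog in "$@"; do
        if [ -x "$prefix/bin/$prog" ]; then
            echo "ok      $prog"
        else
            echo "missing $prog"
            missing=1
        fi
    done
    return $missing
}
```

With the prefix used above, a call might look like ''check_installed /campusdata/BME235 ABYSS ABYSS-P abyss-pe mpirun'', on the assumption that the MPI-enabled build installs ''ABYSS-P'' alongside ''ABYSS'' and ''abyss-pe''.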
 +=== Alternate Install === 
 + 
 +Attempt installing against campusdata's OpenMPI, which is already configured to work with SGE. Note that, to force inclusion of the correct mpi.h file, I specify the include path to the 
 + 
 +<​code>​ 
 +cd /​campusdata/​BME235/​programs/​abyss_tmp/​abyss-1.1.2 
 +./configure --prefix=/​campusdata/​BME235/​programs/​abyss_tmp/​ CPPFLAGS='​-I/​opt/​openmpi/​include -I/​campusdata/​BME235/​include' ​ --with-mpi=/​opt/​openmpi 
 +</​code>​ 
 + 
 +Next, I qlogin to a compute node and run make install in parallel:
 +<​code>​ 
 +qlogin 
 +make -j8 install 
 +</​code>​ 
 + 
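The ''-j8'' above is hard-coded. As a hedged alternative, the job count could be derived from the environment. This sketch assumes that SGE exports ''NSLOTS'' in the session (it may be 1 for a plain ''qlogin'' without a parallel environment) and falls back to the number of online processors otherwise; the function name ''make_jobs'' is hypothetical.

```shell
# Print a job count for "make -j":
#   - the SGE slot count if NSLOTS is set and non-empty,
#   - otherwise the number of online processors,
#   - otherwise 1.
make_jobs() {
    if [ -n "${NSLOTS:-}" ]; then
        echo "$NSLOTS"
    else
        getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
    fi
}
```

It would then be used as ''make -j"$(make_jobs)" install''.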
 + 
 +==== Websites ==== 
 +[[http://​www.bcgsc.ca/​platform/​bioinfo/​software/​abyss|ABySS]] \\ 
 +[[http://​www.open-mpi.org|OpenMPI]] \\ 
 +[[http://​code.google.com/​p/​google-sparsehash|Google sparsehash]] 
 + 
 +==== Sources with Binaries and Documentation ==== 
 +[[http://​www.bcgsc.ca/​platform/​bioinfo/​software/​abyss/​releases|ABySS]] \\ 
 +[[http://​www.open-mpi.org/​software/​ompi|OpenMPI]] \\ 
 +[[http://​code.google.com/​p/​google-sparsehash/​downloads/​list|Google sparsehash]] 
 + 
 +===== Slug Assembly ===== 
 +In the directory:​ 
 +  /​campus/​BME235/​assemblies/​slug/​ABySS-assembly1 
 +I ran the following command to start the assembly process on this file in parallel MPI mode. Note that the binaries for ABySS were installed with open-mpi 1.4, but I am using mpirun 1.3. When we re-install open-mpi 1.4 so that it has SGE support, I will re-run this with that version if there are problems. Here is the command executed to start the process:
 +  /​campus/​BME235/​assemblies/​slug/​ABySS-assembly1 
 + 
 +And here are the contents of the script I use to run everything:​ 
 + 
 +  #​!/​bin/​bash 
 +  # 
 +  #$ -cwd 
 +  #$ -j y 
 +  #$ -S /bin/bash 
 +  #$ -V 
 +  #$ -l mem_free=15g 
 +  #  
 +  /​opt/​openmpi/​bin/​mpirun -np $NSLOTS abyss-pe -j j=2 np=$NSLOTS n=8 k=25 name=slugAbyss lib='​lane1 lane2 lane3 lane5 lane6 lane7 lane8' lane1='/​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_1_1_all_qseq.fastq /​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_1_2_all_qseq.fastq'​ lane2='/​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_2_1_all_qseq.fastq /​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_2_2_all_qseq.fastq'​ lane3='/​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_3_1_all_qseq.fastq /​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_3_2_all_qseq.fastq'​ lane5='/​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_5_1_all_qseq.fastq /​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_5_2_all_qseq.fastq'​ lane6='/​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_6_1_all_qseq.fastq /​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_6_2_all_qseq.fastq'​ lane7='/​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_7_1_all_qseq.fastq /​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_7_2_all_qseq.fastq'​ lane8='/​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_8_1_all_qseq.fastq ​ /​campus/​BME235/​data/​slug/​Illumina/​illumina_run_1/​CeleraReads/​s_8_2_all_qseq.fastq'​ 
 + 
 +Unfortunately, this command crashes. The error message suggests that LD_LIBRARY_PATH may need to be set to point to the shared MPI libraries. It would also probably be best to use our own version of "mpirun" once we get it compiled with SGE support.
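One way to act on the LD_LIBRARY_PATH hint is to prepend the MPI library directory in the job script before calling mpirun. The sketch below is untested against this cluster: the function name ''prepend_lib_dir'' is hypothetical, and the directory ''/opt/openmpi/lib'' is an assumption inferred from the ''/opt/openmpi/bin/mpirun'' path used in the script.

```shell
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed,
# then export the variable so child processes (e.g. mpirun) inherit it.
prepend_lib_dir() {
    local dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*)
            ;;  # already present, leave the value unchanged
        *)
            LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
            ;;
    esac
    export LD_LIBRARY_PATH
}
```

In the job script this would be called as ''prepend_lib_dir /opt/openmpi/lib'' on a line before the mpirun command. Whether this resolves the crash has not been verified here.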
 + 
 +===== References ===== 
 +<​refnotes>​notes-separator:​ none</​refnotes>​ 
 +~~REFNOTES cite~~
archive/bioinformatic_tools/abyss.txt · Last modified: 2015/07/28 06:23 by ceisenhart