Differences

This shows you the differences between two versions of the page.

--- archive:bioinformatic_tools:abyss [2010/04/07 19:54]
galt created
+++ archive:bioinformatic_tools:abyss [2010/05/11 03:56]
jstjohn
@@ Line 1: / Line 1: @@
-Genome Res. 2009 Jun;19(6):1117-23. Epub 2009 Feb 27.
+====== ABySS ======
-ABySS: a parallel assembler for short read sequence data.
+===== Overview =====
+ABySS[(cite:abyss>Jared T. Simpson, Kim Wong, Shaun D. Jackman, Jacqueline E. Schein, Steven J.M. Jones, and İnanç Birol. ABySS: A parallel assembler for short read sequence data. //Genome Res.// June 2009 19: 1117-1123; Published in Advance February 27, 2009, doi:[[http://dx.doi.org/10.1101/gr.089532.108|10.1101/gr.089532.108]].
+)] stands for **A**ssembly **By** **S**hort **S**equences.
-Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I.
+ABySS is a //de novo// parallel, paired-end, short read DNA sequence assembler. \\
+The single processor version can assemble genomes of up to 100 Mbases.[(cite:website>[[http://www.bcgsc.ca/platform/bioinfo/software/abyss]])]\\
+The parallel version uses MPI and can assemble larger genomes.[(cite:website)] \\
+It was used for assembly of a transcriptome from the tumor tissue of a patient with follicular lymphoma.[(cite:Biroletal>İnanç Birol, Shaun D. Jackman, Cydney B. Nielsen, Jenny Q. Qian, Richard Varhol, Greg Stazyk, Ryan D. Morin, Yongjun Zhao, Martin Hirst, Jacqueline E. Schein, Doug E. Horsman, Joseph M. Connors, Randy D. Gascoyne, Marco A. Marra, and Steven J. M. Jones. De novo transcriptome assembly with ABySS. //Bioinformatics// 25: 2872-2877. Advance Access published on November 1, 2009, doi:[[http://dx.doi.org/10.1093/bioinformatics/btp367|10.1093/bioinformatics/btp367]].)]
-Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 4E6, Canada.
+ABySS can use k-mer sizes larger than 31.
+ABySS is also the assembler that Illumina recommends for large genomes. {{:bioinformatic_tools:abyss_technote_illumina.pdf|Illumina Technote Paper}}
+==== Installing ====
+Get the appropriate source files to be compiled:
+<code>
+cd /campusdata/BME235/programs
+wget http://www.bcgsc.ca/downloads/abyss/abyss-1.1.2.tar.gz
+wget http://www.open-mpi.org/software/ompi/v1.4/downloads/openmpi-1.4.1.tar.gz
+wget http://google-sparsehash.googlecode.com/files/sparsehash-1.7.tar.gz
+tar xfz abyss-1.1.2.tar.gz
+tar xfz openmpi-1.4.1.tar.gz
+tar xfz sparsehash-1.7.tar.gz
+mv abyss-1.1.2.tar.gz abyss-1.1.2/
+mv openmpi-1.4.1.tar.gz openmpi-1.4.1/
+mv sparsehash-1.7.tar.gz sparsehash-1.7/
+</code>
+First, OpenMPI and Google sparsehash need to be compiled and installed for ABySS.
+<code>
+cd /campusdata/BME235/programs/openmpi-1.4.1
+./configure --prefix=/campusdata/BME235
+make
+make install
+cd /campusdata/BME235/programs/sparsehash-1.7
+./configure --prefix=/campusdata/BME235
+make
+make install
+</code>
+Next, a [[http://code.google.com/p/google-sparsehash/issues/detail?id=55|patch]] needs to be applied so that ABySS can properly compile with support for Google sparsehash 1.7. This issue will be fixed in the next release of Google sparsehash.
+<code>
+cd /campusdata/BME235/include/google/sparsehash
+wget -O deallocate.diff 'http://google-sparsehash.googlecode.com/issues/attachment?aid=-5666329961626930947&name=deallocate.diff'
+patch < deallocate.diff
+</code>
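Before patching headers that are already installed, it can help to confirm that the diff applies cleanly. The sketch below shows the dry-run-then-apply pattern on a throwaway file. The names and contents of demo.h and demo.diff are made up for illustration and are not the sparsehash sources, and the example assumes GNU patch, which provides --dry-run.

```shell
#!/bin/bash
# Demonstrate "check first, then apply" with patch on a scratch file.
# demo.h and demo.diff are hypothetical stand-ins for the real header
# and for deallocate.diff.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'line one\nold line\nline three\n' > demo.h

cat > demo.diff <<'EOF'
--- demo.h
+++ demo.h
@@ -1,3 +1,3 @@
 line one
-old line
+new line
 line three
EOF

# --dry-run reports whether the patch would apply without changing any file.
if patch --dry-run demo.h < demo.diff > /dev/null; then
    patch demo.h < demo.diff > /dev/null
    patch_status=applied
else
    patch_status=rejected
fi
echo "patch $patch_status"
```

For the real patch, the same two patch calls would be run in the sparsehash include directory with deallocate.diff as input.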
+Now ABySS can be compiled with OpenMPI and Google sparsehash support.
+<code>
+cd /campusdata/BME235/programs/abyss-1.1.2
+./configure --prefix=/campusdata/BME235 CPPFLAGS=-I/campusdata/BME235/include
+make
+make install
+</code>
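After make install, a quick check that the programs are reachable can save a failed cluster job later. This is a minimal sketch, assuming the binaries land in the bin directory under the --prefix used above; check_programs is a helper name introduced here, and ABYSS-P is expected only when ABySS was built with MPI support.

```shell
#!/bin/bash
# Report which ABySS programs are visible on PATH after "make install".
# PREFIX mirrors the --prefix used above; override it if you installed elsewhere.
PREFIX=${PREFIX:-/campusdata/BME235}
PATH="$PREFIX/bin:$PATH"

# Print found/missing for each named program; return the number missing.
check_programs() {
    local missing=0
    local prog
    for prog in "$@"; do
        if command -v "$prog" > /dev/null 2>&1; then
            echo "found:   $prog -> $(command -v "$prog")"
        else
            echo "missing: $prog"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

check_programs ABYSS ABYSS-P abyss-pe \
    || echo "some programs were not found under $PREFIX/bin"
```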
+=== Alternate Install ===
+Attempt to install against campusdata's OpenMPI, which is already configured to work with SGE. To force inclusion of the correct mpi.h file, I list that installation's include path (/opt/openmpi/include) first in CPPFLAGS.
+<code>
+cd /campusdata/BME235/programs/abyss_tmp/abyss-1.1.2
+./configure --prefix=/campusdata/BME235/programs/abyss_tmp/ CPPFLAGS='-I/opt/openmpi/include -I/campusdata/BME235/include'  --with-mpi=/opt/openmpi
+</code>
+Next, I qlogin to a node and run make install in parallel:
+<code>
+qlogin
+make -j8 install
+</code>
+==== Websites ====
+[[http://www.bcgsc.ca/platform/bioinfo/software/abyss|ABySS]] \\
+[[http://www.open-mpi.org|OpenMPI]] \\
+[[http://code.google.com/p/google-sparsehash|Google sparsehash]]
+==== Sources with Binaries and Documentation ====
+[[http://www.bcgsc.ca/platform/bioinfo/software/abyss/releases|ABySS]] \\
+[[http://www.open-mpi.org/software/ompi|OpenMPI]] \\
+[[http://code.google.com/p/google-sparsehash/downloads/list|Google sparsehash]]
+===== Slug Assembly =====
+In the directory:
+  /campus/BME235/assemblies/slug/ABySS-assembly1
+I ran the following command to start the assembly in parallel MPI mode. Note that the ABySS binaries were installed with OpenMPI 1.4, but I am using mpirun 1.3. When we re-install OpenMPI 1.4 with SGE support, I will re-run the assembly with it if there are problems. Here is the command executed to start the process:
+  /campus/BME235/assemblies/slug/ABySS-assembly1
+And here are the contents of the script I use to run everything:
+  #!/bin/bash
+  #
+  #$ -cwd
+  #$ -j y
+  #$ -S /bin/bash
+  #$ -V
+  #$ -l mem_free=15g
+  #
+  /opt/openmpi/bin/mpirun -np $NSLOTS abyss-pe -j j=2 np=$NSLOTS n=8 k=25 name=slugAbyss \
+    lib='lane1 lane2 lane3 lane5 lane6 lane7 lane8' \
+    lane1='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_1_2_all_qseq.fastq' \
+    lane2='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_2_2_all_qseq.fastq' \
+    lane3='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_3_2_all_qseq.fastq' \
+    lane5='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_5_2_all_qseq.fastq' \
+    lane6='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_6_2_all_qseq.fastq' \
+    lane7='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_7_2_all_qseq.fastq' \
+    lane8='/campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_1_all_qseq.fastq /campus/BME235/data/slug/Illumina/illumina_run_1/CeleraReads/s_8_2_all_qseq.fastq'
+Unfortunately, this command crashes. The error states that LD_LIBRARY_PATH might need to be set to point to the shared MPI libraries. It would also probably be best to use our own version of "mpirun" once we get it compiled with SGE support.
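One way to address the LD_LIBRARY_PATH complaint is to prepend the OpenMPI library directory in the job script before calling mpirun. This is a minimal sketch, assuming the shared libraries live in /opt/openmpi/lib, which is a guess based on the mpirun path used above and has not been verified on the cluster. prepend_lib_path is a helper name introduced here for illustration.

```shell
#!/bin/bash
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
prepend_lib_path() {
    local dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# ASSUMPTION: /opt/openmpi/lib holds the MPI shared libraries.
prepend_lib_path /opt/openmpi/lib
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

If the job spans several nodes, the variable may also need to reach the remote processes. Open MPI's mpirun accepts -x LD_LIBRARY_PATH for that purpose, though whether it is needed here is untested.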
+===== References =====
+<refnotes>notes-separator: none</refnotes>
+~~REFNOTES cite~~
