contributors:team_4_page [2015/05/07 22:47]
JaredC created
contributors:team_4_page [2015/05/08 22:32]
-====== ABySS ======
 +====== Team 4 | ABySS ======
 +**A**ssembly **By** **S**hort **S**equences - a //de novo//, parallel, paired-end sequence assembler 
 +=====Team composition===== 
 +| Name | Email |  
 +| Jared Copher | | 
 +| Emilio Feal | | 
 +| Sidra Hussain | | 
 +=====ABySS overview===== 
 +ABySS is published by Canada's Michael Smith Genome Sciences Centre. It was the first //de novo// assembler for large genomes that Illumina recommended, in [[http://Documents/products/technotes/technote_denovo_assembly_ecoli.pdf|this technical note]], for assemblies that use only Illumina data. The ABySS team are active on [[https://t/Abyss/|BioStars]], where they recommend that all technical questions be asked.
 +[[http://​​platform/​bioinfo/​software/​abyss | ABySS main site]] 
 +[[http://​​content/​19/​6/​1117.full.pdf| ABySS paper]] 
 +[[https://​​bcgsc/​abyss| ABySS manual and source code]] 
 +=====General notes===== 
 +  - ABySS can run in serial mode, but that is impractical for a genome this large.
 +  - The documentation recommends creating assemblies with several values of k and selecting the "best" one.
 +  - ABySS performs its own error correction.
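The recommendation to try several values of k can be sketched as a small loop, with one output directory per k. This is a minimal sketch, not the team's actual procedure: the helper name `k_sweep`, the `ABYSS_PE` override variable, and the example arguments are assumptions made for illustration. Only `abyss-pe`, its `-C` option, and the `name`, `k` and `in` parameters come from ABySS itself.

```shell
# Hypothetical helper: run one abyss-pe assembly per k-mer size,
# each in its own directory (k51/, k59/, ...).
# ABYSS_PE may be overridden (e.g. ABYSS_PE=echo) to preview the commands.
ABYSS_PE=${ABYSS_PE:-abyss-pe}

# k_sweep ASSEMBLY_NAME READS_FILE K [K...]
# READS_FILE should be an absolute path, because -C changes directory
# before the assembly starts.
k_sweep() {
    asm_name=$1
    reads=$2
    shift 2
    for kmer in "$@"; do
        mkdir -p "k$kmer" || return 1
        "$ABYSS_PE" -C "k$kmer" name="$asm_name" k="$kmer" in="$reads" || return 1
    done
}
```

A call such as `k_sweep slug /path/to/reads.fa 51 59 67` would leave one assembly per directory, after which `abyss-fac k*/slug-contigs.fa` can be used to compare their contiguity statistics.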
 +=====Installing ABySS===== 
 +The ABySS source code was downloaded from GitHub:
 +% lftpget https://​​bcgsc/​abyss/​archive/​ 
 +ABySS must then be configured with its dependencies:
 +% ./​ 
 +The value of --enable-maxk must be a multiple of 32.
 +% ./configure --prefix=/campusdata/BME235 \
 +    --enable-maxk=96 \
 +    --enable-dependency-tracking \
 +    --with-boost=/campusdata/BME235/include/boost \
 +    --with-mpi=/campusdata/BME235/include \
 +    CC=gcc-4.9.2 CXX=g++-4.9.2 \
 +    CPPFLAGS=-I/campusdata/BME235/include/sparsehash
 +Then ABySS can be installed via the makefile 
 +% make 
 +% make install 
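Because the maximum k-mer size is fixed at configure time, it can help to check the intended values before starting a long build or assembly. The following is a minimal sketch. The function names `is_valid_maxk` and `k_fits_maxk` are hypothetical and not part of ABySS. They only encode the two constraints: maxk is a multiple of 32, and k does not exceed maxk.

```shell
# is_valid_maxk VALUE: succeed when VALUE is a positive multiple of 32.
is_valid_maxk() {
    case $1 in
        ''|0*|*[!0-9]*) return 1 ;;   # reject empty, leading-zero and non-numeric input
    esac
    [ $(( $1 % 32 )) -eq 0 ]
}

# k_fits_maxk K MAXK: succeed when K does not exceed the compiled-in maximum.
k_fits_maxk() {
    [ "$1" -le "$2" ]
}
```

For the build above, `is_valid_maxk 96` succeeds, and the k=59 used in the test run passes `k_fits_maxk 59 96`.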
 +=====ABySS parameters===== 
 +Parameters of the driver script, abyss-pe, and their [default value] 
 +  * a: maximum number of branches of a bubble [2] 
 +  * b: maximum length of a bubble (bp) [10000] 
 +  * c: minimum mean k-mer coverage of a unitig [sqrt(median)] 
 +  * d: allowable error of a distance estimate (bp) [6] 
 +  * e: minimum erosion k-mer coverage [sqrt(median)] 
 +  * E: minimum erosion k-mer coverage per strand [1] 
 +  * j: number of threads [2] 
 +  * k: size of k-mer (bp) [no default] 
 +  * l: minimum alignment length of a read (bp) [k] 
 +  * m: minimum overlap of two unitigs (bp) [30] 
 +  * n: minimum number of pairs required for building contigs [10] 
 +  * N: minimum number of pairs required for building scaffolds [n] 
 +  * p: minimum sequence identity of a bubble [0.9] 
 +  * q: minimum base quality [3] 
 +  * s: minimum unitig size required for building contigs (bp) [200] 
 +  * S: minimum contig size required for building scaffolds (bp) [s] 
 +  * t: minimum tip size (bp) [2k] 
 +  * v: use v=-v for verbose logging, v=-vv for extra verbose [disabled] 
 +Please see the abyss-pe manual page for more information on assembly parameters. 
 +An abyss-pe parameter may have the same name as an existing environment variable. Such a parameter cannot be used until the environment variable is unset. To detect these clashes, run the command:
 +abyss-pe env [options]
 +This command reports every abyss-pe parameter that is set, whatever its origin. It does not run any ABySS programs.
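As a complement to `abyss-pe env`, the environment can also be inspected directly for variables that share a name with one of the single-letter parameters listed above. This is only a sketch: the helper `list_param_clashes` and the variable `ABYSS_PE_PARAMS` are hypothetical names, and the check covers only the single-letter parameters from this page.

```shell
# Single-letter abyss-pe parameter names, copied from the parameter list.
ABYSS_PE_PARAMS="a b c d e E j k l m n N p q s S t v"

# Print NAME=VALUE for every listed parameter name that is currently
# present in the environment (that is, exported), one per line.
list_param_clashes() {
    awk -v names="$ABYSS_PE_PARAMS" 'BEGIN {
        count = split(names, name, " ")
        for (i = 1; i <= count; i++)
            if (name[i] in ENVIRON)
                print name[i] "=" ENVIRON[name[i]]
    }'
}
```

Any name printed by `list_param_clashes` can be removed from the current shell with `unset` before running abyss-pe.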
 +=====Running ABySS===== 
 +abyss-pe is a driver script implemented as a Makefile. Any option of make may be used with abyss-pe. Particularly useful options are: 
 +-C dir, --directory=dir 
 +Change to the directory dir and store the results there. 
 +-n, --dry-run 
 +Print the commands that would be executed, but do not execute them. 
 +===Commands of abyss-pe=== 
 +  * default: Equivalent to "scaffolds scaffolds-dot stats".
 +  * unitigs: Assemble unitigs. 
 +  * unitigs-dot:​ Output the unitig overlap graph. 
 +  * pe-sam: Map paired-end reads to the unitigs and output a SAM file. 
 +  * pe-bam: Map paired-end reads to the unitigs and output a BAM file. 
 +  * pe-index: Generate an index of the unitigs used by abyss-map. 
 +  * contigs: Assemble contigs. 
 +  * contigs-dot:​ Output the contig overlap graph. 
 +  * mp-sam: Map mate-pair reads to the contigs and output a SAM file. 
 +  * mp-bam: Map mate-pair reads to the contigs and output a BAM file. 
 +  * mp-index: Generate an index of the contigs used by abyss-map. 
 +  * scaffolds: Assemble scaffolds. 
 +  * scaffolds-dot:​ Output the scaffold overlap graph. 
 +  * stats: Display assembly contiguity statistics. 
 +  * clean: Remove intermediate files. 
 +  * version: Display the version of abyss-pe. 
 +  * versions: Display the versions of all programs used by abyss-pe. 
 +  * help: Display a helpful message. 
 +===Programs in pipeline=== 
 +abyss-pe uses the following programs, which must be found in your PATH: 
 +  * ABYSS: de Bruijn graph assembler 
 +  * ABYSS-P: parallel (MPI) de Bruijn graph assembler 
 +  * AdjList: find overlapping sequences 
 +  * DistanceEst:​ estimate the distance between sequences 
 +  * MergeContigs:​ merge sequences 
 +  * MergePaths: merge overlapping paths 
 +  * Overlap: find overlapping sequences using paired-end reads 
 +  * PathConsensus:​ find a consensus sequence of ambiguous paths 
 +  * PathOverlap:​ find overlapping paths 
 +  * PopBubbles: remove bubbles from the sequence overlap graph 
 +  * SimpleGraph:​ find paths through the overlap graph 
 +  * abyss-fac: calculate assembly contiguity statistics 
 +  * abyss-filtergraph:​ remove shim contigs from the overlap graph 
 +  * abyss-fixmate:​ fill the paired-end fields of SAM alignments 
 +  * abyss-map: map reads to a reference sequence (BW transform) 
 +  * abyss-scaffold:​ scaffold contigs using distance estimates 
 +  * abyss-todot:​ convert graph formats and merge graphs 
 +New to Version 1.3.5 (Mar 05, 2013) 
 +  * abyss-mergepairs:​ Merges overlapping read pairs. 
 +  * abyss-layout:​ Layout contigs using the sequence overlap graph. 
 +  * abyss-samtobreak:​ Calculate contig and scaffold contiguity and correctness metrics. 
 +New to Version 1.5.2 (Jul 10, 2014) 
 +  * konnector: fill the gaps between paired-end reads by building a Bloom filter de Bruijn graph and searching for paths between paired-end reads within the graph 
 +  * abyss-bloom: construct reusable Bloom filter files for input to Konnector
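Since abyss-pe expects these programs to be on the PATH, a quick check before submitting a job can save a failed run. A minimal sketch follows. The helper `missing_programs` and the variable `ABYSS_PROGRAMS` are hypothetical names, and the program list is copied from the core list above, without the later additions.

```shell
# Core pipeline programs, as listed above.
ABYSS_PROGRAMS="ABYSS ABYSS-P AdjList DistanceEst MergeContigs MergePaths Overlap PathConsensus PathOverlap PopBubbles SimpleGraph abyss-fac abyss-filtergraph abyss-fixmate abyss-map abyss-scaffold abyss-todot"

# missing_programs NAME...: print each NAME that cannot be found in PATH.
missing_programs() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || printf '%s\n' "$prog"
    done
}
```

Running `missing_programs $ABYSS_PROGRAMS` (unquoted, so that the list is split into words) prints nothing when the installation is complete.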
 +=====ABySS pipeline===== 
 +{{ :​bioinformatic_tools:​abysspipeline.png?​nolink |}} 
 +=====Test run===== 
 +This run was done using version 1.5.2. The assembly used k=59, 10 processes, and requested mem_free=15g from qsub. The assembly was done using the SW018 and SW019 libraries only. Specifically,​ the files used were: 
 +  * /​campusdata/​BME235/​Spring2015Data/​adapter_trimming/​SeqPrep/​SW018_S1_L007_R1_001_trimmed.fastq.gz 
 +  * /​campusdata/​BME235/​Spring2015Data/​adapter_trimming/​SeqPrep/​SW018_S1_L007_R2_001_trimmed.fastq.gz 
 +  * /​campusdata/​BME235/​Spring2015Data/​adapter_trimming/​SeqPrep/​SW019_S1_L001_R1_001_trimmed.fastq.gz 
 +  * /​campusdata/​BME235/​Spring2015Data/​adapter_trimming/​SeqPrep/​SW019_S1_L001_R2_001_trimmed.fastq.gz 
 +  * /​campusdata/​BME235/​Spring2015Data/​adapter_trimming/​SeqPrep/​SW019_S2_L008_R1_001_trimmed.fastq.gz 
 +  * /​campusdata/​BME235/​Spring2015Data/​adapter_trimming/​SeqPrep/​SW019_S2_L008_R2_001_trimmed.fastq.gz 
 +The files had adapters trimmed using SeqPrep (see the data pages for more details). SW019_S1 and SW019_S2 were treated as two separate libraries.  
 +The output and log files for this assembly are in /​campusdata/​BME235/​S15_assemblies/​abyss/​sidra/​test_run/​singleK. 
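The exact command line for this run is not recorded on this page. The sketch below is one plausible reconstruction, not the command that was actually used. The assembly name `slug` is inferred from the output file names, and k=59 and np=10 come from the description above. The library labels `SW018`, `SW019_S1` and `SW019_S2`, the function name, and the `ABYSS_PE` and `READS` variables are assumptions. The qsub resource request is left out because it depends on the cluster configuration.

```shell
# ABYSS_PE may be overridden (e.g. ABYSS_PE=echo) to preview the command.
ABYSS_PE=${ABYSS_PE:-abyss-pe}
READS=/campusdata/BME235/Spring2015Data/adapter_trimming/SeqPrep

# Hypothetical reconstruction of the test run: three paired-end libraries,
# each defined by its R1 and R2 files, assembled with k=59 on 10 MPI processes.
run_test_assembly() {
    "$ABYSS_PE" name=slug k=59 np=10 \
        lib='SW018 SW019_S1 SW019_S2' \
        SW018="$READS/SW018_S1_L007_R1_001_trimmed.fastq.gz $READS/SW018_S1_L007_R2_001_trimmed.fastq.gz" \
        SW019_S1="$READS/SW019_S1_L001_R1_001_trimmed.fastq.gz $READS/SW019_S1_L001_R2_001_trimmed.fastq.gz" \
        SW019_S2="$READS/SW019_S2_L008_R1_001_trimmed.fastq.gz $READS/SW019_S2_L008_R2_001_trimmed.fastq.gz"
}
```

Listing SW019_S1 and SW019_S2 under separate names in `lib` is what makes abyss-pe treat them as two libraries.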
 +Note: the N50 and related stats only include contigs >= 500 bp (I believe the rest are discarded).
 +There are 10.23 * 10^6 contigs in total. The N50 contig size is 2,669 bp. The number of contigs at least as long as the N50 (n:N50) is 174,507. The maximum contig size is 31,605 bp, and the total length of the contigs >= 500 bp is 1.557 * 10^9 bp.
 +The table below summarizes the stats for the contigs, and also for the unitigs and scaffolds. n is the total number of sequences, n:500 is the number of contigs/unitigs/scaffolds at least 500 bp long, and sum is the combined number of bases in those sequences.
 +| n | n:500 | n:N50 | min | N80 | N50 | N20 | E-size | max | sum | name | 
 +| 11.95e6 | 993409 | 247109 | 500 | 962  | 1795 | 3327 | 2296 | 30520 | 1.456e9 | slug-unitigs.fa | 
 +| 10.23e6 | 785054 | 174507 | 500 | 1320 | 2669 | 5079 | 3433 | 31605 | 1.557e9 | slug-contigs.fa |  
 +| 10.11e6 | 711022 | 153036 | 500 | 1490 | 3063 | 5870 | 3945 | 37466 | 1.573e9 | slug-scaffolds.fa | 
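To make the N50, n:N50 and sum columns concrete, the sketch below computes them from a plain list of sequence lengths. It is an illustration of the definitions used on this page, not the abyss-fac implementation, and its results may differ from abyss-fac in edge cases. The function name `n50_stats` is hypothetical.

```shell
# n50_stats [MIN_LEN]: read one sequence length per line on stdin and print
# "N50 n:N50 sum", using only sequences of at least MIN_LEN bp (default 500).
n50_stats() {
    min_len=${1:-500}
    awk -v min="$min_len" '$1 >= min' | sort -rn | awk '
        { len[NR] = $1 + 0; total += len[NR] }
        END {
            if (NR == 0) exit 1
            # N50: length of the sequence at which the running total,
            # taken from longest to shortest, reaches half of the sum.
            for (i = 1; i <= NR; i++) {
                running += len[i]
                if (2 * running >= total) { n50 = len[i]; break }
            }
            # n:N50: number of sequences at least as long as the N50.
            for (i = 1; i <= NR && len[i] >= n50; i++) count++
            print n50, count, total
        }'
}
```

For example, the lengths 100, 500, 600, 700, 1000 and 1200 give a sum of 4000 over the five sequences of at least 500 bp. The running total reaches half of that at the 1000 bp sequence, so the N50 is 1000 and n:N50 is 2.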
 +The success of this run means we are probably ready to do a run with all the data, excluding the mate-pair data, which can be used for scaffolding later. For that run, the trimmed files for each library should be concatenated so that the run involves only the actual number of libraries we had (I believe 4). It should also use many more than 10 processes.
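Concatenating the trimmed files does not require decompressing them, because gzip files joined with `cat` form a valid multi-member gzip file that `gzip -dc` reads as one stream. A minimal sketch, in which the helper name `concat_gz` is hypothetical:

```shell
# concat_gz OUTPUT INPUT...: join gzip-compressed files into a single
# multi-member gzip file, in the order given.
concat_gz() {
    out=$1
    shift
    cat "$@" > "$out"
}
```

The R1 files and the R2 files of a library must be concatenated in the same order, so that the reads of each pair stay at matching positions in the two combined files.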
contributors/team_4_page.txt · Last modified: 2015/07/18 20:52 by