contributors:team_4_page

Differences

This shows you the differences between two versions of the page.

contributors:team_4_page [2015/05/08 02:52]
JaredC
contributors:team_4_page [2015/05/11 09:01]
emfeal [Running ABySS]
Line 20: Line 20:
  
 [[https://github.com/bcgsc/abyss| ABySS manual and source code]]
 +
 +=====General notes=====
 +  - ABySS can run in serial mode, but serial mode is impractical for a genome this large.
 +  - The documentation recommends creating assemblies with several values of k and selecting the "best" one.
 +  - ABySS performs its own error correction.
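
The k sweep recommended by the documentation can be scripted. The following is a minimal sketch, not the procedure the team used: the function name print_k_sweep, the k range, and the read paths are placeholders chosen for illustration. The function only prints the abyss-pe commands so they can be reviewed before being wrapped in a qsub script.

```shell
#!/bin/bash
# Print one abyss-pe command per k value (dry run; nothing is executed).
# Usage: print_k_sweep NAME FIRST_K STEP LAST_K 'READS1 READS2'
# print_k_sweep and the example paths below are hypothetical.
print_k_sweep() {
    local name=$1 first=$2 step=$3 last=$4 reads=$5
    local k
    for k in $(seq "$first" "$step" "$last"); do
        # -C is a make option: run inside directory k$k, which must exist first.
        echo "mkdir -p k$k && abyss-pe -C k$k name=$name k=$k in='$reads'"
    done
}

# Example: prints commands for k=21, 31, and 41.
print_k_sweep ecoli 21 10 41 '/path/to/reads1.fastq /path/to/reads2.fastq'
```

Absolute read paths are used because -C changes the working directory. Once the runs finish, abyss-fac k*/ecoli-contigs.fa prints the contiguity statistics of each assembly side by side for comparison.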
  
 =====Installing ABySS=====
Line 75: Line 80:
  
 =====Running ABySS=====
 +
 +Running an assembly on the campusrocks cluster, which uses SGE, requires the qsub command. A shell script is a convenient and concise way to wrap the useful qsub options, the environment variable changes, and the executable (abyss-pe) itself. An example script that runs the parallel version of ABySS on the cluster is shown below.
 +<​code>​
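 +#!/bin/sh
 +# SGE options: job name, run in the current directory, merge stderr into stdout
 +#$ -N team4
 +#$ -cwd
 +#$ -j y
 +# Request 10 slots in the mpi parallel environment; must match np= below
 +#$ -pe mpi 10
 +# Run the job under bash and export the submitting shell's environment
 +#$ -S /bin/bash
 +#$ -V
 +# Resource request passed to the scheduler (mem_free=15g)
 +#$ -l mem_free=15g
 +# Put OpenMPI and the class ABySS install on the search paths
 +ABYSSRUNDIR=/campusdata/BME235/bin
 +export PATH=$PATH:/opt/openmpi/bin:/campusdata/BME235/bin/
 +export LD_LIBRARY_PATH=/opt/openmpi/lib/:$LD_LIBRARY_PATH
 +ABYSSRUN=$ABYSSRUNDIR/abyss-pe
 +# Assemble the paired test reads with k=21 using 10 MPI processes
 +$ABYSSRUN np=10 k=21 name=ecoli in='/campusdata/BME235/programs/abyss-1.5.2/JARED/test-data/reads1.fastq /campusdata/BME235/programs/abyss-1.5.2/JARED/test-data/reads2.fastq'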
 +</​code>​
 +Note that the parallel version of ABySS requires two things in particular:
 +  * **(1)** A parallel environment, selected with a qsub option.
 +  * **(2)** The //np// option of abyss-pe. The number of processes given here must match the number given in the parallel environment option.
 +The parallel environment option in the script above is:
 +<​code>​
 +#$ -pe mpi 10
 +</​code>​
 +The //mpi// names a parallel environment that is installed on the system, and the 10 is the number of processes over which to run the job. To see which PEs are installed on the system, use the command:
 +<​code>​
 +qconf -spl
 +</​code>​
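
Because the slot count appears twice in the job script (once in the -pe option and once in np), the two numbers can drift apart when the script is edited. The following is a minimal sketch of a pre-submission check; the function name check_np_matches_pe is hypothetical, and it assumes the script contains a single "#$ -pe" line and a single np= setting.

```shell
#!/bin/bash
# Check that the slot count in "#$ -pe <env> N" matches "np=N" in a job script.
# Prints both numbers and returns non-zero on a mismatch.
# check_np_matches_pe is a hypothetical helper, not part of SGE or ABySS.
check_np_matches_pe() {
    local script=$1 slots np
    # In "#$ -pe mpi 10" the fourth whitespace-separated field is the slot count.
    slots=$(awk '$1 == "#$" && $2 == "-pe" { print $4; exit }' "$script")
    # Take the first np=<digits> token found anywhere in the script.
    np=$(grep -o 'np=[0-9][0-9]*' "$script" | head -n 1 | cut -d= -f2)
    echo "pe slots=$slots np=$np"
    [ -n "$slots" ] && [ "$slots" = "$np" ]
}

# Usage (run.sh is a placeholder file name):
#   check_np_matches_pe run.sh && qsub run.sh
```

An alternative that avoids repeating the number is to write np=$NSLOTS in the job script, since SGE sets NSLOTS to the number of slots granted to the job.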
  
 abyss-pe is a driver script implemented as a Makefile. Any option of make may be used with abyss-pe. Particularly useful options are:
Line 133: Line 167:
   * konnector: fill the gaps between paired-end reads by building a Bloom filter de Bruijn graph and searching for paths between paired-end reads within the graph
   * abyss-bloom: construct reusable bloom filter files for input to Konnector
- 
 =====ABySS pipeline=====
  
  
 {{ :bioinformatic_tools:abysspipeline.png?nolink |}}
 +
 +=====Test run=====
 +
 +This run used ABySS version 1.5.2 with k=59 and 10 processes, and requested mem_free=15g from qsub. Only the SW018 and SW019 libraries were assembled. Specifically, the files used were:
 +  * /​campusdata/​BME235/​Spring2015Data/​adapter_trimming/​SeqPrep/​SW018_S1_L007_R1_001_trimmed.fastq.gz
 +  * /​campusdata/​BME235/​Spring2015Data/​adapter_trimming/​SeqPrep/​SW018_S1_L007_R2_001_trimmed.fastq.gz
 +  * /​campusdata/​BME235/​Spring2015Data/​adapter_trimming/​SeqPrep/​SW019_S1_L001_R1_001_trimmed.fastq.gz
 +  * /​campusdata/​BME235/​Spring2015Data/​adapter_trimming/​SeqPrep/​SW019_S1_L001_R2_001_trimmed.fastq.gz
 +  * /​campusdata/​BME235/​Spring2015Data/​adapter_trimming/​SeqPrep/​SW019_S2_L008_R1_001_trimmed.fastq.gz
 +  * /​campusdata/​BME235/​Spring2015Data/​adapter_trimming/​SeqPrep/​SW019_S2_L008_R2_001_trimmed.fastq.gz
 +
 +The files had adapters trimmed using SeqPrep (see the data pages for more details). SW019_S1 and SW019_S2 were treated as two separate libraries. ​
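
The exact command line for this run is not recorded here. As a rough sketch of how three libraries can be passed to abyss-pe, the snippet below builds (but does not run) a command that uses the lib parameter with one variable per library. The assembly name slug is inferred from the output file names in the results, and the library variable names and the helper functions pair and build_abyss_cmd are illustrative assumptions.

```shell
#!/bin/bash
# Build (but do not run) an abyss-pe command with one variable per library.
DATA=/campusdata/BME235/Spring2015Data/adapter_trimming/SeqPrep

# Print "<R1 path> <R2 path>" for a file prefix such as SW018_S1_L007.
# pair is a hypothetical helper.
pair() {
    echo "$DATA/${1}_R1_001_trimmed.fastq.gz $DATA/${1}_R2_001_trimmed.fastq.gz"
}

# lib= lists the paired-end library names; each name is then set to its read files.
# name=slug and the library names are assumptions, not taken from the run logs.
build_abyss_cmd() {
    echo "abyss-pe np=10 k=59 name=slug" \
         "lib='SW018 SW019_S1 SW019_S2'" \
         "SW018='$(pair SW018_S1_L007)'" \
         "SW019_S1='$(pair SW019_S1_L001)'" \
         "SW019_S2='$(pair SW019_S2_L008)'"
}

build_abyss_cmd
```

Keeping each library in its own variable lets ABySS estimate a separate fragment-size distribution for each one.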
 +
 +The output and log files for this assembly are in /​campusdata/​BME235/​S15_assemblies/​abyss/​sidra/​test_run/​singleK.
 +
 +
 +====Results====
 +
 +Note: the N50 and related statistics only include contigs >= 500 bp (I believe the rest are left out of the statistics; the n column still counts them).
 +
 +There are 10.23 * 10^6 contigs, of which 785,054 are at least 500 bp long. The N50 contig size is 2,669 bp. The number of contigs at least as long as the N50 (n:N50) is 174,507. The maximum contig size is 31,605 bp, and the total number of bp (in contigs >= 500 bp) is 1.557 * 10^9.
 +
 +Here are the stats for the contigs, and also for the scaffolds and unitigs. n is the total number of sequences, n:500 is the number of contigs/unitigs/scaffolds at least 500 bp long, and sum is the combined number of bases in those sequences.
 +
 +| n | n:500 | n:N50 | min | N80 | N50 | N20 | E-size | max | sum | name |
 +| 11.95e6 | 993409 | 247109 | 500 | 962  | 1795 | 3327 | 2296 | 30520 | 1.456e9 | slug-unitigs.fa |
 +| 10.23e6 | 785054 | 174507 | 500 | 1320 | 2669 | 5079 | 3433 | 31605 | 1.557e9 | slug-contigs.fa | 
 +| 10.11e6 | 711022 | 153036 | 500 | 1490 | 3063 | 5870 | 3945 | 37466 | 1.573e9 | slug-scaffolds.fa |
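
For readers unfamiliar with the N50 and n:N50 columns, here is a minimal sketch of how they can be computed from a list of sequence lengths. It is not the code ABySS uses; the function name n50 is hypothetical. Sequences shorter than the minimum length are ignored, lengths are sorted from longest to shortest, and the N50 is the length at which the running total first reaches half of the summed length.

```shell
#!/bin/bash
# Read sequence lengths (one per line) on stdin and print "N50 n:N50",
# ignoring sequences shorter than the minimum length (default 500).
# n50 is a hypothetical helper written for illustration.
n50() {
    local min=${1:-500}
    awk -v min="$min" '$1 + 0 >= min + 0' | sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                running += len[i]
                # Stop at the first length where the running sum reaches half the total.
                if (2 * running >= total) { print len[i], i; exit }
            }
        }'
}

# Toy example with no length cutoff: total is 100, and 40 + 30 >= 50.
printf '10\n20\n30\n40\n' | n50 0   # prints "30 2"
```

The second number printed is the count of sequences at least as long as the N50, which corresponds to the n:N50 column.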
 +
 +
 +====Notes====
 +
 +The success of this run means we are probably ready to do a run with all the data (not including the mate-pair data, which can be used for scaffolding later). For that run, the trimmed files for each library should be concatenated so that the run involves only the actual number of libraries we had (I believe 4?). It should also use many more than 10 processes.
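
The concatenation step can be done without decompressing, because a sequence of gzip streams is itself a valid gzip file. The following is a minimal sketch; the function name concat_gz and the output file names in the usage comment are hypothetical.

```shell
#!/bin/bash
# Concatenate gzip files into one output file without decompressing them.
# Usage: concat_gz OUTPUT INPUT...
# concat_gz is a hypothetical helper.
concat_gz() {
    local out=$1
    shift
    cat "$@" > "$out"
}

# Usage (output names are placeholders); R1 and R2 inputs must be listed
# in the same order so that read pairs stay aligned:
#   concat_gz SW019_R1.fastq.gz SW019_S1_L001_R1_001_trimmed.fastq.gz SW019_S2_L008_R1_001_trimmed.fastq.gz
#   concat_gz SW019_R2.fastq.gz SW019_S1_L001_R2_001_trimmed.fastq.gz SW019_S2_L008_R2_001_trimmed.fastq.gz
```

Running gzip -dc on the combined file and counting lines is a quick check that it holds the sum of the reads in the input files.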
contributors/team_4_page.txt · Last modified: 2015/07/18 20:52 by 92.247.181.31