Team 4 Report: ABySS
ABySS stands for Assembly By Short Sequences.
Assembler Overview
Workflow
ABySS Details
Distributes the k-mers and the de Bruijn graph across the nodes of a cluster
Each node announces the list of k-mers it has to nodes that hold their possible extensions
8 bits of storage per k-mer: one bit for each possible A/C/G/T extension in the forward and reverse directions
Finds paths through contigs that agree with distance estimates, then merges overlapping paths
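The k-mer distribution and the 8-bit extension record can be illustrated with a minimal sketch. This is not ABySS's actual code: the function names (owner_node, neighbours, extension_flags), the CRC32 hash, and the bit ordering (bits 0-3 forward A/C/G/T, bits 4-7 reverse) are assumptions made for illustration, and reverse-complement handling is left out to keep it short.

```python
import zlib

BASES = "ACGT"


def owner_node(kmer, num_nodes):
    """Pick the node that stores a k-mer (hypothetical hash partitioning)."""
    return zlib.crc32(kmer.encode("ascii")) % num_nodes


def neighbours(kmer):
    """Yield (bit_index, neighbour) for the eight possible one-base extensions."""
    for i, base in enumerate(BASES):
        yield i, kmer[1:] + base       # forward: drop first base, append one
        yield i + 4, base + kmer[:-1]  # reverse: prepend one, drop last base


def extension_flags(kmers):
    """Map each k-mer to one byte whose bits mark which neighbours exist."""
    kmer_set = set(kmers)
    flags = {}
    for kmer in kmer_set:
        byte = 0
        for bit, neighbour in neighbours(kmer):
            # On a cluster this lookup would be a message to
            # owner_node(neighbour, num_nodes); here it is a local set test.
            if neighbour in kmer_set:
                byte |= 1 << bit
        flags[kmer] = byte
    return flags
```

In this sketch each k-mer only ever asks about eight candidate neighbours, which is why a single byte per k-mer is enough to record its adjacency.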
Installing ABySS
Installation of the single-processor version was straightforward
Difficulty installing the parallelized version
Developers are active in the community and the assembler has a long history, so documentation is abundant
Running ABySS
Single-processor version: straightforward
Parameters:
Primary
name: name of the assembly
k: size of k-mer
If one library of paired-end data:
in='reads1.fq reads2.fq'
Pipeline organized via makefile: abyss-pe
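As a sketch of how the primary parameters fit together, the snippet below assembles an abyss-pe command line for a single paired-end library. The helper name abyss_pe_command and the example values (test_assembly, k=31) are placeholders chosen for illustration, not values from our runs.

```python
import shlex


def abyss_pe_command(name, k, reads):
    """Return the argument list for a single-library paired-end abyss-pe run."""
    if len(reads) != 2:
        raise ValueError("expected two FASTQ files for one paired-end library")
    # abyss-pe is a makefile driver, so parameters are passed as name=value.
    return ["abyss-pe", f"name={name}", f"k={k}", "in=" + " ".join(reads)]


cmd = abyss_pe_command("test_assembly", 31, ["reads1.fq", "reads2.fq"])
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True) on a machine where abyss-pe is installed; printing it first makes it easy to check the parameters before starting a long run.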
Autogenerated assembly statistics
Contig and scaffold metrics
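The slides do not list the individual metrics, but N50 is a common contig and scaffold summary. The sketch below shows one standard way to compute it from a list of sequence lengths; it is an illustration under that assumption, not a reproduction of ABySS's own statistics code.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        return 0
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]
```

For example, lengths of 10, 8, 6, 4 and 2 sum to 30; the two longest sequences already cover 18, so the N50 is 8.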
Using ABySS: the plan
Use all libraries, after processing, but no error correction
Run SeqPrep
Run adapter trimming only
Run adapter trimming plus merging
Initial run
For the future
Get parallel versions working
Finish data analysis (KmerGenie, FastQC, etc.)
Do assemblies
RNA-seq rescaffolding with Trans-ABySS
Meta-assembly