======Team 4 Report: ABySS======

ABySS stands for Assembly By Short Sequences.

=====Assembler Overview=====
  * Load k-mers
  * Find adjacent k-mers
  * Generate de Bruijn graphs

=====Workflow=====
  * Generate paths through the reads
  * Merge paths
  * Generate contigs
  * ParseAligns: empirical fragment-size distribution
    - Maximum likelihood estimator
    - Uses the empirical paired-end size distribution

=====ABySS Details=====
  * Distributes k-mers and the de Bruijn graph across a cluster
  * Each node announces the list of k-mers it has to the nodes that hold their possible extensions
  * 8 bits of storage per k-mer: one bit for each possible A, C, G, T extension in the forward and reverse directions
  * Finds paths through contigs that agree with the distance estimates, then merges overlapping paths

=====Installing ABySS=====
  * Installation of the single-processor version was straightforward
  * Installing the parallelized version was difficult
  * Developers are active in the community and the assembler has a long history, so documentation is abundant

=====Running ABySS=====
Single-processor version: straightforward steps
  * qsub
  * Embedded qsub
  * Exporting paths
  * abyss-pe [parameters] parallel environment in campusrocks2

Parameters (primary):
  * name: name of the assembly
  * k: size of k-mer
  * If there is one library of paired-end (pe) data: in='reads1.fq reads2.fq'

Notes:
  * Pipeline organized via makefile: abyss-pe
  * Autogenerated assembly statistics
    * Contig and scaffold metrics
  * Does not necessarily clean up after steps that failed.
  * It is better to clean up the files manually.

=====Using ABySS, the plan=====
  * Use all libraries, after processing, but no error correction
  * Run SeqPrep
  * Run adapter trimming only
  * Run adapter trimming plus merging

====Initial run====
  * The initial run is located on Edser
  * k = 55, chosen arbitrarily
  * Did not work

=====For the future=====
  * Get parallel versions working
  * Finish data analysis (KmerGenie, FastQC, etc.)
  * Do assemblies
  * RNA-seq rescaffolding with Trans-ABySS
  * Meta-assembly
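
=====Example: 8-bit k-mer extensions=====

The "8 bits of storage per k-mer" point under ABySS Details can be illustrated with a small Python sketch. This is not ABySS's actual code or data layout. It assumes a single process (no cluster), ignores reverse complements, and uses hypothetical names (''build_extension_table'', ''extensions'') and a bit order chosen here for illustration: bits 0-3 flag the forward extensions A, C, G, T and bits 4-7 flag the reverse extensions.

```python
# Illustrative sketch only: NOT ABySS's real implementation or bit layout.
# Assumptions: single process, reverse complements ignored, and a
# bit order picked for this example (bits 0-3 forward, bits 4-7 reverse).

BASES = "ACGT"


def kmers(read, k):
    """Return every k-mer of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_extension_table(reads, k):
    """Map each observed k-mer to one byte of extension flags."""
    present = set()
    for read in reads:
        present.update(kmers(read, k))

    table = {}
    for kmer in present:
        bits = 0
        for i, base in enumerate(BASES):
            # Forward extension: drop the first base, append a new one.
            if kmer[1:] + base in present:
                bits |= 1 << i
            # Reverse extension: drop the last base, prepend a new one.
            if base + kmer[:-1] in present:
                bits |= 1 << (4 + i)
        table[kmer] = bits
    return table


def extensions(kmer, bits):
    """Decode the flag byte back into neighbouring k-mers."""
    forward = [kmer[1:] + b for i, b in enumerate(BASES) if (bits >> i) & 1]
    reverse = [b + kmer[:-1] for i, b in enumerate(BASES) if (bits >> (4 + i)) & 1]
    return forward, reverse


if __name__ == "__main__":
    table = build_extension_table(["ACGTAC"], 3)
    for kmer in sorted(table):
        print(kmer, format(table[kmer], "08b"), extensions(kmer, table[kmer]))
```

Because each k-mer needs only these eight flags, a neighbour never has to be stored explicitly: it can be rebuilt from the k-mer itself plus the base whose bit is set.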