Velvet was developed by Daniel R. Zerbino and Ewan Birney.
Paper: Velvet: algorithms for de novo short read assembly using de Bruijn graphs [1]
Velvet is free to download under the GPL license from http://www.ebi.ac.uk/~zerbino/velvet/.
Wikipedia article: Velvet.
Thesis: Daniel Zerbino's PhD thesis on Velvet.
Velvet supports colorspace (SOLiD) data and is possibly the only de novo short-read de Bruijn graph (DBG) assembler that does at this time. The colorspace builds of Velvet (the programs with the _de suffix) expect all input to be double-encoded. Mixing colorspace and base-space reads is not directly supported.
Velvet has support for long-read data.
Velvet will accept sequence data from fastq input files, but does not use the quality information.
Double-encoding is done by the pre-processor. The primer base is removed from the colorspace read, followed by the first color, since that color is tied to the primer base. For mate-paired reads, the F3 read is reversed. The colors are then all converted to base letters so that software that does not parse colorspace input can read them. Thus "double-encoded" means reads that are encoded in colorspace and then re-encoded as if they were bases in base-space.
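The steps above can be sketched in a few lines of shell. This is only an illustration, not the actual denovo_preprocessor: the function names are hypothetical, and the color-to-letter mapping (0→A, 1→C, 2→G, 3→T) is assumed from the usual double-encoding convention rather than taken from this page.

```shell
# Hypothetical helpers illustrating double-encoding; not the real denovo_preprocessor.

# double_encode READ
#   READ is one colorspace read such as T012330 (primer base, then colors).
double_encode() {
  # cut -c3- drops the primer base and the first color;
  # tr rewrites the remaining colors as base letters (assumed mapping 0123 -> ACGT).
  printf '%s\n' "$1" | cut -c3- | tr '0123' 'ACGT'
}

# double_encode_f3 READ
#   Same as above, but reverses the color string first, as described for
#   the F3 read of a mate pair.
double_encode_f3() {
  printf '%s\n' "$1" | cut -c3- | rev | tr '0123' 'ACGT'
}
```

For example, `double_encode T012330` drops `T0` and turns the remaining colors `12330` into `CGTTA`.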
denovo_preprocessor - converts colorspace reads into double-encoded 24-base reads that can be given to velvet_de.
velveth_de - colorspace version of velveth; hashes the reads.
velvetg_de - colorspace version of velvetg; creates the de Bruijn graph.
denovo_postprocessor - converts the double-encoded Velvet output to colorspace contigs.
denovo_adp - adapter program; converts colorspace to base-space while reducing colorspace read errors as much as possible.
Strategy:
For 454 long reads, this was our best result:
    velveth out 31 -short 454/?.TCA.454Reads.fna -long 454/?.TCA.454Reads.fna
    velvetg out -exp_cov 60 -cov_cutoff 13

Final graph has 1755 nodes and n50 of 41723, max 142286, total 2468925, using 778257/782604 reads
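For readers comparing runs, the n50 figure can be recomputed from a list of contig lengths. The sketch below uses the common definition (the largest length L such that contigs of length at least L hold at least half of the total); the function name `n50` is hypothetical, and whether Velvet's reported value uses exactly the same units is not checked here.

```shell
# n50: reads one contig length per line on stdin and prints the N50.
# Hypothetical helper; assumes the common N50 definition.
n50() {
  sort -rn | awk '
    { len[NR] = $1; total += $1 }          # lengths arrive longest first
    END {
      for (i = 1; i <= NR; i++) {
        running += len[i]
        if (running * 2 >= total) {        # reached half of the total length
          print len[i]
          exit
        }
      }
    }'
}
```

With lengths 2, 3, 4, 5 and 6 the total is 20; the two longest contigs (6 + 5 = 11) already cover half of it, so the result is 5.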
The contributed utility VelvetOptimiser (in velvet/contrib/) is intended to help find the critical parameters k, exp_cov, and cov_cutoff. Although it found k, it got stuck on a local maximum for coverage and failed to produce anything useful.
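When an optimiser stalls like this, one fallback is to scan the coverage cutoff by hand. The helper below is a hypothetical sketch, not something that was run for these notes: it only prints the velvetg commands for a list of cov_cutoff values, using the same -exp_cov and -cov_cutoff options as the run above, so they can be reviewed before anything is executed.

```shell
# sweep_velvetg DIR EXP_COV CUTOFF...
#   Hypothetical dry-run helper: prints one velvetg command per cutoff.
#   DIR is an existing velveth output directory.
sweep_velvetg() {
  dir=$1
  exp_cov=$2
  shift 2
  for cutoff in "$@"; do
    echo "velvetg $dir -exp_cov $exp_cov -cov_cutoff $cutoff"
  done
}

# Example: sweep_velvetg out 60 8 10 13 16
```

Each velvetg run rewrites the results in the same directory, so the "Final graph has ..." line would need to be recorded after every run in order to compare cutoffs.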
Wondering whether homopolymer errors in 454 data could cause trouble for the DBG, I wrote a utility called pseudoFlow.c that shortens every homopolymer run longer than 6 down to 6. We know that 454 is accurate for runs of length 1 to 6. In any case, the pseudoFlow version of the data did not perform better; in fact it was a little worse.
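The truncation idea can be approximated with sed. This is not pseudoFlow.c itself, just a minimal sketch under some assumptions: the function name is hypothetical, the input is FASTA with uppercase A/C/G/T, and each sequence sits on a single line (a run that wraps across lines would not be caught).

```shell
# cap_homopolymers: reads FASTA on stdin, writes FASTA on stdout.
# Hypothetical stand-in for the idea behind pseudoFlow.c.
cap_homopolymers() {
  # Header lines (starting with ">") are left untouched; on every other
  # line, any run of 7 or more identical bases is cut down to 6.
  sed -E '/^>/!{
    s/A{7,}/AAAAAA/g
    s/C{7,}/CCCCCC/g
    s/G{7,}/GGGGGG/g
    s/T{7,}/TTTTTT/g
  }'
}

# Example: cap_homopolymers < reads.fna > reads.capped.fna
```

The file names in the usage line are placeholders.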
Installation:

    ssh campusrocks.cse.ucsc.edu
    cd /campusdata/BME235/programs
    wget http://www.ebi.ac.uk/~zerbino/velvet/velvet_0.7.62.tgz
    tar xfz velvet_0.7.62.tgz
    mv velvet_0.7.62 velvet
    mv velvet_0.7.62.tgz velvet/
    cd velvet
    make
    make color   # color versions work with solid, have _de extension
    # install to bin dir
    cp velveth velvetg velveth_de velvetg_de /campusdata/BME235/bin/