VELVET

Overview

Velvet was developed by Daniel R. Zerbino and Ewan Birney.

Velvet: algorithms for de novo short read assembly using de Bruijn graphs[1]

Velvet is free to download under the GPL license.

On Wikipedia: Velvet.

Velvet supports colorspace (SOLiD) data, and is possibly the only de novo short-read de Bruijn graph (DBG) assembler that does at this time. The colorspace versions of the Velvet programs (suffix _de) expect all data to be double-encoded. Mixing colorspace and base-space input is not directly supported.

Velvet has support for long-read data.

Velvet accepts sequence data from FASTQ input files, but does not use the quality information.

Color-Space

DE (double-encoded)

Double encoding is done by the pre-processor. The primer base is removed from the colorspace read, along with the first color, since that color is tied to the primer base. For mate-paired reads, the F3 read is reversed. The colors are then rewritten as base letters, so that software that does not parse colorspace input can still read them. Thus "double-encoded" means reads that are encoded in colorspace and then re-encoded as if they were bases in base-space.
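The steps above can be sketched in a few lines of Python. This is only an illustration, not the code of the ABI pre-processor: the function name double_encode is made up here, and the color-to-letter mapping (0, 1, 2, 3 to A, C, G, T, with "." to N) is an assumption based on the usual double-encoding convention.

```python
# Assumed mapping: colors 0-3 become A, C, G, T; a missing call "." becomes N.
COLOR_TO_BASE = str.maketrans("0123.", "ACGTN")


def double_encode(csread, reverse=False):
    """Double-encode one colorspace read such as 'T0123012'.

    csread  -- primer base followed by the color calls
    reverse -- set True for reads that must be reversed (the F3 read
               of a mate pair, per the description above)
    """
    body = csread[2:]          # drop the primer base and the first color
    if reverse:
        body = body[::-1]      # plain reversal; colors are not complemented
    return body.translate(COLOR_TO_BASE)


if __name__ == "__main__":
    print(double_encode("T0123012"))
    print(double_encode("T0123012", reverse=True))
```

Under this scheme a read of one primer base plus 25 colors comes out as 24 letters, which is consistent with the 24-base reads mentioned below.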

colorspace programs

denovo_preprocessor: converts colorspace reads into double-encoded 24-base reads that can be given to velveth_de.

velveth_de: colorspace version of velveth; hashes the reads.

velvetg_de: colorspace version of velvetg; builds the de Bruijn graph.

denovo_postprocessor: converts double-encoded Velvet output into colorspace contigs.

denovo_adp: adapter program; converts colorspace to base-space while reducing colorspace read errors as much as possible.

De novo tools for Velvet, from ABI for SOLiD

Running

Strategy:

Find the right value for k. For short reads, remember to keep k small enough to get good k-mer coverage.
Find the right values for exp_cov and cov_cutoff. This is very important.
- velvet-estimate-exp_cov.pl out/stats.txt makes a useful graph.
If you only have long reads, also pass them as your short reads.
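To see why k has to stay small for short reads, it helps to convert base coverage into k-mer coverage. The sketch below uses the relation given in the Velvet manual, Ck = C * (L - k + 1) / L, where C is the base coverage and L the read length. The function name and the numbers in the example are made up for illustration.

```python
def kmer_coverage(base_coverage, read_length, k):
    """k-mer coverage Ck = C * (L - k + 1) / L.

    A read of length L contains L - k + 1 k-mers, so the larger k is
    relative to L, the less of the base coverage survives as k-mer coverage.
    """
    if k > read_length:
        return 0.0  # a read shorter than k contributes no k-mers
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Hypothetical data: 20x base coverage with reads of length 36 or 100.
    for read_length in (36, 100):
        for k in (21, 31):
            ck = kmer_coverage(20, read_length, k)
            print(read_length, k, round(ck, 1))
```

With these made-up numbers, k = 31 leaves 36-base reads with only a few-fold k-mer coverage, while 100-base reads keep most of theirs.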

For 454 long reads, this was our best result:

velveth out 31 -short 454/?.TCA.454Reads.fna -long 454/?.TCA.454Reads.fna
velvetg out -exp_cov 60 -cov_cutoff 13
Final graph has 1755 nodes and n50 of 41723, max 142286, total 2468925, using 778257/782604 reads
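The n50 in the velvetg summary line is the usual assembly statistic: the length such that nodes (contigs) of at least that length hold at least half of the total assembled length. A minimal sketch of the calculation follows. The function name is made up, and Velvet may measure node lengths in k-mers rather than base pairs, so this is not guaranteed to reproduce Velvet's own figure exactly.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or node) lengths.

    Walk the lengths from longest to shortest and stop at the first
    length where the running sum reaches half of the total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Hypothetical contig lengths, total 20: 6 + 5 = 11 passes the halfway mark.
    print(n50([2, 3, 4, 5, 6]))
```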

Failures

VelvetOptimiser

The contributed utility VelvetOptimiser (in velvet/contrib/) is intended to help find the critical parameters k, exp_cov, and cov_cutoff. However, although it found k, it got stuck at a local maximum when optimizing coverage and failed to produce anything useful.
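One way around a local maximum is a coarse scan over both coverage parameters by hand. The sketch below only builds the velvetg command lines for such a scan and does not run them. It is a plain grid search, not what VelvetOptimiser does, and the function name and example values are made up. Only the -exp_cov and -cov_cutoff options already used above are assumed.

```python
from itertools import product


def velvetg_grid(out_dir, exp_covs, cov_cutoffs):
    """Build one velvetg command per (exp_cov, cov_cutoff) pair.

    Pairs where the cutoff is not below the expected coverage are
    skipped, on the assumption that they are never worth trying.
    """
    commands = []
    for exp_cov, cutoff in product(exp_covs, cov_cutoffs):
        if cutoff >= exp_cov:
            continue
        commands.append(
            ["velvetg", out_dir,
             "-exp_cov", str(exp_cov),
             "-cov_cutoff", str(cutoff)]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical scan values around the settings that worked for us.
    for cmd in velvetg_grid("out", [40, 60, 80], [5, 13, 20]):
        print(" ".join(cmd))
```

Each velvetg run writes into the same directory, so the "Final graph" line should be saved after every run before starting the next one.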

pseudoFlow

Wondering whether homopolymer errors in 454 data could cause trouble for the DBG, I made a utility called pseudoFlow.c that takes every homopolymer run longer than 6 and shortens it to 6. We know that 454 is accurate for runs of length 1 to 6. In any case, the pseudoFlow version of the data did not perform better; in fact, it was a little worse.
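The idea behind pseudoFlow can be shown in a few lines. This is a Python sketch of the same truncation, not the pseudoFlow.c source. The function name and the choice to handle only A, C, G, T, and N (in either case) are assumptions.

```python
import re


def cap_homopolymers(seq, max_run=6):
    """Shorten every homopolymer run longer than max_run to max_run bases."""
    # One base followed by at least max_run copies of itself,
    # i.e. a run of max_run + 1 or more identical bases.
    pattern = re.compile(r"([ACGTNacgtn])\1{%d,}" % max_run)
    return pattern.sub(lambda m: m.group(1) * max_run, seq)


if __name__ == "__main__":
    # Nine Ts are cut to six; the rest of the read is untouched.
    print(cap_homopolymers("ACG" + "T" * 9 + "A"))
```

Runs of exactly 6 are left alone, so reads without long homopolymers pass through unchanged.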

Installing

ssh campusrocks.cse.ucsc.edu

cd /campusdata/BME235/programs
wget http://www.ebi.ac.uk/~zerbino/velvet/velvet_0.7.62.tgz
tar xfz velvet_0.7.62.tgz
mv velvet_0.7.62 velvet
mv velvet_0.7.62.tgz velvet/
cd velvet
make
make color
# colorspace versions work with SOLiD data and have the _de suffix
# install to the bin dir
cp velveth velvetg velveth_de velvetg_de /campusdata/BME235/bin/

References

1. Daniel R. Zerbino and Ewan Birney.
Velvet: Algorithms for de novo short read assembly using de Bruijn graphs.
Genome Res. May 2008 18: 821-829; published in advance March 18, 2008.
doi:10.1101/gr.074492.107
