===== VELVET =====

==== Overview ====

Velvet was developed by Daniel R. Zerbino and Ewan Birney.

**Velvet: algorithms for de novo short read assembly using de Bruijn graphs** [(cite:velvet>Daniel R. Zerbino and Ewan Birney.\\ Velvet: Algorithms for de novo short read assembly using de Bruijn graphs\\ Genome Res. May 2008 18: 821-829; Published in Advance March 18, 2008, \\ doi:[[http://dx.doi.org/10.1101/gr.074492.107|10.1101/gr.074492.107]] )]

Velvet may be downloaded free from [[http://www.ebi.ac.uk/~zerbino/velvet/|here]] (GPL license).

On Wikipedia: [[wp>Velvet_(software)|Velvet]].

[[http://www.ebi.ac.uk/training/ftp/PhDtheses/Daniel_Zerbino.pdf|Daniel Zerbino's PhD Thesis on Velvet]]

Velvet supports colorspace (SOLiD) data; it is possibly the only de novo short-read de Bruijn graph (DBG) assembler that does at this time. The colorspace versions of the Velvet programs (suffix _de) expect all data to be double-encoded. Mixed-space data is not directly supported.

Velvet supports long-read data.

Velvet accepts sequence data from fastq input files, but does not use the quality information.

==== Color-Space ====

=== DE double-encoded ===

Double-encoding is done by the pre-processor. The primer base is removed from the colorspace read, followed by the first color, since that color is tied to the primer base. For mate-paired reads, the F3 read is reversed. The colors are then all converted to bases, so that software that does not parse colorspace input can read them. Thus "double-encoded" means reads that were encoded in colorspace and then re-encoded as if they were bases in base-space.

=== colorspace programs ===

  * denovo_preprocessor: converts colorspace reads into double-encoded 24-base reads that can be given to velvet_de.
  * velveth_de: colorspace version of velveth; hashes the reads.
  * velvetg_de: colorspace version of velvetg; creates the de Bruijn graph.
  * denovo_postprocessor: converts the double-encoded velvet output to colorspace contigs.
  * denovo_adp: adapter program; converts colorspace to base-space while reducing colorspace read errors as much as possible.
[[http://solidsoftwaretools.com/gf/project/denovo/|De-novo Tools for velvet from ABI for SOLiD]]

==== Running ====

Strategy:

  - Find the right value for k. For short reads, remember to keep k small for good k-mer coverage.
  - Find the right values for exp_cov and cov_cutoff. This is very important.
    * velvet-estimate-exp_cov.pl out/stats.txt makes a useful graph.
  - If you only have long reads, use them also as your short reads.

For 454 long reads, this was our best result:

  velveth out 31 -short 454/?.TCA.454Reads.fna -long 454/?.TCA.454Reads.fna
  velvetg out -exp_cov 60 -cov_cutoff 13
  Final graph has 1755 nodes and n50 of 41723, max 142286, total 2468925, using 778257/782604 reads

==== Failures ====

=== VelvetOptimiser ===

The contributed utility VelvetOptimiser (in velvet/contrib/) is intended to help find the critical parameters k, exp_cov, and cov_cutoff. Although it found k, it got stuck on a local maximum for coverage and failed to produce anything useful.

=== pseudoFlow ===

Wondering whether homopolymer errors in 454 data could cause trouble for the DBG, I made a utility called pseudoFlow.c that takes all homopolymers longer than 6 and shortens them to 6. We know that 454 is accurate for homopolymer lengths in the range 1 to 6. In any case, the pseudoFlow version of the data did not perform better; in fact, it was a little worse.
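The homopolymer capping that pseudoFlow performs can be illustrated with a short Python sketch. This is not the original pseudoFlow.c, whose source is not shown here; the function name ''cap_homopolymers'' and its ''max_run'' parameter are hypothetical, and the sketch assumes it is given a single sequence string without FASTA header lines.

```python
import re


def cap_homopolymers(seq, max_run=6):
    """Shorten every homopolymer run longer than max_run to exactly max_run.

    Runs of length max_run or less are left untouched.
    """
    # One character followed by at least max_run further copies of itself,
    # i.e. a run of length max_run + 1 or more.
    pattern = re.compile(r"(.)\1{%d,}" % max_run)
    # Replace each such run with max_run copies of the repeated character.
    return pattern.sub(lambda m: m.group(1) * max_run, seq)


if __name__ == "__main__":
    read = "ACG" + "A" * 9 + "T"
    print(cap_homopolymers(read))
```

When applied to a FASTA file, only the sequence lines should be passed through the function, and a read should be joined into one string first so that runs spanning a line break are still found.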
==== Installing ====

  ssh campusrocks.cse.ucsc.edu
  cd /campusdata/BME235/programs
  wget http://www.ebi.ac.uk/~zerbino/velvet/velvet_0.7.62.tgz
  tar xfz velvet_0.7.62.tgz
  mv velvet_0.7.62 velvet
  mv velvet_0.7.62.tgz velvet/
  cd velvet
  make
  make color
  # color versions work with SOLiD, have _de extension
  # install to bin dir
  cp velveth velvetg velveth_de velvetg_de /campusdata/BME235/bin/

==== Examples ====

[[http://kevin-gattaca.blogspot.com/2009/12/de-novo-assembly-with-abi-solid-reads.html|Example of using Velvet with SOLiD]]

==== Website ====

[[http://www.ebi.ac.uk/~zerbino/velvet/]]

==== Source with Binaries and Documentation ====

[[http://www.ebi.ac.uk/~zerbino/velvet/velvet_0.7.62.tgz]]

===== References =====

notes-separator: none

~~REFNOTES cite~~