archive:bioinformatic_tools:velvet [2010/04/07 19:48]
galt created
archive:bioinformatic_tools:velvet [2015/07/28 06:27] (current)
ceisenhart ↷ Page moved from bioinformatic_tools:velvet to archive:bioinformatic_tools:velvet
===== VELVET =====
==== Overview ====
Velvet was developed by Daniel R. Zerbino and Ewan Birney for de novo assembly of short reads using de Bruijn graphs.

**Velvet: algorithms for de novo short read assembly using de Bruijn graphs**  
[(cite:velvet>Daniel R. Zerbino and Ewan Birney.\\
Velvet: Algorithms for de novo short read assembly using de Bruijn graphs\\
Genome Res. May 2008 18: 821-829; Published in Advance March 18, 2008, \\
doi:[[http://dx.doi.org/10.1101/gr.074492.107|10.1101/gr.074492.107]]
)]

Velvet may be downloaded free from [[http://www.ebi.ac.uk/~zerbino/velvet/|here]] (GPL license).

On Wikipedia: [[wp>Velvet_(software)|Velvet]].

[[http://www.ebi.ac.uk/training/ftp/PhDtheses/Daniel_Zerbino.pdf|Daniel Zerbino's PhD Thesis on Velvet]]

Velvet supports colorspace (SOLiD) data; at the time of writing it was possibly the only de novo short-read de Bruijn graph (DBG) assembler to do so.
The colorspace version of Velvet (the programs with the _de suffix) expects all input to be double-encoded. Mixed base-space and colorspace input is not directly supported.

Velvet also supports long-read data.

Velvet accepts sequence data from FASTQ input files, but does not use the quality information.

==== Color-Space ====

=== DE (double-encoded) ===
Double encoding is done by the preprocessor.
First, the primer base is removed from each colorspace read, followed by the first color,
since that color is tied to the primer base.
For mate-paired reads, the F3 read is then reversed.
Finally, all of the colors are converted to bases,
for software that does not parse colorspace input.
Thus "double-encoded" means reads that were encoded in colorspace
and then re-encoded as if they were bases in base space.

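The double-encoding steps above can be sketched in C as follows. This is a minimal illustration, not the source of denovo_preprocessor. It assumes a read is a string such as "T0123.21" (primer base, then colors 0 to 3, with '.' for a missing call) and assumes the usual double-encoding letters 0->A, 1->C, 2->G, 3->T, with '.' written as N. The names double_encode and color_to_base and the reverse flag are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map one color to its double-encoded letter.
 * Assumed convention: 0->A, 1->C, 2->G, 3->T, anything else ('.') -> N. */
static char color_to_base(char c)
{
    switch (c) {
    case '0': return 'A';
    case '1': return 'C';
    case '2': return 'G';
    case '3': return 'T';
    default:  return 'N';
    }
}

/* Hypothetical sketch of double encoding one colorspace read.
 * 1. Drop the primer base cs[0] and the first color cs[1], which is tied
 *    to the primer base.
 * 2. If `reverse` is nonzero (as described for the F3 read of a mate pair),
 *    reverse the remaining colors.
 * 3. Write every color as a base letter.
 * `out` must hold at least strlen(cs) - 1 bytes. Returns bases written. */
size_t double_encode(const char *cs, int reverse, char *out)
{
    size_t len = strlen(cs);
    assert(len >= 2);             /* need a primer base and a first color */
    const char *colors = cs + 2;  /* skip primer base and first color */
    size_t n = len - 2;

    for (size_t i = 0; i < n; i++) {
        char c = reverse ? colors[n - 1 - i] : colors[i];
        out[i] = color_to_base(c);
    }
    out[n] = '\0';
    return n;
}
```

For example, double_encode("T0123.21", 0, buf) drops the primer T and the first color 0, leaving the colors "123.21", which are written as "CGTNGC"; with reverse set, the same read becomes "CGNTGC".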
=== Colorspace programs ===

  * denovo_preprocessor: converts colorspace reads into double-encoded 24-base reads that can be given to the _de versions of Velvet.
  * velveth_de: colorspace version of velveth; hashes the reads.
  * velvetg_de: colorspace version of velvetg; creates the de Bruijn graph.
  * denovo_postprocessor: converts Velvet's double-encoded output into colorspace contigs.
  * denovo_adp: adapter program that converts colorspace to base space while reducing colorspace read errors as much as possible.

[[http://solidsoftwaretools.com/gf/project/denovo/|De novo tools for Velvet from ABI for SOLiD]]

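Converting colorspace to base space, as denovo_adp does, rests on the SOLiD two-base encoding: if bases are coded A=0, C=1, G=2, T=3, each color is the XOR of the codes of two adjacent bases, so a read can be decoded by walking forward from its primer base. The sketch below shows only this naive decoding; it is not denovo_adp, which also tries to correct errors, and the name decode_colorspace is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Two-bit base codes used by the SOLiD color scheme: A=0, C=1, G=2, T=3. */
static int base_code(char b)
{
    switch (b) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

static const char CODE_TO_BASE[4] = { 'A', 'C', 'G', 'T' };

/* Hypothetical sketch: naively decode a colorspace read such as "T0123".
 * Each color c satisfies code(next base) = code(previous base) XOR c.
 * The primer base itself is not written to `out`.
 * Returns -1 on an invalid primer base or on a missing/invalid color ('.'),
 * because naive decoding cannot continue past it.
 * `out` must hold at least strlen(cs) bytes. Returns bases written. */
long decode_colorspace(const char *cs, char *out)
{
    assert(cs != NULL && out != NULL);
    int prev = base_code(cs[0]);
    if (prev < 0)
        return -1;

    size_t n = 0;
    for (const char *p = cs + 1; *p != '\0'; p++) {
        if (*p < '0' || *p > '3')
            return -1;
        prev ^= *p - '0';          /* step to the next base */
        out[n++] = CODE_TO_BASE[prev];
    }
    out[n] = '\0';
    return (long)n;
}
```

Decoding "T0123" gives "TGAT". Changing the second color from 1 to 0 ("T0023") gives "TTCG": every base after the wrong color changes, which is why converting with error reduction, rather than naive decoding, matters.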
==== Running ====

Strategy:
  - Find the right value for k. For short reads, keep k small to get good k-mer coverage.
  - Find the right values for exp_cov and cov_cutoff. This is very important.
    * velvet-estimate-exp_cov.pl out/stats.txt makes a useful graph.
  - If you only have long reads, use them as your short reads as well.

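The advice to keep k small for short reads follows from the relation the Velvet manual gives between base coverage C and k-mer coverage Ck for reads of length L: Ck = C * (L - k + 1) / L. Velvet's coverage parameters, including exp_cov and cov_cutoff, are expressed in k-mer coverage. The helper below is a hypothetical sketch of that formula, not part of Velvet.

```c
#include <assert.h>

/* Hypothetical helper: expected k-mer coverage from base coverage,
 * Ck = C * (L - k + 1) / L.
 * A read shorter than k contains no k-mers, so the result is 0 then. */
double kmer_coverage(double base_cov, int read_len, int k)
{
    assert(read_len > 0 && k > 0);
    if (k > read_len)
        return 0.0;
    return base_cov * (double)(read_len - k + 1) / (double)read_len;
}
```

For 36-base reads at 36x base coverage, k = 31 leaves only 6x k-mer coverage, while k = 21 gives 16x.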
For 454 long reads, this was our best result:
  velveth out 31 -short 454/?.TCA.454Reads.fna -long 454/?.TCA.454Reads.fna
  velvetg out -exp_cov 60 -cov_cutoff 13
  Final graph has 1755 nodes and n50 of 41723, max 142286, total 2468925, using 778257/782604 reads

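The n50 in this summary is the N50 statistic: the length L such that sequences of length L or longer together make up at least half of the total assembled length. A minimal sketch of computing it from a list of lengths follows; the name n50 is hypothetical, and velvetg computes its figure over the nodes of its final graph, so its exact counting may differ in detail.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator: larger lengths first. */
static int cmp_desc(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x < y) - (x > y);
}

/* Hypothetical sketch: N50 of n sequence lengths.
 * Sorts `lens` in place (longest first), then returns the length at which
 * the running total first reaches half of the overall total.
 * Returns 0 for an empty list. */
long n50(long *lens, size_t n)
{
    assert(lens != NULL || n == 0);
    if (n == 0)
        return 0;

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += lens[i];

    qsort(lens, n, sizeof lens[0], cmp_desc);

    long running = 0;
    for (size_t i = 0; i < n; i++) {
        running += lens[i];
        if (2 * running >= total)   /* reached half of the total length */
            return lens[i];
    }
    return 0;
}
```

For lengths 2, 3, ..., 10 (total 54), the running sum 10 + 9 + 8 = 27 first reaches half, so the N50 is 8.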

==== Failures ====

=== VelvetOptimiser ===
The contributed utility VelvetOptimiser (in velvet/contrib/) is intended to help find
the critical parameters k, exp_cov, and cov_cutoff. However, although it found k,
it got stuck on a local maximum for coverage and failed to produce anything useful.

=== pseudoFlow ===
Wondering whether homopolymer errors in 454 data could cause trouble for the DBG,
I wrote a utility, pseudoFlow.c, that shortens every homopolymer longer than
6 bases to 6 bases. We know that 454 is accurate for homopolymer lengths 1 to 6.
However, the pseudoFlow version of the data did not perform better;
in fact, it was a little worse.

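The homopolymer-capping step described above can be sketched in C as below. This is not the source of pseudoFlow.c, which is not reproduced on this page; the name cap_homopolymers and its in-place, one-sequence interface are assumptions, and FASTA headers would have to be handled by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: shorten every homopolymer run longer than max_run
 * bases to exactly max_run bases, editing the NUL-terminated sequence
 * in place. Comparison is case-sensitive. Returns the new length. */
size_t cap_homopolymers(char *seq, size_t max_run)
{
    assert(seq != NULL && max_run > 0);
    size_t out = 0;   /* write position */
    size_t run = 0;   /* length of the current run so far */
    char prev = '\0';

    for (size_t in = 0; seq[in] != '\0'; in++) {
        char c = seq[in];
        run = (c == prev) ? run + 1 : 1;
        prev = c;
        if (run <= max_run)       /* drop bases beyond the cap */
            seq[out++] = c;
    }
    seq[out] = '\0';
    return out;
}
```

With max_run set to 6, a run of nine G bases is cut to six, while runs of six or fewer are left unchanged.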
==== Installing ====

  ssh campusrocks.cse.ucsc.edu
  cd /campusdata/BME235/programs
  wget http://www.ebi.ac.uk/~zerbino/velvet/velvet_0.7.62.tgz
  tar xfz velvet_0.7.62.tgz
  mv velvet_0.7.62 velvet
  mv velvet_0.7.62.tgz velvet/
  cd velvet
  make
  make color
  # color versions work with SOLiD and have the _de suffix
  # install to bin dir
  cp velveth velvetg velveth_de velvetg_de /campusdata/BME235/bin/


==== Examples ====

[[http://kevin-gattaca.blogspot.com/2009/12/de-novo-assembly-with-abi-solid-reads.html|Example of using Velvet with SOLiD]]

==== Website ====
[[http://www.ebi.ac.uk/~zerbino/velvet/]]

==== Source with Binaries and Documentation ====
[[http://www.ebi.ac.uk/~zerbino/velvet/velvet_0.7.62.tgz]]

===== References =====
<refnotes>notes-separator: none</refnotes>
~~REFNOTES cite~~
  
archive/bioinformatic_tools/velvet.1270669680.txt.gz · Last modified: 2010/04/07 19:48 by galt