
Differences

This shows you the differences between two versions of the page.

archive:bioinformatic_tools:velvet [2010/04/07 19:51]
galt
archive:bioinformatic_tools:velvet [2015/07/28 06:27] (current)
ceisenhart ↷ Page moved from bioinformatic_tools:velvet to archive:bioinformatic_tools:velvet
Line 1:
 ===== VELVET =====
-====High Level Overview==== 
-Velvet was developed by Ewan Birney and Daniel R. Zerbino for de-novo assembly of short-reads using de Bruijn graphs. 
  
-Zerbino D, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. (2008) 18:821-829.
-[[http://nar.oxfordjournals.org/cgi/ijlink?linkType=ABST&journalCode=genome&resid=18/5/821|Free full text]]
 +==== Overview ====
 +Velvet was developed by Daniel R. Zerbino and Ewan Birney.
 + 
 +**Velvet: algorithms for de novo short read assembly using de Bruijn graphs**  
 +[(cite:velvet>Daniel R. Zerbino and Ewan Birney.\\
 +Velvet: Algorithms for de novo short read assembly using de Bruijn graphs\\ 
 +Genome Res. May 2008 18: 821-829; Published in Advance March 18, 2008, \\ 
 +doi:[[http://dx.doi.org/10.1101/gr.074492.107|10.1101/gr.074492.107]]
 +)]
  
 Velvet may be downloaded free from [[http://www.ebi.ac.uk/~zerbino/velvet/|here]] (GPL license).
  
-There is a [[http://en.wikipedia.org/wiki/Velvet_(software)|wiki article]] about velvet.
 +On Wikipedia: [[wp>Velvet_(software)|Velvet]].
 + 
 +[[http://www.ebi.ac.uk/training/ftp/​PhDtheses/​Daniel_Zerbino.pdf|Daniel Zerbino'​s PhD Thesis on Velvet]] 
 + 
 +Velvet supports colorspace data, and is possibly the only de novo short-read de Bruijn graph (DBG) assembler that does at this time.
 +The colorspace version of Velvet (the _de programs) expects all data to be double-encoded. Mixed-space input is not directly supported.
 + 
 +Velvet has support for long-read data. 
 + 
 +Velvet will accept sequence data from fastq input files, but does not use the quality information. 
 + 
 +==== Color-Space ==== 
 + 
 +=== DE double-encoded === 
 +Double encoding is done by the pre-processor.
 +The primer base is removed from the colorspace read,
 +followed by the first color, since
 +that color was tied to the primer base.
 +In the case of mate-paired reads,
 +the F3 read is reversed.
 +The colors are then all written as base letters
 +for software that doesn't parse colorspace input.
 +Thus double-encoded means reads encoded in colorspace
 +and then re-encoded as if they were bases in base-space.
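
The steps described above can be sketched in a few lines of Python. This is not the code of denovo_preprocessor; it is a minimal illustration that assumes the conventional color-to-letter mapping (0=A, 1=C, 2=G, 3=T) and a read written as a primer base followed by color digits. The function name and the `reverse` flag are hypothetical, and missing color calls (`.`) are not handled.

```python
# Assumed mapping: the conventional letters used when double-encoding colors.
COLOR_TO_BASE = {"0": "A", "1": "C", "2": "G", "3": "T"}


def double_encode(cs_read: str, reverse: bool = False) -> str:
    """Double-encode one colorspace read such as 'T0123210'.

    cs_read: primer base followed by color digits 0-3.
    reverse: set for reads that must be reversed (the F3 read of a mate pair).
    """
    colors = cs_read[2:]  # drop the primer base and the first color tied to it
    if reverse:
        colors = colors[::-1]
    # Re-encode each color as if it were a base; raises KeyError on '.' calls.
    return "".join(COLOR_TO_BASE[c] for c in colors)


if __name__ == "__main__":
    print(double_encode("T0123210"))
    print(double_encode("T0123210", reverse=True))
```

A read of one primer base plus 25 colors comes out as 24 letters, which matches the 24-base reads mentioned for denovo_preprocessor.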
 + 
 +=== colorspace programs === 
 + 
 +denovo_preprocessor
 +converts colorspace reads into double-encoded 24-base reads
 +that can be given to velveth_de.
 + 
 +velveth_de
 +colorspace version of velveth; hashes the reads.
 + 
 +velvetg_de
 +colorspace version of velvetg; builds the de Bruijn graph.
 + 
 +denovo_postprocessor
 +converts double-encoded velvet output to colorspace contigs.
 + 
 +denovo_adp
 +adapter program; converts colorspace to base-space
 +while reducing read errors in colorspace as much as possible.
 + 
 +[[http://solidsoftwaretools.com/gf/project/denovo/|De-novo Tools for velvet from ABI for Solid]]
 + 
 +==== Running ==== 
 + 
 +Strategy:  
 +  - Find the right value for k. For short reads, remember to keep k small for good kmer coverage.
 +  - Find the right values for exp_cov and cov_cutoff. This is very important.
 +    * velvet-estimate-exp_cov.pl out/stats.txt makes a useful graph.
 +  - If you only have long reads, pass them as your short reads as well.
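
Why a small k helps with short reads: the Velvet manual relates k-mer coverage Ck to nucleotide coverage C and read length L as Ck = C * (L - k + 1) / L. The sketch below just evaluates that relation. The function name is hypothetical, and the 50x coverage and 24-base read length in the usage lines are illustrative assumptions, not figures from the runs on this page.

```python
def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Return k-mer coverage Ck = C * (L - k + 1) / L."""
    if k < 1 or k > read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Illustrative only: 50x nucleotide coverage, 24-base reads.
    for k in (15, 17, 19, 21, 23):
        print(k, round(kmer_coverage(50.0, 24, k), 2))
```

With reads this short, each step up in k removes a large share of the k-mers per read, so the coverage seen by the graph drops quickly.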
 + 
 +For 454 long reads, this was our best result: 
 +  velveth out 31 -short 454/?.TCA.454Reads.fna -long 454/?.TCA.454Reads.fna
 +  velvetg out -exp_cov 60 -cov_cutoff 13 
 +  Final graph has 1755 nodes and n50 of 41723, max 142286, total 2468925, using 778257/​782604 reads 
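
For reference, the n50 reported above is the usual N50 statistic: the length such that pieces of that length or longer hold at least half of the total. The Python sketch below shows the definition on made-up lengths. It is not Velvet's code, and since Velvet measures node lengths in k-mers, treat it as an illustration of the definition rather than a way to reproduce Velvet's own number.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or node) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


if __name__ == "__main__":
    # Toy lengths, invented for the example.
    print(n50([2, 3, 4, 5, 6]))
```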
 + 
 + 
 +==== Failures ==== 
 + 
 +=== VelvetOptimiser === 
 +The contributed (velvet/contrib/) utility VelvetOptimiser is intended to help find
 +the critical parameters k, exp_cov, and cov_cutoff. However, although it found k,
 +it got stuck on a local maximum for coverage and failed to produce anything useful.
 + 
 +=== pseudoFlow === 
 +Wondering whether homopolymer errors in 454 data could cause trouble for the DBG,
 +I made a utility called pseudoFlow.c that takes all homopolymers longer than
 +6 and shortens them to 6. We know that in the range 1 to 6, 454 is accurate.
 +In any case, the pseudoFlow version of the data did not perform better;
 +in fact it was a little worse.
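
The transformation itself is simple. The sketch below is a Python illustration of the idea, not the pseudoFlow.c source: it caps every run of one repeated character at 6. The function name and the `max_run` parameter are hypothetical.

```python
import re


def cap_homopolymers(seq: str, max_run: int = 6) -> str:
    """Shorten every homopolymer run longer than max_run to exactly max_run."""
    # (.) captures one character; \1{max_run,} requires at least max_run more
    # copies, so only runs of max_run + 1 or longer are matched and trimmed.
    pattern = re.compile(r"(.)\1{%d,}" % max_run)
    return pattern.sub(lambda m: m.group(1) * max_run, seq)


if __name__ == "__main__":
    print(cap_homopolymers("ACGTTTTTTTTTGA"))
```

Runs of 6 or fewer are left untouched, in line with the observation that 454 is accurate in that range.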
 + 
 +==== Installing ==== 
 + 
 +  ssh campusrocks.cse.ucsc.edu 
 +   
 +  cd /campusdata/BME235/programs
 +  wget http://www.ebi.ac.uk/~zerbino/velvet/velvet_0.7.62.tgz
 +  tar xfz velvet_0.7.62.tgz
 +  mv velvet_0.7.62 velvet
 +  mv velvet_0.7.62.tgz velvet/
 +  cd velvet
 +  make
 +  make color
 +  # color versions work with SOLiD data and have a _de suffix
 +  # install to bin dir
 +  cp velveth velvetg velveth_de velvetg_de /campusdata/BME235/bin/
 + 
 + 
 +==== Examples ==== 
 + 
 +[[http://kevin-gattaca.blogspot.com/2009/12/de-novo-assembly-with-abi-solid-reads.html|Example of using velvet with SOLiD reads]]
 + 
 +==== Website ==== 
 +[[http://www.ebi.ac.uk/~zerbino/velvet/]]
  
 +==== Source with Binaries and Documentation ====
 +[[http://www.ebi.ac.uk/~zerbino/velvet/velvet_0.7.62.tgz]]
  
 +===== References =====
 +<refnotes>notes-separator: none</refnotes>
 +~~REFNOTES cite~~
  