archive:bioinformatic_tools:velvet

Differences

This shows you the differences between two versions of the page.

archive:bioinformatic_tools:velvet [2010/04/23 07:09]
galt Added Daniel Z.'s phd thesis on velvet
archive:bioinformatic_tools:velvet [2010/04/28 20:58]
galt
Line 18: Line 18:
  
 Velvet has support for COLORSPACE, possibly the only de-novo short-read DBG assembler that does so at this time.
 +The colorspace version of velvet (_de) expects all data to be double-encoded. Mixed-space data is not directly supported.
  
 Velvet has support for long-read data.
 +
 +Velvet will accept sequence data from fastq input files, but does not use the quality information.
  
 ==== Color-Space ====
Line 54: Line 57:
  
 [[http://solidsoftwaretools.com/gf/project/denovo/|De-novo Tools for velvet from ABI for Solid]]
 +
 +==== Running ====
 +
 +Strategy:
 +  - Find the right value for k. For short reads, keep k small enough to maintain good k-mer coverage.
 +  - Find the right values for exp_cov and cov_cutoff. This is very important.
 +    * velvet-estimate-exp_cov.pl out/stats.txt makes a useful graph.
 +  - If you only have long reads, pass them as the short reads as well.
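
The trade-off between k and k-mer coverage in the first step can be made concrete. The following is a minimal Python sketch, assuming the relation given in the Velvet manual, Ck = C * (L - k + 1) / L, where C is the nucleotide coverage and L is the read length. The function name and the example numbers are hypothetical and are not taken from this page.

```python
def kmer_coverage(nucleotide_coverage, read_length, k):
    """Estimate k-mer coverage Ck = C * (L - k + 1) / L.

    Assumption: all reads have the same length `read_length`.
    """
    if k < 1 or k > read_length:
        raise ValueError("k must be between 1 and the read length")
    return nucleotide_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Hypothetical example: 36 bp reads at 50x nucleotide coverage.
    for k in (21, 25, 31):
        print(k, round(kmer_coverage(50, 36, k), 1))
```

With short reads, each increase in k removes a noticeable share of the k-mers per read, which is why a small k helps keep k-mer coverage up.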
 +
 +For 454 long reads, this was our best result:
 +  velveth out 31 -short 454/?.TCA.454Reads.fna -long 454/?.TCA.454Reads.fna
 +  velvetg out -exp_cov 60 -cov_cutoff 13
 +  Final graph has 1755 nodes and n50 of 41723, max 142286, total 2468925, using 778257/782604 reads
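
For readers unfamiliar with the n50 figure in the summary line above, here is a minimal Python sketch of the usual definition: the length such that contigs of that length or longer account for at least half of the total assembled length. The function name is hypothetical, and Velvet's own calculation may differ in detail (for example, in the units used for node lengths).

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


if __name__ == "__main__":
    # Hypothetical contig lengths, not data from this assembly.
    print(n50([2, 3, 4, 5, 6]))
```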
 +
 +
 +==== Failures ====
 +
 +=== VelvetOptimiser ===
 +The contributed utility VelvetOptimiser (in velvet/contrib/) is intended to help find
 +the critical parameters k, exp_cov, and cov_cutoff. Although it found k,
 +it got stuck on a local maximum for coverage and failed to produce anything useful.
 +
 +=== pseudoFlow ===
 +Wondering whether homopolymer errors in 454 data could cause trouble for the DBG,
 +I made a utility called pseudoFlow.c that takes all homopolymers longer than
 +6 and shortens them to 6. We know that 454 is accurate for run lengths in the range 1 to 6.
 +The pseudoFlow version of the data did not perform better, however;
 +in fact it was a little worse.
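
The homopolymer capping described above can be sketched in a few lines. This is a hypothetical Python illustration of the idea only, not the original pseudoFlow.c, whose source is not shown here. The function name and the max_run parameter are assumptions.

```python
import re


def cap_homopolymers(seq, max_run=6):
    """Shorten every run of one repeated base longer than max_run to max_run."""
    # (.)\1{max_run,} matches one character followed by at least max_run
    # more copies of it, i.e. a run of max_run + 1 or more.
    pattern = re.compile(r"(.)\1{%d,}" % max_run)
    return pattern.sub(lambda m: m.group(1) * max_run, seq)


if __name__ == "__main__":
    # Hypothetical read with a run of nine A's.
    print(cap_homopolymers("CGAAAAAAAAATG"))
```

Runs of 6 or fewer are left untouched, so only the run lengths that 454 is thought to miscall are altered.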
  
 ==== Installing ====
archive/bioinformatic_tools/velvet.txt · Last modified: 2015/07/28 06:27 by ceisenhart