This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
archive:bioinformatic_tools:velvet [2010/04/23 08:25] galt |
archive:bioinformatic_tools:velvet [2015/07/28 06:27] (current) ceisenhart ↷ Page moved from bioinformatic_tools:velvet to archive:bioinformatic_tools:velvet |
||
---|---|---|---|
Line 18: | Line 18: | ||
Velvet has support for COLORSPACE, possibly the only de-novo short-read DBG assembler that does at this time. | Velvet has support for COLORSPACE, possibly the only de-novo short-read DBG assembler that does at this time. | ||
+ | The colorspace version of velvet (_de) expects all data to be double-encoded. Mixed-space data is not directly supported. | ||
Velvet has support for long-read data. | Velvet has support for long-read data. | ||
+ | |||
+ | Velvet will accept sequence data from fastq input files, but does not use the quality information. | ||
==== Color-Space ==== | ==== Color-Space ==== | ||
Line 60: | Line 63: | ||
- Find the right value for k. For short reads remember to keep k small for good kmer coverage. | - Find the right value for k. For short reads remember to keep k small for good kmer coverage. | ||
- Find the right values for exp_cov and cov_cutoff. This is very important. | - Find the right values for exp_cov and cov_cutoff. This is very important. | ||
+ | * velvet-estimate-exp_cov.pl out/stats.txt makes a useful graph. | ||
- If you only have long reads, use them also as your short reads. | - If you only have long reads, use them also as your short reads. | ||
- | For 454 long reads, | + | For 454 long reads, this was our best result: |
- | Final graph has 1484 nodes and n50 of 48088, max 135010, total 2464896, using 778281/782604 reads | + | velveth out 31 -short 454/?.TCA.454Reads.fna -long 454/?.TCA.454Reads.fna |
+ | velvetg out -exp_cov 60 -cov_cutoff 13 | ||
+ | Final graph has 1755 nodes and n50 of 41723, max 142286, total 2468925, using 778257/782604 reads | ||
+ | |||
+ | |||
+ | ==== Failures ==== | ||
+ | |||
+ | === VelvetOptimiser === | ||
+ | The contributed (velvet/contrib/) utility VelvetOptimiser is intended to help find | ||
+ | the critical parameters k, exp_cov, and cov_cutoff. However, although it found k, | ||
+ | it got stuck at a local maximum for coverage and failed to produce anything useful. | ||
+ | |||
+ | === pseudoFlow === | ||
+ | Wondering if homopolymer errors in 454 data could cause trouble for the DBG, | ||
+ | I made a utility called pseudoFlow.c that takes all homopolymers longer than | ||
+ | 6 and shortens them to 6. We know that 454 is accurate for homopolymer lengths in the range 1 to 6. | ||
+ | In any case, the pseudoFlow version of the data did not perform better; | ||
+ | in fact, it was a little worse. | ||
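The homopolymer capping described for pseudoFlow can be sketched as follows. This is not the pseudoFlow.c source, which is not shown on this page; it is a minimal Python illustration of the same idea, assuming FASTA input and a case-sensitive comparison of bases. The names `cap_homopolymers`, `cap_fasta`, and `MAX_RUN` are hypothetical.

```python
import re

MAX_RUN = 6  # cap taken from the text: runs longer than 6 are shortened to 6


def cap_homopolymers(seq, max_run=MAX_RUN):
    """Shorten every run of one repeated character longer than max_run to max_run."""
    # (.)\1{max_run,} matches a character followed by at least max_run copies
    # of itself, i.e. a run of max_run + 1 or more.
    pattern = re.compile(r"(.)\1{%d,}" % max_run)
    return pattern.sub(lambda m: m.group(1) * max_run, seq)


def cap_fasta(lines, max_run=MAX_RUN):
    """Yield header and capped sequence for each FASTA record.

    Sequence lines of a record are joined first, so a run that wraps across
    two lines is still capped. Lines before the first header are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header
                yield cap_homopolymers("".join(chunks), max_run)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header
        yield cap_homopolymers("".join(chunks), max_run)
```

For example, `cap_fasta(open("reads.fna"))` (a hypothetical file name) would yield each record with its sequence on a single line, ready to be written out and passed to velveth in place of the original reads.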
==== Installing ==== | ==== Installing ==== |