Galt's lecture on velvet
Velvet has a lot of parameters to work with
Easy to install compared to other tools like ALLPATHS
/campus/BME235/programs/velvet/ is the home directory
To install, simply type 'make'
Copy binaries
Can change the maximum kmer size at compilation (the MAXKMERLENGTH make option)
Can add SOLiD support by using the binaries with a “_de” tag (double-encoded).
Velvet is one of the few de novo assembly tools supporting colorspace reads
~velvet/data
~velvet/contrib – Contributors directory
~assemblies/Pog/velvet-assembly1
Ran velvet without any optional arguments, under the assumption that velvet will estimate reasonable parameters
README
Ran without using any quality values. Velvet does not use quality values even if using fastq input files.
Runs in 5-8 minutes
First estimated coverage and cutoff values were terrible.
Largest contig 713 bp long with an n50 of 27.
Assembling a single lane, rather than two, gave much better results
~assemblies/test/velvet-assembly2
Running velvet on a simulated 100kb genome with 35bp and 100bp reads.
Lowering the kmer value to 21 yielded an assembly of 99975 out of 100000 bases
Read length and error rate combined with longer kmer lengths may cause problems.
A single error in a read may prevent overlapping of K-mers.
If the kmer is too long, then you may not get overlaps between reads
If the kmer is too short, then you get too many overlaps where they shouldn't occur
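The kmer length trade-off above can be made concrete with a little arithmetic. The sketch below is not part of velvet: the function name error_free_kmers is made up for illustration, and it assumes a single substitution error at a known 0-based position in the read. It counts how many of the read's kmers do not span the error and so could still match kmers from other reads.

```python
def error_free_kmers(read_length, k, error_pos):
    """Count the k-mers of a read that do not contain the error position.

    Hypothetical helper for illustration only. Positions are 0-based and
    the read is assumed to carry exactly one substitution error.
    """
    total = read_length - k + 1          # number of k-mers in the read
    if total <= 0:
        return 0                         # read is shorter than k
    # k-mers starting at first..last (inclusive) cover the error position
    first = max(0, error_pos - k + 1)
    last = min(total - 1, error_pos)
    damaged = max(0, last - first + 1)
    return total - damaged


if __name__ == "__main__":
    # 35bp read with one error at position 5, two different kmer lengths
    for k in (21, 31):
        print(k, error_free_kmers(35, k, 5))
```

For a 35bp read with an error at position 5, a kmer of 21 leaves 9 of the 15 kmers intact, while a kmer of 31 leaves none of its 5. That is consistent with the lower kmer value doing better on the simulated short reads.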
Started running on hgw-dev in order to get help from Daniel Zerbino
Used fewer options: -short
estimated a coverage cutoff of 13, which was right
saw some larger contigs: 21211 max and 5554 n50
ran both short and long reads from the same 454 data.
Combining both lanes of the 454 yielded 142283 max and 50864 n50.
~assemblies/Pog/velvet-assembly?
Using flowspace data collapsed some nodes
-long_mult_cutoff 4 (over default 2)
large kmer values did not show much of an improvement
velvet-assembly2
utilizes -shortPaired reads
Can also utilize unpaired reads as -short
may have some problems with reversing the F3 primer reads
insert length 2200 (estimated by mapping paired reads onto 454 contigs from newbler)
adjusting other parameters did not improve the assembly
Combining short and long data did not seem to help
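As a rough sketch of how the options above fit together, the Python below only builds velveth and velvetg command lines as argument lists; it does not run them. The helper names, file names, output directory and kmer value of 21 are placeholders, not the ones used for these assemblies. The -shortPaired, -short and -ins_length options are the ones mentioned above; -fasta is an assumed input format.

```python
def velveth_args(out_dir, kmer, paired_file, unpaired_file=None,
                 file_format="-fasta"):
    """Build a velveth command line (hypothetical helper, nothing is run)."""
    args = ["velveth", out_dir, str(kmer),
            file_format, "-shortPaired", paired_file]
    if unpaired_file is not None:
        # unpaired reads go in as a separate -short category
        args += [file_format, "-short", unpaired_file]
    return args


def velvetg_args(out_dir, ins_length, cov_cutoff=None):
    """Build a velvetg command line with the paired-end insert length."""
    args = ["velvetg", out_dir, "-ins_length", str(ins_length)]
    if cov_cutoff is not None:
        args += ["-cov_cutoff", str(cov_cutoff)]
    return args


if __name__ == "__main__":
    # placeholder directory and file names, for illustration only
    print(" ".join(velveth_args("asm_dir", 21, "paired.fa", "unpaired.fa")))
    print(" ".join(velvetg_args("asm_dir", 2200)))
```

Keeping the commands as lists makes it easy to hand them to subprocess.run later, or to loop over several kmer values and compare the resulting assemblies.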
Evaluating assembled genomes
contig-lengths.rdb within assembly directories
take contigs and run them through various search tools: megablast, blastn, blat-strict-match, differences, differences.short
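The max and n50 figures quoted throughout can be recomputed from a list of contig lengths. The sketch below assumes the lengths have already been read into a plain Python list (the layout of contig-lengths.rdb is not described here, so no parsing is attempted), and it uses the common definition of n50: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembled bases.

```python
def n50(lengths):
    """Return the n50 of a list of contig lengths (0 for an empty list)."""
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    contigs = [2, 2, 2, 3, 3, 4, 8, 8]        # made-up lengths
    print("max:", max(contigs), "n50:", n50(contigs))
```

With the made-up lengths above the total is 32 bases, and the two contigs of length 8 already reach half of that, so the n50 is 8.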