Galt's lecture on velvet

* Velvet has a lot of parameters to work with
* Easy to install compared to other tools like allpaths
* /campus/BME235/programs/velvet/ is the home directory
  * To install, simply type 'make'
  * Copy binaries
  * The maximum k-mer size can be changed at compilation
    * 31-mers are the max size for a 64-bit word
  * SOLiD support can be added by adding in a "_de" tag for double-encoded
    * Velvet is one of the few de novo assembly tools supporting colorspace reads
* ~velvet/data
  * Contains a simulated genome with simulated reads which can be used for testing
  * No documentation describes the files within this directory
* ~velvet/contrib -- Contributors directory
  * velvet-estimate-coverage.pl -- Estimates expected coverage
    * May work by reading the stats.txt file which velvet produces
  * VelvetOptimiser-2.1.0
    * Written to try to find the best parameters for assembling the genome
    * Did not work; got stuck on a local maximum for exp_cov and cov_cutoff
* ~assemblies/Pog/velvet-assembly1
  * Ran velvet without any arguments, under the assumption that velvet will estimate some correct parameters
  * README
  * Ran without using any quality values. Velvet does not use quality values even when given fastq input files.
  * Runs in 5-8 minutes
  * First estimated coverage and cutoff values were terrible
  * Largest contig was 713 bp long, with an n50 of 27
  * Assembling only one lane, rather than two, gave much better results
* ~assemblies/test/velvet-assembly2
  * Running velvet on a simulated 100kb genome with 35bp and 100bp reads
  * Lowering the kmer value to 21 yielded a genome of 99975/100000 bases
  * Read length and error rate combined with longer kmer lengths may cause problems
    * A single error in a read may prevent overlapping of k-mers
    * If a kmer is too long, you may not get overlaps
    * If a kmer is too short, you get too much overlap where it shouldn't occur
* Started running on hgw-dev in order to get help from Daniel Zerbino
  * Used fewer options: -short
    * Estimated coverage cutoff of 13, which was right
    * Saw some larger contigs: 21211 max and 5554 n50
  * Ran both short and long reads from the same 454 data
    * Documentation states that short and long reads are the same except for some tracking data, which is potentially expensive
    * 82889 max, 28393 n50
  * Combining both lanes of the 454 yielded 142283 max and 50864 n50
* ~assemblies/Pog/velvet-assembly?
  * Using flowspace data collapsed some nodes
  * -long_mult_cutoff 4 (over default 2)
    * Have to have seen it multiple times before you
    * Showed improvements at a cutoff of around 8
  * Lowering cov_cutoff
    * max 680241, 224364 n50
    * This is the best so far
  * Large kmer values don't show too much of an improvement
* velvet-assembly2
  * Utilizes -shortPaired reads
  * Can also utilize unpaired reads as -short
  * May have some problems with reversing the f3 primer
  * Insert length 2200 (estimated by mapping paired reads onto 454 contigs from newbler)
  * Adjusting other parameters did not improve the assembly
  * Combining short and long data did not seem to help

Evaluating assembled genomes

* contig-lengths.rdb within assembly directories
  * bottom of bin \t number of bin \t
* Take contigs and use various search tools: megablast, blastn, blat-strict-match, differences, differences.short
  * megablast: fast but not sensitive
  * blastn: sensitive
  * blat: very sensitive; produces psl output
    * psl output can get very long
    * Labeled table at the top
  * The filter program blat-strict-match selects the best matches
  * Differences
    * Shows differences between assemblies on the pieces that can map
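
The assemblies above are compared by largest contig and n50. As a reminder of what n50 measures, here is a minimal sketch using the common definition: the contig length L such that contigs of length L or longer contain at least half of all assembled bases. The function name and the contig lengths are made up for illustration, and velvet's own reported n50 may be computed in different units.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, not taken from any assembly in these notes.
contigs = [100, 80, 60, 40, 20]
print(max(contigs), n50(contigs))  # prints: 100 80
```

A few long contigs raise n50 far more than many short ones, so it is worth reading together with the largest contig.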
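
The note about a single read error preventing k-mer overlap can be made concrete with a small count. This sketch (not velvet code; the read and the helper names are invented) introduces one substitution into a 35bp read and counts how many k-mers stay identical to those of the error-free read.

```python
def kmers(seq, k):
    """List every k-mer of seq, in order of start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def intact_kmers(read, k, error_pos):
    """Count k-mers left unchanged by one substitution at error_pos."""
    wrong = "A" if read[error_pos] != "A" else "C"
    bad = read[:error_pos] + wrong + read[error_pos + 1:]
    # Compare k-mers position by position between the clean and bad read.
    return sum(a == b for a, b in zip(kmers(read, k), kmers(bad, k)))


# Made-up 35 bp read; only its length matters for the counts below.
read = "ACGTTGCAAGGCTTACCGATAGGCTTAACCGGTAT"

for k in (31, 21):
    total = len(read) - k + 1
    print(k, total, intact_kmers(read, k, 5))
# prints:
# 31 5 0
# 21 15 9
```

With k=31 a 35bp read has only 5 k-mers, and an error at position 5 corrupts all of them. With k=21 the same error still leaves 9 of 15 k-mers usable.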
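
The remark that 31-mers are the max size for a 64-bit word follows from storing each base in 2 bits. The sketch below shows the arithmetic only; the encoding table and function name are assumptions for illustration, and velvet's actual internal layout is not described in these notes. Note that 32 bases would also fit in 64 bits, so the limit of 31 is consistent with velvet expecting an odd k-mer length.

```python
# Assumed 2-bit code per base; any fixed assignment would do.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_kmer(kmer):
    """Pack a k-mer into a single integer, 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


# Worst case: every base uses both bits.
packed = pack_kmer("T" * 31)
print(packed.bit_length())  # prints: 62
```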