Banana Slug Genomics

Galt's lecture on velvet

Velvet has a lot of parameters to work with
Easy to install compared to other tools like ALLPATHS
/campus/BME235/programs/velvet/ is the home directory
- To install, simply type 'make'
- Copy binaries
- The maximum k-mer size can be changed at compilation
  - 31-mers are the max size for a 64-bit word (2 bits per base, and k must be odd)
- Can add SOLiD support by adding in a “_de” tag for double-encoded.
- Velvet is one of the only de novo assembly tools supporting colorspace reads
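
The 64-bit limit above can be checked with a little arithmetic. This is a minimal sketch, assuming a plain 2-bits-per-base packing (A=0, C=1, G=2, T=3); the function names are made up for illustration and this is not necessarily how Velvet stores k-mers internally.

```python
# Illustrative 2-bit encoding; an assumption, not Velvet's actual layout.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_kmer(kmer):
    """Pack a k-mer into a single integer using 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODES[base]
    return value


def max_odd_k(word_bits=64):
    """Largest odd k whose 2-bit encoding fits in word_bits bits."""
    k = word_bits // 2
    return k if k % 2 == 1 else k - 1


print(max_odd_k(64))                    # largest odd k for a 64-bit word
print(pack_kmer("T" * 31).bit_length()) # bits used by the largest 31-mer
print(pack_kmer("T" * 33).bit_length()) # a 33-mer no longer fits in 64 bits
```

A 32-mer would also fit in 64 bits, but Velvet only uses odd k, which is why 31 is the practical limit for one word.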
~velvet/data
- Contains a simulated genome with simulated reads, which can be used for testing
- No documentation describes the files within this directory
~velvet/contrib – Contributors directory
- velvet-estimate-coverage.pl – Estimates expected coverage
  - Possibly works by reading the stats.txt file that velvet produces
- VelvetOptimiser-2.1.0
  - Written to try to find the best parameters for assembling the genome
  - Did not work; got stuck on a local maximum for exp_cov and cov_cutoff
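
One way an expected coverage could be estimated from stats.txt is a length-weighted mean of per-node coverage. This is only a sketch and not what velvet-estimate-coverage.pl is known to do; the column names `lgth` and `short1_cov` are assumed to match the stats.txt header, and the sample rows are made up.

```python
import csv
import io


def weighted_mean_coverage(stats_text, cov_column="short1_cov", min_length=1):
    """Length-weighted mean coverage over nodes at least min_length long."""
    reader = csv.DictReader(io.StringIO(stats_text), delimiter="\t")
    total_len = 0
    total_cov = 0.0
    for row in reader:
        length = int(row["lgth"])
        if length < min_length:
            continue  # skip very short nodes, which tend to be noisy
        total_len += length
        total_cov += length * float(row[cov_column])
    if total_len == 0:
        raise ValueError("no nodes passed the length filter")
    return total_cov / total_len


# Made-up rows using only the columns this sketch needs.
sample = "ID\tlgth\tshort1_cov\n1\t100\t10.0\n2\t300\t14.0\n3\t5\t1.0\n"
print(weighted_mean_coverage(sample, min_length=50))
```

Weighting by node length keeps the many tiny, low-coverage nodes from dragging the estimate down.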
~assemblies/Pog/velvet-assembly1
- Ran velvet without any arguments, under the assumption that velvet will estimate some correct parameters
- README
- Ran without using any quality values. Velvet does not use quality values even if using fastq input files.
- Runs in 5-8 minutes
- First estimated coverage and cutoff values were terrible.
- Largest contig 713 bp long with an n50 of 27.
- Assembling a single lane, rather than two, gave much better results
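
For reference, the n50 figures quoted in these notes can be computed from a list of contig lengths as below. This is a minimal sketch using the common definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half the assembly size); the example lengths are made up.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


print(n50([2, 4, 6, 8, 10]))
```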
~assemblies/test/velvet-assembly2
- Running velvet on simulated 100kb genome with 35bp and 100bp reads.
- Lowering the k-mer value to 21 yielded an assembly of 99975 of the 100000 bases
- Read length and error rate combined with longer k-mer lengths may cause problems.
  - A single error in a read may prevent overlapping of k-mers.
  - If the k-mer is too long, you may not get overlaps
  - If the k-mer is too short, you get too many overlaps where they shouldn't occur
- Started running on hgw-dev in order to get help from Daniel Zerbino
  - Used fewer options: -short
  - estimated coverage cutoff 13 which was right
  - saw some larger contigs: 21211 max and 5554 n50
  - ran both short and long reads from the same 454 data.
    - Documentation states that short and long reads are treated the same except for some tracking data, which is potentially expensive
    - 82889 max, 28393 n50
  - Combining both lanes of the 454 yielded 142283 max and 50864 n50.
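
The point about read length, errors, and k-mer length can be made concrete. The sketch below counts how many k-mers a read yields, how many of them a single error ruins, and the k-mer coverage from the Velvet manual's relation Ck = C * (L - k + 1) / L. The function names are made up for illustration.

```python
def kmers_per_read(read_len, k):
    """Number of k-mers in a read of the given length."""
    return max(0, read_len - k + 1)


def kmers_hit_by_error(read_len, k, pos):
    """Number of k-mers that contain the base at 0-based position pos."""
    if not 0 <= pos < read_len:
        raise ValueError("pos must lie inside the read")
    first = max(0, pos - k + 1)      # first k-mer start that covers pos
    last = min(pos, read_len - k)    # last k-mer start that covers pos
    return max(0, last - first + 1)


def kmer_coverage(base_coverage, read_len, k):
    """Ck = C * (L - k + 1) / L."""
    return base_coverage * kmers_per_read(read_len, k) / read_len


# 35 bp read, k=31: only 5 k-mers, and a mid-read error ruins all of them.
print(kmers_per_read(35, 31), kmers_hit_by_error(35, 31, 17))
# 100 bp read, k=21: 80 k-mers, a mid-read error ruins 21 of them.
print(kmers_per_read(100, 21), kmers_hit_by_error(100, 21, 50))
```

With short reads and a long k, one error can remove every k-mer the read contributes, which fits the observation that lowering k helped on the 35 bp data.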
~assemblies/Pog/velvet-assembly?
- Using flowspace data collapsed some nodes
- -long_mult_cutoff 4 (over default 2)
  - A connection has to have been seen in multiple long reads before contigs are merged
  - Showed improvements at a cutoff of around 8
  - Lowering cov_cutoff
    - max 680241, 224364 n50
    - This is the best so far
- Large k-mer values did not show much of an improvement
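
A run like the one above could be assembled into velveth/velvetg command lines roughly as follows. This is a sketch only: the helper names, directory name, read file name, and cov_cutoff value are hypothetical, and the commands are printed rather than executed. The options used (-fastq, -long, -exp_cov, -cov_cutoff, -long_mult_cutoff) are standard Velvet options; note that Velvet spells the last one -long_mult_cutoff.

```python
def velveth_cmd(out_dir, k, read_sets):
    """Build a velveth argument list.

    read_sets is a list of (file_format, read_type, path) tuples,
    e.g. ("fastq", "long", "reads.fastq").
    """
    cmd = ["velveth", out_dir, str(k)]
    for file_format, read_type, path in read_sets:
        cmd += [f"-{file_format}", f"-{read_type}", path]
    return cmd


def velvetg_cmd(out_dir, **options):
    """Build a velvetg argument list from option=value keyword arguments."""
    cmd = ["velvetg", out_dir]
    for name, value in options.items():
        cmd += [f"-{name}", str(value)]
    return cmd


# Hypothetical file and directory names; cov_cutoff=5 is a placeholder value.
print(" ".join(velveth_cmd("assembly", 21, [("fastq", "long", "reads_454.fastq")])))
print(" ".join(velvetg_cmd("assembly", exp_cov="auto", cov_cutoff=5, long_mult_cutoff=4)))
```

The lists can be passed to `subprocess.run` once the paths and values are real.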
velvet-assembly2
- utilizes -shortPaired reads
- Can also utilize unpaired reads as -short
- may have some problems with reversing f3 primer
- insert length 2200 (estimated by mapping paired reads onto 454 contigs from newbler)
- adjusting other parameters did not improve assembly
- Combining short and long data did not seem to help
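
The insert length estimate mentioned above (mapping paired reads onto contigs) can be sketched as taking the median span of pairs whose two ends land on the same contig. The input layout and the sample numbers are made up for illustration; the actual mapping procedure used for the 2200 figure is not described in these notes.

```python
from statistics import median


def estimate_insert_length(pairs):
    """Median span of read pairs that map to the same contig.

    Each pair is (contig_a, leftmost_pos, contig_b, rightmost_pos).
    Pairs whose ends map to different contigs are ignored.
    """
    spans = [
        right - left
        for contig_a, left, contig_b, right in pairs
        if contig_a == contig_b and right > left
    ]
    if not spans:
        raise ValueError("no pairs map to a single contig")
    return median(spans)


# Made-up mapped pairs; the last one spans two contigs and is skipped.
pairs = [
    ("c1", 100, "c1", 2300),
    ("c1", 500, "c1", 2650),
    ("c2", 10, "c2", 2260),
    ("c1", 50, "c3", 900),
]
print(estimate_insert_length(pairs))
```

The median is used rather than the mean so that a few mis-mapped pairs do not skew the estimate.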

Evaluating assembled genomes

contig-lengths.rdb within assembly directories
- bottom of bin \t number of bin \t
Take the contigs and run various search tools and scripts: megablast, blastn, blat-strict-match, differences, differences.short
- megablast is fast but not sensitive
- blastn is sensitive
- blat is very sensitive and produces psl output.
  - psl output can get very long
  - labeled table at the top
  - Filter program blat-strict-match selects the best matches.
- Differences
  - shows differences between assemblies on the pieces that can map
