lecture_notes:04-26-2010

Galt's lecture on velvet

  • A lot of parameters with velvet to work with
  • Easy to install compared to other tools like allpaths
  • /campus/BME235/programs/velvet/ is the home directory
    • To install, simply type 'make'
    • Copy binaries
    • Can change the max-kmer maximum size at compilation
      • 31mers are max size for a 64-bit word
    • Can add SOLiD support by adding a “_de” tag, for double-encoded.
    • Velvet is one of the only de-novo assembly tools supporting colorspace reads
  • ~velvet/data
    • Contains a simulated genome with simulated reads, which can be used for testing
    • No documentation describes the files within this directory
  • ~velvet/contrib – Contributors directory
    • velvet-estimate-coverage.pl – Estimates expected coverage
      • Maybe reading the stats.txt file which velvet produces
    • VelvetOptimiser-2.1.0
      • Written to try to find the best parameters for assembling the genome
      • Did not work, got stuck on local maximum for exp_cov and cov_cutoff
  • ~assemblies/Pog/velvet-assembly1
    • Run velvet without any arguments, under the assumption that velvet will estimate some correct parameters
    • README
    • Ran without using any quality values. Velvet does not use quality values even if using fastq input files.
    • Runs in 5-8 minutes
    • First estimated coverage and cutoff values were terrible.
    • Largest contig 713 bp long with an n50 of 27.
    • Assembling a single lane, rather than two, gave much better results
  • ~assemblies/test/velvet-assembly2
    • Running velvet on simulated 100kb genome with 35bp and 100bp reads.
    • Lowering the kmer value to 21 yielded a genome of 99975/100000 bases
    • Read length and error rate combined with longer kmer lengths may cause problems.
      • A single error in a read may prevent overlapping of K-mers.
      • If a kmer is too long, you may not get overlaps
      • If a kmer is too short, you get too much overlap where it shouldn't occur
    • Started running on hgw-dev in order to get help from Daniel Zerbino
      • Used fewer options: -short
      • estimated coverage cutoff 13 which was right
      • saw some larger contigs: 21211 max and 5554 n50
      • ran both short and long reads from the same 454 data.
        • Documentation states that short and long reads are treated the same except for some tracking data, which is potentially expensive
        • 82889 max, 28393 n50
      • Combining both lanes of the 454 yielded 142283 max and 50864 n50.
  • ~assemblies/Pog/velvet-assembly?
    • Using flowspace data collapsed some nodes
    • -long_multi_cutoff 4 (over default 2)
      • Have to have seen it multiple times before you
      • Showed improvements around 8 multi_cutoff
      • Lowering cov_cutoff
        • max 680241, 224364 n50
        • This is the best so far
    • large kmer values don't show too much of an improvement
  • velvet-assembly2
    • utilizes -shortPaired reads
    • Can also utilize unpaired reads as -short
    • may have some problems with reversing f3 primer
    • insert length 2200 (estimated by mapping paired reads onto 454 contigs from newbler)
    • adjusting other parameters did not improve assembly
    • Combining short and long data did not seem to help
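
The point above about read length, errors, and kmer length can be illustrated with a small sketch. It is not part of velvet; the helper names (kmers, surviving_kmers) and the randomly generated read are assumptions made up for illustration. It counts how many kmers of a read are left untouched by a single base error, for the two kmer sizes mentioned in the notes (21 and 31).

```python
import random

def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order of start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def surviving_kmers(read, k, error_pos):
    """Count k-mers of read that are unchanged after one substitution at error_pos."""
    # Introduce a single substitution error (hypothetical error model).
    wrong_base = "A" if read[error_pos] != "A" else "C"
    mutated = read[:error_pos] + wrong_base + read[error_pos + 1:]
    # Compare k-mers position by position; a k-mer survives only if it
    # does not cover the erroneous base.
    return sum(1 for a, b in zip(kmers(read, k), kmers(mutated, k)) if a == b)

random.seed(0)
read35 = "".join(random.choice("ACGT") for _ in range(35))    # simulated 35bp read
read100 = "".join(random.choice("ACGT") for _ in range(100))  # simulated 100bp read

for k in (21, 31):
    print(k, surviving_kmers(read35, k, 5), surviving_kmers(read100, k, 50))
```

With a 35bp read, an error near the start leaves no intact 31-mers at all, while some 21-mers survive; a 100bp read keeps intact kmers at either size. This is one plausible reason a lower kmer value helped on the short simulated reads.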

Evaluating assembled genomes

  • contig-lengths.rdb within assembly directories
    • bottom of bin \t number of bin \t
  • take contigs and use various search tools, megablast, blastn, blat-strict-match, differences, differences.short
    • megablast fast but numb (not sensitive)
    • blastn sensitive
    • blat very sensitive. produces psl output.
      • psl output can get very long
      • labeled table at the top
      • Filter program blat-strict-match selects the best matches.
    • Differences
      • shows differences between assemblies on the pieces that can map
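
The max and n50 figures quoted throughout these notes can be recomputed from a list of contig lengths. The sketch below assumes the common definition of N50 (the contig length at which the running total of lengths, taken from longest to shortest, first reaches half of all assembled bases); velvet's own reported value may be computed differently. The function names and the bin width are hypothetical, and the histogram only mimics the "bottom of bin, count" idea of contig-lengths.rdb rather than reproducing that file's exact format.

```python
from collections import Counter

def n50(lengths):
    """N50 under the common definition: walk contigs from longest to shortest
    and return the length at which half of the total bases is reached."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input

def length_histogram(lengths, bin_width=100):
    """Return sorted (bottom_of_bin, count) pairs; bin_width is an assumed value."""
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    return sorted(counts.items())

# Made-up contig lengths, for illustration only.
contigs = [10, 8, 6, 4, 2]
print("max:", max(contigs), "n50:", n50(contigs))
for bottom, count in length_histogram([27, 50, 120, 713]):
    print(f"{bottom}\t{count}")
```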
lecture_notes/04-26-2010.txt · Last modified: 2010/04/28 20:56 by galt