Differences

This shows you the differences between two versions of the page.

archive:bioinformatic_tools:jellyfish [2011/05/16 20:37]
karplus Added numbers from Illumina run 1 counts
archive:bioinformatic_tools:jellyfish [2015/07/28 06:23] (current)
ceisenhart ↷ Page moved from bioinformatic_tools:jellyfish to archive:bioinformatic_tools:jellyfish
Line 1: Line 1:
 ====== Jellyfish ======
 +The current version installed on campusrocks is 1.1 (official release).
 +
 Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA [[http://www.cbcb.umd.edu/software/jellyfish/]][(cite:jellyfish>Marçais, Guillaume and Kingsford, Carl. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 first published online January 7, 2011 doi:10.1093/bioinformatics/btr011)].
  
Line 38: Line 40:
 Total distinct: 2,298,220,805 (19-mers) 2,699,479,169 (22-mers)
 These counts were done before running SeqPrep, so include adapter reads.
 +
 +After running SeqPrep, using all the Illumina data produced:
 +
 +{{:bioinformatic_tools:fit-gamma-illumina-all-seqprep.png|}}
 +
 +We have 2,196,163,636 distinct 19-mers. If we use a count of 2 or less as the criterion for calling a k-mer a sequencing error, we get 1,222,498,009 distinct k-mers, which is close to our previous estimates.
 +
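The thresholding step above can be sketched in a few lines of Python. This is only an illustration, not the script used for the numbers on this page: it assumes a two-column histogram (multiplicity, then number of distinct k-mers with that multiplicity), which is the layout ''jellyfish histo'' prints, and the function names and toy data are hypothetical.

```python
def parse_histo(lines):
    """Yield (multiplicity, n_distinct) pairs from two-column histogram lines."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield int(fields[0]), int(fields[1])


def count_distinct(histogram, error_max=2):
    """Return (all distinct k-mers, distinct k-mers seen more than error_max times).

    K-mers with multiplicity <= error_max are treated as sequencing errors.
    """
    total = 0
    kept = 0
    for multiplicity, n_distinct in histogram:
        total += n_distinct
        if multiplicity > error_max:
            kept += n_distinct
    return total, kept


# Toy histogram (made-up numbers, for illustration only).
toy = ["1 50", "2 30", "3 15", "10 5"]
total, kept = count_distinct(parse_histo(toy), error_max=2)
print(total, kept)
```

Raising or lowering ''error_max'' shows how sensitive the distinct-k-mer count is to the choice of error threshold.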
 +The fit-gamma-illumina-all-seqprep.gnuplot script gives an estimated coverage of 10.247. If we divide the total number of k-mers (23,731,306,715) by this estimated coverage, we get a genome length of 2.3159 Gbases.
 +
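The genome-length arithmetic above is easy to reproduce. The sketch below only redoes the division with the two numbers quoted on this page; the function name is hypothetical, and the coverage value is taken as given from the gnuplot fit rather than recomputed.

```python
def estimate_genome_length(total_kmers, coverage):
    """Estimate genome length in bases as total k-mer count / k-mer coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_kmers / coverage


TOTAL_KMERS = 23_731_306_715  # total number of k-mers, from the text above
COVERAGE = 10.247             # estimated coverage from the gamma fit

length = estimate_genome_length(TOTAL_KMERS, COVERAGE)
print(f"{length / 1e9:.4f} Gbases")  # prints "2.3159 Gbases"
```

Because the estimate is inversely proportional to the fitted coverage, any error in the fit carries straight through to the genome length.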
  
 ====== Gamma distribution is wrong ======
archive/bioinformatic_tools/jellyfish.1305578233.txt.gz · Last modified: 2011/05/16 20:37 by karplus