Differences

This shows you the differences between two versions of the page.

archive:bioinformatic_tools:jellyfish [2011/04/24 20:44]
eyliaw
archive:bioinformatic_tools:jellyfish [2011/05/27 20:51]
eyliaw
Line 1: Line 1:
 ====== Jellyfish ======
 +The current version installed on campusrocks is 1.1 (official release).
 +
 Jellyfish is a tool for fast, memory efficient counting of K-mers in DNA [[http://www.cbcb.umd.edu/software/jellyfish/]][(cite:jellyfish>Marçais, Guillaume and Kingsford, Carl. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 first published online January 7, 2011 doi:10.1093/bioinformatics/btr011)]
  
Line 30: Line 32:
  
 {{:bioinformatic_tools:slug-fit-gamma-illumina1.png|}}
 +
 +For run1, the numbers of distinct k-mers at the first few multiplicities (the list item number is the multiplicity) are:
 +  - 970,576,481 (19-mers), 1,242,303,036 (22-mers)
 +  - 95,088,167 (19-mers), 100,246,200 (22-mers)
 +  - 55,353,345 (19-mers), 67,039,962 (22-mers)
 +  - 60,129,122 (19-mers), 77,381,432 (22-mers)
 +Total distinct: 2,298,220,805 (19-mers), 2,699,479,169 (22-mers).
 +These counts were made before running SeqPrep, so they include adapter reads.
 +
 +After running SeqPrep, using all the Illumina data produced:
 +
 +{{:bioinformatic_tools:fit-gamma-illumina-all-seqprep.png|}}
 +
 +We have 2,196,163,636 distinct 19-mers. If we treat every k-mer that occurs 2 or fewer times as a sequencing error, we are left with 1,222,498,009 distinct k-mers, which is close to our previous estimates.
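
The 2-or-less criterion can be checked against the run1 19-mer counts listed earlier. The sketch below is only an illustration: it assumes the 1,222,498,009 figure is what remains after discarding the presumed errors, and the function and variable names are hypothetical rather than part of Jellyfish or of the scripts used here.

```python
# Distinct 19-mer counts for run1, copied from the list on this page.
# Keys are multiplicities (how many times a k-mer was seen).
run1_low_multiplicity = {1: 970_576_481, 2: 95_088_167}
run1_total_distinct = 2_298_220_805


def distinct_above_cutoff(total_distinct, low_counts, cutoff=2):
    """Return the distinct k-mers left after dropping those seen `cutoff` times or fewer.

    `low_counts` maps multiplicity -> number of distinct k-mers and must
    contain every multiplicity from 1 through `cutoff`.
    """
    missing = [m for m in range(1, cutoff + 1) if m not in low_counts]
    if missing:
        raise ValueError(f"no count given for multiplicities {missing}")
    presumed_errors = sum(low_counts[m] for m in range(1, cutoff + 1))
    return total_distinct - presumed_errors


remaining = distinct_above_cutoff(run1_total_distinct, run1_low_multiplicity)
print(f"{remaining:,}")  # prints 1,232,556,157
```

For run1 this leaves about 1.23 billion distinct 19-mers, which is the same order as the figure obtained from all the data.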
 +
 +The fit-gamma-illumina-all-seqprep.gnuplot script gives an estimated coverage of 10.247. Dividing the total number of k-mers (23,731,306,715) by this approximate coverage gives a genome length of 2.3159 Gbases.
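
The genome-length arithmetic can be reproduced in a few lines. This is a minimal sketch using only the two numbers quoted above; the function name is hypothetical, and the coverage value is taken as given from the gnuplot fit rather than recomputed.

```python
# Values quoted on this page.
total_kmers = 23_731_306_715  # total number of k-mers (presumably counted with multiplicity)
coverage = 10.247             # estimated coverage from the gamma fit


def estimate_genome_length(total_kmers, coverage):
    """Estimate genome length in bases as total k-mers divided by coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_kmers / coverage


genome_length = estimate_genome_length(total_kmers, coverage)
print(f"{genome_length / 1e9:.4f} Gbases")  # prints 2.3159 Gbases
```

The estimate is only as good as the coverage value, so any error in the fit carries directly into the genome length.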
 +
  
 ====== Gamma distribution is wrong ======
archive/bioinformatic_tools/jellyfish.txt · Last modified: 2015/07/28 06:23 by ceisenhart