archive:bioinformatic_tools:jellyfish

archive:bioinformatic_tools:jellyfish [2011/04/24 19:22]
eyliaw
archive:bioinformatic_tools:jellyfish [2015/07/28 06:23] (current)
ceisenhart ↷ Page moved from bioinformatic_tools:jellyfish to archive:bioinformatic_tools:jellyfish
Line 1: Line 1:
 +====== Jellyfish ======
 +The current version installed on campusrocks is 1.1 (official release).
 +
 Jellyfish is a tool for fast, memory efficient counting of K-mers in DNA [[http://www.cbcb.umd.edu/software/jellyfish/]][(cite:jellyfish>Marçais, Guillaume and Kingsford, Carl. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 first published online January 7, 2011 doi:10.1093/bioinformatics/btr011)]
  
Line 29: Line 32:
  
 {{:bioinformatic_tools:slug-fit-gamma-illumina1.png|}}
 +
 +For run1, the numbers of distinct k-mers at the first few multiplicities are (list number = multiplicity):
 +  - 970,576,481 (19-mers), 1,242,303,036 (22-mers)
 +  - 95,088,167 (19-mers), 100,246,200 (22-mers)
 +  - 55,353,345 (19-mers), 67,039,962 (22-mers)
 +  - 60,129,122 (19-mers), 77,381,432 (22-mers)
 +Total distinct: 2,298,220,805 (19-mers), 2,699,479,169 (22-mers).
 +These counts were made before running SeqPrep, so they include adapter reads.
 +
 +After running SeqPrep, using all the Illumina data produced:
 +
 +{{:bioinformatic_tools:fit-gamma-illumina-all-seqprep.png|}}
 +
 +We have 2,196,163,636 distinct 19-mers. If we treat any k-mer seen 2 or fewer times as a sequencing error, 1,222,498,009 distinct k-mers remain, close to our previous estimates.
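
The 2-or-less criterion amounts to subtracting the low-multiplicity bins from the total number of distinct k-mers. Below is a minimal Python sketch of that subtraction, applied to the run1 19-mer figures listed earlier (the post-SeqPrep histogram is not reproduced on this page). The function name and constants are hypothetical, and the sketch assumes the list position of each run1 count is its multiplicity.

```python
def distinct_above_threshold(total_distinct, low_counts, threshold=2):
    """Return the number of distinct k-mers seen more than `threshold` times.

    low_counts[i] is the number of distinct k-mers with multiplicity i + 1
    (assumption: the run1 list is ordered by multiplicity, starting at 1).
    """
    if threshold > len(low_counts):
        raise ValueError("not enough histogram bins for this threshold")
    # Everything at or below the threshold is treated as sequencing error.
    discarded = sum(low_counts[:threshold])
    return total_distinct - discarded


# run1 19-mer figures, pre-SeqPrep, copied from the list above.
RUN1_19MER_TOTAL_DISTINCT = 2_298_220_805
RUN1_19MER_LOW_COUNTS = [970_576_481, 95_088_167, 55_353_345, 60_129_122]

kept = distinct_above_threshold(RUN1_19MER_TOTAL_DISTINCT, RUN1_19MER_LOW_COUNTS)
print(f"{kept:,} distinct 19-mers seen more than twice")
```

For the run1 19-mers this leaves 1,232,556,157 distinct k-mers, the same order as the post-SeqPrep figure.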
 +
 +The fit-gamma-illumina-all-seqprep.gnuplot script gives an estimated coverage of 10.247. If we divide the total number of k-mers (23,731,306,715) by this approximate coverage, we get an estimated genome length of about 2.3159 Gbases.
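
The genome-length arithmetic can be checked with a short Python sketch. The function and constant names are hypothetical; the two input values are the ones quoted above, and the coverage is taken as given from the gnuplot fit rather than recomputed here.

```python
def estimate_genome_length(total_kmers, coverage):
    """Estimate genome length in bases as total k-mer count / k-mer coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_kmers / coverage


TOTAL_KMERS = 23_731_306_715   # total 19-mers after SeqPrep (from the text)
ESTIMATED_COVERAGE = 10.247    # from the gamma fit (from the text)

genome_length = estimate_genome_length(TOTAL_KMERS, ESTIMATED_COVERAGE)
print(f"{genome_length / 1e9:.4f} Gbases")
```

Because the estimate scales inversely with coverage, a small change in the fitted coverage shifts the genome length by the same relative amount.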
 +
  
 ====== Gamma distribution is wrong ======
archive/bioinformatic_tools/jellyfish.1303672946.txt.gz · Last modified: 2011/04/24 19:22 by eyliaw