Differences

This shows you the differences between two versions of the page.

--- archive:bioinformatic_tools:jellyfish [2011/04/24 19:22]
eyliaw
+++ archive:bioinformatic_tools:jellyfish [2015/07/28 06:23] (current)
ceisenhart ↷ Page moved from bioinformatic_tools:jellyfish to archive:bioinformatic_tools:jellyfish
@@ Line 1: / Line 1: @@
+====== Jellyfish ======
+The current version installed on campusrocks is 1.1 (official release).
 Jellyfish is a tool for fast, memory efficient counting of K-mers in DNA [[http://www.cbcb.umd.edu/software/jellyfish/]][(cite:jellyfish>Marçais, Guillaume and Kingsford, Carl. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 first published online January 7, 2011 doi:10.1093/bioinformatics/btr011)]
-The Jellyfish "stats" option allow for a bounded dump of the kmer table, using -L for the lower bound and -U for the upper bound.  Using this, we can examine high frequency kmers for abnormalities.
+The Jellyfish "stats" option allows for a bounded dump of the kmer table, using -L for the lower bound and -U for the upper bound.  Using this, we can examine high frequency kmers for abnormalities.
 The documentation is at
@@ Line 29: / Line 32: @@
 {{:bioinformatic_tools:slug-fit-gamma-illumina1.png|}}
+For run1, the numbers of distinct k-mers at the first few multiplicities (listed in order of increasing multiplicity) are
+  - 970,576,481 (19-mers)  1,242,303,036 (22-mers)
+  - 95,088,167 (19-mers)  100,246,200 (22-mers)
+  - 55,353,345 (19-mers)  67,039,962 (22-mers)
+  - 60,129,122 (19-mers)  77,381,432 (22-mers)
+Total distinct: 2,298,220,805 (19-mers) 2,699,479,169 (22-mers)
+These counts were done before running SeqPrep, so they include adapter reads.
+After running SeqPrep, using all the Illumina data produced
+{{:bioinformatic_tools:fit-gamma-illumina-all-seqprep.png|}}
+We have 2,196,163,636 distinct 19-mers. If we use 2-or-less as the criterion for calling a k-mer a sequencing error, we get 1,222,498,009 distinct k-mers, which is close to our previous estimates.
+The fit-gamma-illumina-all-seqprep.gnuplot script gives an estimated coverage of 10.247.  If we divide the total number of k-mers (23,731,306,715) by the approximate coverage, we get a genome length of 2.3159 Gbases.
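The arithmetic behind the genome-length estimate can be checked with a short Python sketch. It is only an illustration, not part of the gnuplot script: the constant and function names are made up here, and it assumes that "total number of k-mers" means total 19-mer occurrences and that 1,222,498,009 is the number of distinct 19-mers left after dropping those with multiplicity 2 or less.

```python
# Figures quoted in the text (after SeqPrep, all Illumina data).
# The names below are illustrative, not taken from any script on the wiki.
TOTAL_KMERS = 23_731_306_715     # total 19-mer occurrences (assumed meaning)
COVERAGE = 10.247                # coverage estimated by the gamma fit
DISTINCT_KMERS = 2_196_163_636   # all distinct 19-mers
DISTINCT_KEPT = 1_222_498_009    # distinct 19-mers with multiplicity > 2 (assumed meaning)


def estimate_genome_size(total_kmers: int, coverage: float) -> float:
    """Return the estimated genome length in bases: total k-mers / coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_kmers / coverage


genome_size = estimate_genome_size(TOTAL_KMERS, COVERAGE)

# Distinct k-mers discarded as probable sequencing errors under the
# "multiplicity 2 or less" criterion.
likely_errors = DISTINCT_KMERS - DISTINCT_KEPT

print(f"estimated genome size: {genome_size / 1e9:.4f} Gbases")
print(f"distinct k-mers treated as errors: {likely_errors:,}")
```

The estimate is sensitive to the fitted coverage, so a different fit changes the genome length proportionally.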
 ====== Gamma distribution is wrong ======

Banana Slug Genomics
