archive:bioinformatic_tools:jellyfish [2011/05/16 20:37] karplus Added numbers from Illumina run 1 counts
archive:bioinformatic_tools:jellyfish [2011/05/26 01:58] karplus [Banana Slug] Added post-SeqPrep estimates
Total distinct: 2,298,220,805 (19-mers), 2,699,479,169 (22-mers)
These counts were made before running SeqPrep, so they include adapter reads.

After running SeqPrep on all the Illumina data produced, we get the following fit:

{{:bioinformatic_tools:fit-gamma-illumina-all-seqprep.png|}}

We have 2,196,163,636 distinct 19-mers. If we treat any k-mer that occurs 2 or fewer times as a sequencing error, 1,222,498,009 distinct k-mers remain, which is close to our previous estimates.
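The cutoff above is applied to a k-mer histogram: for each count, how many distinct k-mers occur that many times. The following is a minimal sketch of that bookkeeping, assuming a two-column, whitespace-separated histogram like the output of ''jellyfish histo''. The function names (''parse_histo'', ''distinct_kmers'', ''total_kmers'') are hypothetical helpers for illustration, not part of jellyfish, and the toy histogram is made up rather than taken from the real data.

```python
from typing import Dict, Iterable


def parse_histo(lines: Iterable[str]) -> Dict[int, int]:
    """Parse 'count n_distinct' lines into {count: n_distinct}.

    Assumes a two-column, whitespace-separated histogram (as produced
    by `jellyfish histo`); blank or short lines are skipped.
    """
    hist: Dict[int, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        count, n_distinct = int(fields[0]), int(fields[1])
        hist[count] = hist.get(count, 0) + n_distinct
    return hist


def distinct_kmers(hist: Dict[int, int], error_cutoff: int = 0) -> int:
    """Distinct k-mers seen more than `error_cutoff` times.

    error_cutoff=0 gives the total distinct count; error_cutoff=2
    drops k-mers seen 1 or 2 times as presumed sequencing errors.
    """
    return sum(n for count, n in hist.items() if count > error_cutoff)


def total_kmers(hist: Dict[int, int]) -> int:
    """Total k-mer occurrences: each distinct k-mer weighted by its count."""
    return sum(count * n for count, n in hist.items())


if __name__ == "__main__":
    # Toy histogram for illustration only, not real sequencing data.
    toy = parse_histo(["1 50", "2 20", "3 5", "10 8", "11 7"])
    print(distinct_kmers(toy))                  # 90
    print(distinct_kmers(toy, error_cutoff=2))  # 20
    print(total_kmers(toy))                     # 262
```

With ''error_cutoff=2'' the sketch counts only k-mers seen 3 or more times, which is the "2-or-less is a sequencing error" criterion used above.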

The fit-gamma-illumina-all-seqprep.gnuplot script gives an estimated coverage of 10.247. Dividing the total number of k-mers (23,731,306,715) by this coverage gives a genome length of about 2.3159 Gbases.
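The genome-length estimate is a single division: total k-mer occurrences over the fitted coverage. A minimal sketch that reproduces the arithmetic with the numbers from the paragraph above; ''estimate_genome_length'' is a hypothetical helper, and the coverage value is assumed to be the one reported by the gnuplot fit, not recomputed here.

```python
def estimate_genome_length(total_kmers: int, kmer_coverage: float) -> float:
    """Genome length ~= total k-mer occurrences / mean coverage per k-mer."""
    if kmer_coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_kmers / kmer_coverage


if __name__ == "__main__":
    # Total k-mers and fitted coverage as quoted in the text.
    length = estimate_genome_length(23_731_306_715, 10.247)
    print(f"{length / 1e9:.4f} Gbases")  # 2.3159 Gbases
```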

====== Gamma distribution is wrong ======