Banana Slug Genomics

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA (http://www.cbcb.umd.edu/software/jellyfish/) [Marçais, Guillaume and Kingsford, Carl. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770, first published online January 7, 2011, doi:10.1093/bioinformatics/btr011].
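To make "counting k-mers" concrete, here is a minimal Python sketch using a plain `collections.Counter`. It is only an illustration of the idea: Jellyfish itself uses a lock-free, multithreaded hash table, and the helper name `count_kmers` is hypothetical. Skipping windows that contain an N is a choice made for this sketch, not a claim about Jellyfish's behaviour.

```python
from collections import Counter

_ACGT = frozenset("ACGT")


def count_kmers(seq: str, k: int) -> Counter:
    """Count every length-k substring of seq on the forward strand.

    Windows containing anything other than A, C, G or T (e.g. an N)
    are skipped; this is a choice made for the sketch.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if _ACGT.issuperset(kmer):
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    print(count_kmers("ACGTACGTNACG", 3))
    # Counter({'ACG': 3, 'CGT': 2, 'GTA': 1, 'TAC': 1})
```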

Discussion

Louis Letourneau, 2011/05/10 19:39

I found an answer; it works nicely with gnuplot.

Banana slug, 2011/05/19 09:35

Hey Louis,

Can you please share the gnuplot solution here with the rest of us as well? :-)

Thank you.

Louis Letourneau, 2011/05/19 15:22

Sorry, I can't get the comment to format the way I want.

Basically, I ran jellyfish count with k-mer sizes from 11 to 31, then ran jellyfish histo on each of the results.
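As a way to see what each `histo` file holds (for every occurrence count, the number of distinct k-mers seen that many times), the per-k loop can be imitated in pure Python. This is a minimal sketch assuming the reads fit in memory; `kmer_spectrum` and the toy reads are hypothetical, and this is not how Jellyfish computes the histogram internally.

```python
from collections import Counter

_ACGT = frozenset("ACGT")


def kmer_spectrum(reads, k):
    """Return {count: number of distinct k-mers seen exactly `count` times}."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if _ACGT.issuperset(kmer):
                counts[kmer] += 1
    # Histogram of the multiplicities: column 1 = count, column 2 = how many k-mers.
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "CGTACGTACG", "ACGTTCGTAC"]  # toy data, hypothetical
    for k in (3, 5):  # the thread used k = 11..31 on real reads
        for count, n_kmers in kmer_spectrum(reads, k).items():
            print(k, count, n_kmers)
```

On real data the low-count end of this spectrum is dominated by k-mers containing sequencing errors, which is what the plots in this thread are meant to expose.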

Finally, I wrote a gnuplot script to generate a PNG image:

gnuplot < scriptFile.gnu > image.png

scriptFile.gnu:

# y values are log10(frequency), so label the ticks as powers of ten.
set terminal png nocrop size 1280,1025
set format y '10^(%.0f)'
# One smoothed curve per k-mer size; every continued line must end in "\" with no blank lines between.
plot \
  "kmer_histo_11.txt" using 1:($2 < 1 ? 1 : log10($2)) title "bezier_11" smooth bezier, \
  "kmer_histo_15.txt" using 1:($2 < 1 ? 1 : log10($2)) title "bezier_15" smooth bezier, \
  "kmer_histo_17.txt" using 1:($2 < 1 ? 1 : log10($2)) title "bezier_17" smooth bezier, \
  "kmer_histo_19.txt" using 1:($2 < 1 ? 1 : log10($2)) title "bezier_19" smooth bezier, \
  "kmer_histo_21.txt" using 1:($2 < 1 ? 1 : log10($2)) title "bezier_21" smooth bezier, \
  "kmer_histo_25.txt" using 1:($2 < 1 ? 1 : log10($2)) title "bezier_25" smooth bezier, \
  "kmer_histo_27.txt" using 1:($2 < 1 ? 1 : log10($2)) title "bezier_27" smooth bezier, \
  "kmer_histo_31.txt" using 1:($2 < 1 ? 1 : log10($2)) title "bezier_31" smooth bezier
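Since each `plot` clause differs only in k, the gnuplot script can be generated instead of typed by hand. A small Python sketch, assuming the `kmer_histo_<k>.txt` file naming used in the script; the function name `gnuplot_script` and the file name `make_gnuplot.py` are hypothetical.

```python
def gnuplot_script(ks, width=1280, height=1025):
    """Build a gnuplot script plotting log10(frequency) vs count for each k."""
    clauses = [
        f'"kmer_histo_{k}.txt" using 1:($2 < 1 ? 1 : log10($2)) '
        f'title "bezier_{k}" smooth bezier'
        for k in ks
    ]
    lines = [
        f"set terminal png nocrop size {width},{height}",
        "set format y '10^(%.0f)'",
        # Join clauses with ", \" + newline so plot stays one logical command.
        "plot \\\n  " + ", \\\n  ".join(clauses),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(gnuplot_script([11, 15, 17, 19, 21, 25, 27, 31]), end="")
```

Usage would then be something like `python make_gnuplot.py > scriptFile.gnu && gnuplot < scriptFile.gnu > image.png`.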

Louis Letourneau, 2011/05/19 15:23

I also see the same "effect" as on this page: any k-mer size below 15 is pretty much useless. Well, not useless, but it doesn't bring out the information about what is erroneous in the reads.
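One back-of-the-envelope way to see why short k-mers say little about errors: there are only 4^k distinct k-mers, so under a random-sequence model (uniform independent bases, one strand, no repeats) a given k-mer is expected to occur about G/4^k times in a genome of length G. When 4^k is well below G, nearly every k-mer, including those created by sequencing errors, also occurs genuinely, so error k-mers cannot be separated by count. The genome size below is a hypothetical round number, not a banana slug estimate, and the helper names are hypothetical too.

```python
def expected_occurrences(genome_size: float, k: int) -> float:
    """Expected chance occurrences of one specific k-mer in a random genome.

    Random-sequence model (uniform, independent bases), one strand only.
    """
    return genome_size / 4 ** k


def min_informative_k(genome_size: float) -> int:
    """Smallest k with 4**k > genome_size (exact integer search, no log rounding)."""
    k = 1
    while 4 ** k <= genome_size:
        k += 1
    return k


if __name__ == "__main__":
    G = 1e9  # hypothetical genome size, for illustration only
    for k in (11, 13, 15, 17, 19):
        print(f"k={k:2d}  4^k={4 ** k:>15,d}  expected hits={expected_occurrences(G, k):10.3f}")
    print("smallest k with 4^k > G:", min_informative_k(G))
```

For a 1 Gb genome under this model the crossover falls at k = 15, since 4^15 is about 1.07e9; a real genome's repeats and base composition will shift this.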

Louis Letourneau, 2011/05/04 03:18

May I ask what exactly you used to plot the graphs? Did you plot the output of jellyfish histo directly? Thanks.

Edward Liaw, 2011/04/09 07:44

I added two Python scripts to BME235/bin, qseq2fasta and qseq2fastq, to parse the Illumina files. The quality score conversion could be done faster with a NumPy array instead of calling ord()/chr() on each character individually, but I was having trouble installing NumPy.

Be sure to use the -f option when converting reads for Jellyfish; -f filters out reads that did not pass Illumina's filter.
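A sketch of what conversion with filtering involves: a qseq line has 11 tab-separated fields (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag). In it, `.` marks an uncalled base and the last field is 1 if the read passed Illumina's filter. The read-name layout and the function name `qseq_to_fastq` are hypothetical and may not match what the BME235 scripts emit. The quality shift assumes Phred+64 input.

```python
def qseq_to_fastq(line: str, filter_reads: bool = True):
    """Convert one qseq line to a 4-line FASTQ record, or None if filtered out."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 qseq fields, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read_no, seq, qual, passed = fields
    if filter_reads and passed != "1":
        return None  # read did not pass Illumina's filter
    # Hypothetical read-name layout; adjust to whatever downstream tools expect.
    name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    seq = seq.replace(".", "N")  # '.' means no base call
    qual = "".join(chr(ord(c) - 31) for c in qual)  # assumed Phred+64 -> Phred+33
    return f"@{name}\n{seq}\n+\n{qual}\n"
```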

John St. John, 2011/04/08 20:22

Jellyfish has a new version that accepts FASTQ input. I posted it in my public Dropbox folder since the developer hasn't posted it to his website yet.

http://dl.dropbox.com/u/3907635/jellyfish-1.1.tar.gz

For example, to use it with a bunch of gzipped FASTQ files:

zcat folders_*/s_*_*.fastq.gz | jellyfish count -m 19 -C -o kmer_hash -s 1000000000 -t 32 /dev/fd/0
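In that command, `-C` asks Jellyfish to count canonical k-mers, so a k-mer and its reverse complement are counted together (reads come from both strands). Below is a minimal Python sketch of the idea. It uses the lexicographically smaller of the k-mer and its reverse complement as the representative. That is the usual convention, assumed here rather than checked against Jellyfish's internals. Function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = frozenset("ACGT")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_canonical(seq: str, k: int) -> Counter:
    """Count k-mers of seq, merging each k-mer with its reverse complement."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if _ACGT.issuperset(kmer):
            counts[canonical(kmer)] += 1
    return counts
```

A useful property of canonical counting is that a read and its reverse complement give identical counts.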

The last argument, /dev/fd/0, tells Jellyfish to read from stdin. This is nice because you can run a program that converts qseq to FASTQ and writes to stdout, then pipe that straight into Jellyfish without having to make temporary files, if you don't want to keep FASTQ files around in the meantime.
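The same no-temporary-file approach can also be driven from Python: start Jellyfish with its stdin connected to a pipe and write FASTQ records into it as they are produced. This sketch uses only the flags from the command above. `stream_into` is a hypothetical name. The command is a parameter, so any program that reads stdin can be substituted, e.g. for testing without Jellyfish installed.

```python
import subprocess

# Flags copied from the zcat | jellyfish example above.
JELLYFISH_CMD = ("jellyfish", "count", "-m", "19", "-C", "-o", "kmer_hash",
                 "-s", "1000000000", "-t", "32", "/dev/fd/0")


def stream_into(records, cmd=JELLYFISH_CMD) -> int:
    """Write text records to cmd's stdin one at a time; return its exit code.

    Nothing is written to disk: each record goes straight into the pipe,
    like `converter | jellyfish count ... /dev/fd/0` in the shell.
    """
    with subprocess.Popen(cmd, stdin=subprocess.PIPE, text=True) as proc:
        try:
            for rec in records:
                proc.stdin.write(rec)
        finally:
            proc.stdin.close()  # EOF tells the reader the input is finished
        return proc.wait()
```

Since `records` can be any iterable, a generator that converts qseq lines on the fly keeps memory use flat regardless of input size.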
