archive:bioinformatic_tools:quake · current revision 2015/07/28 06:26 by ceisenhart (page moved from bioinformatic_tools:quake to archive:bioinformatic_tools:quake); previous revision 2011/05/09 18:03 by eyliaw (Running Quake)

In the file list, put the two files of each paired-end read set on one line, separated by a tab. Also make sure that every '.' in the sequences is written as an 'N'. If the quality scores are encoded as Phred+64, use -q 64 to handle them.
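A minimal Python sketch of the three input checks above. The function names are hypothetical helpers, not part of Quake. It assumes plain four-line FASTQ records, and the encoding guess is only a heuristic (it reports Phred+64 when no quality character below '@' appears in the sample), so confirm the encoding before passing -q 64.

```python
def format_file_list(samples):
    """Return file-list text: one line per read set, with the two files of a
    paired-end set on the same line separated by a tab."""
    return "".join("\t".join(files) + "\n" for files in samples)


def dots_to_n(fastq_lines):
    """Yield FASTQ lines with every '.' in the sequence lines replaced by 'N'.

    Assumes plain four-line records (header, sequence, '+', qualities),
    so the sequence is every fourth line starting at index 1.
    """
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            line = line.replace(".", "N")
        yield line


def guess_phred_offset(quality_strings):
    """Heuristically guess the quality offset (33 or 64); None if unsure."""
    lowest = min(
        (ord(c) for q in quality_strings for c in q.rstrip("\r\n")),
        default=None,
    )
    if lowest is None:
        return None
    if lowest < 59:   # ';' (59) is the lowest symbol used by +64 encodings
        return 33
    if lowest >= 64:  # nothing below '@' (64): consistent with Phred+64
        return 64
    return None       # ambiguous range; inspect the data by hand
```

For example, format_file_list([("lib_1.fastq", "lib_2.fastq")]) returns "lib_1.fastq\tlib_2.fastq\n", which can be written straight to the file list.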
  
K-mer counts can also be pre-filtered to save space.
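A minimal sketch of such a pre-filter, assuming each line of the counts file is a k-mer followed by whitespace and its count (possibly fractional, for quality-weighted counts). filter_counts is a hypothetical helper, and keeping counts exactly equal to the cutoff is an assumption about where Quake draws the line. As the Quake developer notes below, keep the unfiltered file if you still need to choose the cutoff.

```python
def filter_counts(count_lines, cutoff):
    """Yield only the k-mer count lines whose count is at least `cutoff`.

    Assumes '<kmer><whitespace><count>' per line; blank or malformed
    lines are dropped. Whether the boundary value is kept is an assumption.
    """
    for line in count_lines:
        fields = line.split()
        if len(fields) == 2 and float(fields[1]) >= cutoff:
            yield line
```

Because it works line by line on any iterable, it can stream a large counts file, e.g. out.writelines(filter_counts(open("counts.txt"), 1.0)) with a hypothetical file name.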
  
Quake dev:
Once you've decided on a cutoff, Quake ignores all k-mers with counts below it, so yes, you can filter the file to save some disk space. But having all of the k-mer counts is best for choosing the cutoff: my cov_model.py script, which chooses the cutoff automatically, requires them.
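Why the full counts matter: the cutoff sits at the low-count end of the k-mer count histogram, which is exactly the part a pre-filter removes. The Python sketch below builds that histogram and picks the first valley as a cutoff. The first-valley rule is a crude stand-in chosen for illustration and is not meant to reproduce cov_model.py; both function names are hypothetical, and the counts file is assumed to hold one k-mer and its count per line.

```python
from collections import Counter


def count_histogram(count_lines, bin_width=1.0):
    """Map bin index -> number of k-mers whose count falls in that bin.

    Bin i covers counts in [i * bin_width, (i + 1) * bin_width).
    """
    hist = Counter()
    for line in count_lines:
        fields = line.split()
        if len(fields) == 2:
            hist[int(float(fields[1]) // bin_width)] += 1
    return hist


def first_valley(hist):
    """Return the lowest bin that is lower than the bin before it and no
    higher than the bin after it, or None if there is no such bin."""
    if not hist:
        return None
    top = max(hist)  # highest occupied bin index
    for i in range(1, top):
        # Counter returns 0 for empty bins without inserting them
        if hist[i] < hist[i - 1] and hist[i] <= hist[i + 1]:
            return i
    return None
```

The returned bin index times bin_width gives a candidate cutoff. Run on a pre-filtered file, the low bins would be missing and the valley could not be found.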

[Kevin] I think we want to select the k-mer cutoff manually rather than relying on Quake. Its default cutoff is very conservative, and we'll probably do better over-correcting than under-correcting.
If we look at the hugely over-represented k-mers (like the adapter sequences) and compare the true sequences to ones that differ by one base, the true ones are about 30 times as frequent. Thus Quake's idea of correcting only rarely seen k-mers isn't quite right: what we really want to correct are k-mers that are close neighbors of much more frequent k-mers. I haven't yet worked out precisely what "much more frequent" should mean.
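One way to make Kevin's idea concrete, sketched in Python: flag a k-mer when some k-mer one substitution away is at least ratio times as frequent. The threshold is the open question above (the adapter example suggests true k-mers are about 30 times as frequent as their one-base variants), so it is left as a parameter. The helper names are hypothetical, and the sketch ignores reverse complements and neighbors more than one substitution away.

```python
def hamming1_neighbors(kmer, alphabet="ACGT"):
    """Yield every k-mer that differs from `kmer` at exactly one position."""
    for i, base in enumerate(kmer):
        for b in alphabet:
            if b != base:
                yield kmer[:i] + b + kmer[i + 1:]


def likely_errors(counts, ratio):
    """Return {kmer: dominant_neighbor} for k-mers that have a one-base
    neighbor at least `ratio` times as frequent. `ratio` is a free parameter."""
    flagged = {}
    for kmer, count in counts.items():
        # most frequent one-substitution neighbor (count 0 if none observed)
        best = max(hamming1_neighbors(kmer), key=lambda n: counts.get(n, 0))
        best_count = counts.get(best, 0)
        if best_count > 0 and best_count >= ratio * count:
            flagged[kmer] = best
    return flagged
```

With counts {"ACGT": 300, "ACGA": 10}, a ratio of 20 flags ACGA as a likely error of ACGT, while a ratio of 40 flags nothing; choosing that number is the part still unresolved.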

===== Potential Problems =====
  * Input files need a file extension; otherwise Quake throws a substr error when it tries to merge its hidden files into the result.
  * With paired-end input, Quake outputs two files for each paired-end read file. One is the cor.fastq file, which contains corrected reads that are still paired. The other is the cor_single.fastq file, which contains reads from pairs where only one read could be corrected. You can treat the cor_single.fastq file as a single-end read file.
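A small pre-flight check for the two problems above, in Python. The output names are an assumption based on the cor.fastq and cor_single.fastq files described here: the sketch strips the input's extension and appends .cor.fastq or .cor_single.fastq. corrected_file_list is a hypothetical helper that lists the corrected files in the same file-list layout, with each cor_single.fastq file on its own line as single-end input.

```python
import os


def quake_outputs(path, paired=False):
    """Return the output file names assumed for one input read file.

    Raises ValueError when the name has no extension, the case that
    triggers Quake's substr error.
    """
    root, ext = os.path.splitext(path)
    if not ext:
        raise ValueError(f"{path!r} has no file extension; rename it first")
    outputs = {"corrected": root + ".cor.fastq"}
    if paired:
        # corrected reads whose mate could not be corrected
        outputs["single"] = root + ".cor_single.fastq"
    return outputs


def corrected_file_list(read1, read2):
    """File-list text for one corrected paired-end set: the two cor.fastq
    files on one tab-separated line, then each cor_single.fastq file on its
    own line as single-end input."""
    out1 = quake_outputs(read1, paired=True)
    out2 = quake_outputs(read2, paired=True)
    return (
        out1["corrected"] + "\t" + out2["corrected"] + "\n"
        + out1["single"] + "\n"
        + out2["single"] + "\n"
    )
```

Calling quake_outputs on every input before starting a run catches extension-less names early instead of after the counting step.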
  
===== Methods =====