
archive:bioinformatic_tools:quake

Differences

This shows you the differences between two versions of the page.

archive:bioinformatic_tools:quake [2011/05/07 11:53]
eyliaw
archive:bioinformatic_tools:quake [2015/07/28 06:26] (current)
ceisenhart ↷ Page moved from bioinformatic_tools:quake to archive:bioinformatic_tools:quake
Line 15:
 Finally, correct the reads:
  
-   correct -f [fastq list file] -k [k-mer size] -c [cutoff] -m [counts file] -p [number of cores]
+   correct -f [fastq file list] -k [k-mer size] -c [cutoff] -m [counts file] -p [number of cores] -z (gzips the output)
  
-[Kevin] I think that we want to select the k-mer size manually, rather than relying on quake.  Their default cutoff is very conservative, and we'll probably do better over-correcting than under-correcting.
-If we look at the hugely over-represented k-mers (like the adapter sequences), and compare the true sequences to ones that are one base different, we see that the true ones are about 30 times as frequent.  Thus quake's idea of correcting only the rarely seen k-mers isn't quite right.  What we really want to correct are those k-mers that are close neighbors of much more frequent k-mers.  I've not figured out yet precisely what "much more frequent" should mean.
+In the file list, you should tab-separate paired end reads.  Also, be sure that all .'s in the sequence are written as N's.  If the quality scores are written as Phred + 64, you can use -q 64 to handle it.
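The input preparation described above (tab-separated pairs in the file list, N's instead of .'s in the sequence) could be sketched roughly as follows. This is only an illustration: the helper names `normalize_fastq_lines` and `file_list_text` are made up for this page, and the sketch assumes plain four-line FASTQ records with no wrapped sequence lines.

```python
def normalize_fastq_lines(lines):
    """Replace '.' with 'N' on sequence lines only.

    Assumes strict 4-line FASTQ records (header, sequence, '+', quality),
    so the sequence is always the second line of each record.
    """
    out = []
    for i, line in enumerate(lines):
        if i % 4 == 1:  # sequence line of the record
            line = line.replace(".", "N")
        out.append(line)
    return out


def file_list_text(pairs, singles=()):
    """Build the text of a file list: one line per input.

    Paired-end files go on one line separated by a tab;
    single-end files get a line of their own.
    """
    rows = ["\t".join(pair) for pair in pairs]
    rows.extend(singles)
    return "\n".join(rows) + "\n"
```

The quality line is left untouched on purpose, since '.' is a legal quality character and must not be rewritten.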
  
 +K-mer counts can also be pre-filtered to save space.
  
-----
+Quake dev:
-Can the k-mer counts be pre-filtered to save space?
+Once you've decided on a cutoff, Quake ignores all of the k-mers below that cutoff. So sure, you can filter the file to save some disk space. But having all of the k-mer counts is best for choosing the cutoff. My cov_model.py script to automatically choose the cutoff requires them.
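A minimal sketch of such a pre-filter, assuming the counts file holds one k-mer and its count per line, separated by whitespace (that layout is an assumption here, so check it against your own counts file). The function name `filter_counts` is hypothetical.

```python
def filter_counts(lines, cutoff):
    """Yield only the count lines whose count is at or above the cutoff.

    Assumes each line looks like '<k-mer> <count>'; blank or malformed
    lines are skipped. Counts are parsed as floats so that fractional
    counts are handled as well as integers.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        if float(fields[1]) >= cutoff:
            yield line
```

Since the unfiltered counts are what the cutoff is chosen from, it makes sense to run a filter like this only after the cutoff is fixed.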
 + 
 + 
 +[Kevin] I think that we want to select the k-mer size manually, rather than relying on quake.  Their default cutoff is very conservative, and we'll probably do better over-correcting than under-correcting.
 +If we look at the hugely over-represented k-mers (like the adapter sequences), and compare the true sequences to ones that are one base different, we see that the true ones are about 30 times as frequent.  Thus quake's idea of correcting only the rarely seen k-mers isn't quite right.  What we really want to correct are those k-mers that are close neighbors of much more frequent k-mers.  I've not figured out yet precisely what "much more frequent" should mean.
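The neighbor idea above can be illustrated with a small sketch. This is not how Quake works; it only shows one way to flag k-mers that have a one-base neighbor that is much more frequent. The `ratio` default of 30 simply echoes the adapter observation above and is a placeholder, since what "much more frequent" should mean is still open. The function names and the in-memory dict of counts are assumptions made for the example.

```python
def hamming1_neighbors(kmer, alphabet="ACGT"):
    """Yield every k-mer that differs from `kmer` at exactly one position."""
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def suspect_kmers(counts, ratio=30.0):
    """Map each suspect k-mer to its most frequent one-base neighbor.

    A k-mer is a suspect when some neighbor present in `counts` is at
    least `ratio` times as frequent. `ratio` is a hypothetical threshold.
    """
    suspects = {}
    for kmer, count in counts.items():
        best = max(
            (n for n in hamming1_neighbors(kmer) if n in counts),
            key=counts.get,
            default=None,
        )
        if best is not None and counts[best] >= ratio * count:
            suspects[kmer] = best
    return suspects
```

Note that this flags k-mers by their relation to a neighbor rather than by an absolute count, so a fairly common k-mer next to a hugely over-represented one would still be flagged.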
  
-Well sort of. Once you've decided on cutoff, Quake ignores all of the k-mers below that cutoff. So sure, you can filter the file to save some disk space. But having all of the k-mer counts is best for choosing the cutoff. My cov_model.py script to automatically choose the cutoff requires them.
-----
+===== Potential Problems =====
+  * Input files need to have an extension, or Quake will throw a substr error when trying to merge hidden files into a result.
+  * With paired-end input, Quake will output two files for each paired-end read file. One will be the cor.fastq file, which contains corrected, paired reads. The other will be the cor_single.fastq file, which contains reads where only one read of the pair could be corrected. You can treat the cor_single.fastq file as a single read file.
  
 ===== Methods =====
archive/bioinformatic_tools/quake.1304769198.txt.gz · Last modified: 2011/05/07 11:53 by eyliaw