archive:bioinformatic_tools:quake

Differences

This shows you the differences between two versions of the page.

archive:bioinformatic_tools:quake [2011/05/27 21:12]
eyliaw
archive:bioinformatic_tools:quake [2015/07/28 06:26] (current)
ceisenhart ↷ Page moved from bioinformatic_tools:quake to archive:bioinformatic_tools:quake
Line 27: Line 27:
 [Kevin] I think that we want to select the k-mer size manually, rather than relying on Quake. Their default cutoff is very conservative, and we'll probably do better over-correcting than under-correcting.
 If we look at the hugely over-represented k-mers (like the adapter sequences), and compare the true sequences to ones that are one base different, we see that the true ones are about 30 times as frequent. Thus Quake's idea of correcting only the rarely seen k-mers isn't quite right. What we really want to correct are those k-mers that are close neighbors of much more frequent k-mers. I've not figured out yet precisely what "much more frequent" should mean.
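
The neighbor-based idea above can be sketched in a few lines of Python. This is only an illustration, not Quake's algorithm: the function names are hypothetical, and the `min_ratio=30` default is an assumption borrowed from the roughly 30-fold difference seen for the adapter k-mers, since what "much more frequent" should mean is still open.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (overlapping windows) in a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def neighbors(kmer, alphabet="ACGT"):
    """Yield every k-mer that differs from `kmer` at exactly one base."""
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def suspect_kmers(counts, min_ratio=30):
    """Map each k-mer to its most frequent one-base neighbor, keeping only
    k-mers whose best neighbor is at least `min_ratio` times as frequent.

    min_ratio is an assumed, illustrative threshold, not a tuned value.
    """
    suspects = {}
    for kmer, count in counts.items():
        best = max(neighbors(kmer), key=lambda n: counts.get(n, 0))
        if counts.get(best, 0) >= min_ratio * count:
            suspects[kmer] = best
    return suspects
```

Unlike a flat count cutoff, this flags a k-mer seen many times if a one-base neighbor is far more common still, which is the case for errors in highly over-represented sequences such as adapters.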
 +
 +===== Potential Problems =====
 +  * Input files need to have an extension, or Quake will throw a substr error when trying to merge hidden files into a result.
 +  * With paired-end input, Quake will output two files for each paired-end input file. One will be the cor.fastq file, which contains corrected reads whose mates were also kept, so they are still paired. The other will be the cor_single.fastq file, which contains reads where only one read of the pair could be corrected. You can treat the cor_single.fastq file as a single-read file.
  
 ===== Methods =====
Line 34: Line 38:
  
 They also recommend choosing k so that a given k-mer has a probability of about 0.01 of occurring in a random sequence that is as long as the genome. That is, 2*G/4^k ~ 0.01, where G is the size of the sequenced genome and k is the size of the k-mer. Solving for k gives 4^k ~ 200*G, so k ~ log4(200*G), which is about 19 for our prediction of the banana slug genome size, {{https://banana-slug.soe.ucsc.edu/bioinformatic_tools:jellyfish|2.042e+09}}.
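
As a quick check of the arithmetic in the k-mer size rule from the Methods section, here is a small Python sketch. The function names are hypothetical; the only inputs taken from the page are the formula 2*G/4^k ~ 0.01 and the genome size estimate of 2.042e+09.

```python
import math


def expected_random_hits(genome_size, k):
    """Left-hand side of the rule: 2*G / 4^k."""
    return 2 * genome_size / 4 ** k


def recommended_k(genome_size, target=0.01):
    """Solve 2*G / 4^k = target for k, i.e. k = log4(2*G / target)."""
    return math.log(2 * genome_size / target, 4)


if __name__ == "__main__":
    G = 2.042e9  # predicted banana slug genome size from the jellyfish page
    print("unrounded k:", recommended_k(G))
    print("2*G/4^k at k=19:", expected_random_hits(G, 19))
```

The unrounded value lands a little above 19, so rounding down to 19 leaves 2*G/4^k somewhat higher than the 0.01 target, while k = 20 would bring it below the target.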
- 
-===== Potential Problems =====
-  * Input files need to have an extension, or Quake will throw a substr error when trying to merge hidden files into a result.
-  * With paired-end input, Quake will output two files for each paired-end input file. One will be the cor.fastq file, which contains corrected reads whose mates were also kept, so they are still paired. The other will be the cor_single.fastq file, which contains reads where only one read of the pair could be corrected. You can treat the cor_single.fastq file as a single-read file.
archive/bioinformatic_tools/quake.1306530778.txt.gz · Last modified: 2011/05/27 21:12 by eyliaw