lecture_notes:05-09-2011

lecture_notes:05-09-2011 [2011/05/13 23:43]
eyliaw created
lecture_notes:05-09-2011 [2011/05/14 01:26] (current)
eyliaw
====== Quake paper ======
Presented by Edward. We talked about the reasoning behind kmer counting for error correction. Since the sequencing coverage is known, each kmer that truly occurs in the genome should appear across the reads roughly that many times. Sequencing miscalls occur at only about a 1% rate per base, so a kmer that contains a miscall should appear far fewer times than a correct one.
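
The counting argument can be illustrated with a minimal sketch. This is not Quake's implementation; the toy reads, the value of ''k'', and the ''cutoff'' threshold are all made up for illustration.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every kmer (length-k substring) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

# Toy data (hypothetical): five error-free copies of one read,
# plus one copy with a single miscall at index 4 (A -> T).
correct_read = "ACGTACGGTC"
reads = [correct_read] * 5 + ["ACGTTCGGTC"]

counts = count_kmers(reads, k=4)

# Hypothetical cutoff: kmers seen at least this often are treated as trusted.
cutoff = 2
trusted = {kmer for kmer, c in counts.items() if c >= cutoff}

for kmer, c in sorted(counts.items(), key=lambda item: -item[1]):
    print(kmer, c, "trusted" if kmer in trusted else "untrusted")
```

The kmers from the correct read are counted five or six times each, while the four kmers that overlap the miscall are each counted once and fall below the cutoff.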

We also discussed the qmer counting approach: instead of adding 1 for each occurrence of a kmer, Quake increments the count by the probability of the bases being correct. That probability is derived from the Phred score, Q = -10*log_10(1-P), where P is the probability of the base being correct.
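
A minimal sketch of quality-weighted counting follows. It assumes that the weight of one kmer occurrence is the product of the per-base probabilities of being correct; the function names and the toy reads are hypothetical, not Quake's code.

```python
import math
from collections import defaultdict

def phred_to_p_correct(q):
    """Invert Q = -10*log_10(1-P) to get P, the probability the base is correct."""
    return 1.0 - 10 ** (-q / 10)

def count_qmers(reads_with_quals, k):
    """Add each kmer occurrence with a weight in [0, 1] instead of 1.

    Assumption: the weight is the product of the per-base probabilities
    of being correct over the k bases of that occurrence.
    """
    weights = defaultdict(float)
    for seq, quals in reads_with_quals:
        p_correct = [phred_to_p_correct(q) for q in quals]
        for i in range(len(seq) - k + 1):
            weights[seq[i:i + k]] += math.prod(p_correct[i:i + k])
    return weights

# Toy data (hypothetical): the same read once with Q30 bases, once with Q10 bases.
reads = [("ACGT", [30, 30, 30, 30]), ("ACGT", [10, 10, 10, 10])]
weights = count_qmers(reads, k=2)
print(weights["AC"])  # 0.999**2 + 0.9**2, about 1.808
```

A low-quality occurrence contributes less than a high-quality one, so kmers supported mostly by unreliable bases end up with smaller totals.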

Lastly, we talked about how Quake processes corrections. It uses a probability model based on the GC% and the Illumina base miscall substitution rate to calculate the likelihood of each possible miscall. It then tries corrections in order of decreasing likelihood until it finds one that matches a trusted kmer. It uses a bit array to store the trusted kmers, so the actual count of each kmer is not stored; one weakness of this approach is that it cannot take the likelihood of that kmer occurring into account when making the corrections. Once it finds a match, it continues searching for another, up to a threshold. If more than one is found, the correction is ambiguous and Quake discards the read instead of correcting it.
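
The search described above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not Quake's algorithm: only single-base substitutions are tried, every substitution at a position is taken to be equally likely (standing in for the GC% and miscall-rate model), a Python ''set'' stands in for the bit array, and the search threshold is a hypothetical cap on the number of candidates, ''max_candidates''.

```python
BASES = "ACGT"

def all_kmers_trusted(read, k, trusted):
    """True if every kmer in the read is present in the trusted set."""
    return all(read[i:i + k] in trusted for i in range(len(read) - k + 1))

def correct_read(read, p_error, k, trusted, max_candidates=100):
    """Return the unique corrected read, or None if there is none or it is ambiguous.

    p_error[i] is the estimated probability that base i is a miscall.
    """
    if all_kmers_trusted(read, k, trusted):
        return read  # nothing to fix

    # Candidate substitutions as (likelihood, position, new_base).
    # Assumption: the three alternative bases are equally likely.
    candidates = [
        (p_error[pos] / 3.0, pos, base)
        for pos, original in enumerate(read)
        for base in BASES
        if base != original
    ]
    candidates.sort(key=lambda c: -c[0])  # most likely first

    found = []
    for _, pos, base in candidates[:max_candidates]:
        fixed = read[:pos] + base + read[pos + 1:]
        if all_kmers_trusted(fixed, k, trusted):
            found.append(fixed)
            if len(found) > 1:
                return None  # ambiguous: more than one correction fits
    return found[0] if found else None

# Toy data (hypothetical): trusted kmers taken from one correct read.
trusted = {"ACGT", "CGTA", "GTAC", "TACG", "ACGG", "CGGT", "GGTC"}
p_error = [0.01] * 4 + [0.2] + [0.01] * 5  # index 4 has a poor quality score
fixed = correct_read("ACGTTCGGTC", p_error, 4, trusted)
print(fixed)  # ACGTACGGTC

# Two substitutions at index 1 both yield trusted kmers, so the result is None.
ambiguous = correct_read("ATA", [0.01, 0.5, 0.01], 3, {"ACA", "AGA"})
print(ambiguous)  # None
```

Because the trusted set records only membership, the two candidates in the ambiguous case cannot be ranked by how often each kmer was seen, which is the weakness noted above.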
lecture_notes/05-09-2011.1305330208.txt.gz · Last modified: 2011/05/13 23:43 by eyliaw