lecture_notes:05-18-2011


====== K-mer Counting Theory ======

Choosing the K-mer size involves a trade-off. A short K-mer will appear often, while a long K-mer has a higher chance of containing multiple errors.

<code>
k = length of K-mer
b = base error rate (e.g. 0.01 = 1 base in 100 wrong)
j = # of changes or errors
G = genome size
c = coverage

Big assumption: P(A) = P(C) = P(G) = P(T) = 1/4

P( K-mer is OK ) = (1 - b)^k
P( K-mer has exactly j errors ) = ( k choose j ) b^j (1-b)^(k-j)   [Binomial distribution]

# of neighboring K-mers
  1 change:  3 * k
  2 changes: ( k choose 2 ) 3^2
  j changes: ( k choose j ) 3^j

P( a random K-mer is in genome ) = 1 - ( 1 - 1/4^k )^G
P( K-mer has a "trusted" j-neighbor ) = 1 - ( ( 1 - 1/4^k )^G ) ^ ( (k choose j) 3^j )
</code>
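
The formulas above can be tried out numerically. The following is a minimal Python sketch, not part of the lecture itself; the function names are made up for illustration, and it keeps the same uniform-base assumption (each base has probability 1/4). It uses ''log1p''/''expm1'' because ''1 - 1/4^k'' rounds to exactly 1.0 in floating point once k gets large.

```python
from math import comb, expm1, log1p


def p_kmer_ok(k, b):
    """P(K-mer is OK): all k bases are read correctly."""
    return (1 - b) ** k


def p_exact_errors(k, j, b):
    """P(K-mer has exactly j errors), a binomial probability."""
    return comb(k, j) * b**j * (1 - b) ** (k - j)


def num_neighbors(k, j):
    """Number of K-mers that differ in exactly j positions: (k choose j) * 3^j."""
    return comb(k, j) * 3**j


def p_random_kmer_in_genome(k, G):
    """1 - (1 - 1/4^k)^G, evaluated in log space for numerical stability."""
    return -expm1(G * log1p(-(4.0 ** -k)))


def p_trusted_neighbor(k, j, G):
    """1 - ((1 - 1/4^k)^G)^N, where N is the number of j-neighbors."""
    n = num_neighbors(k, j)
    return -expm1(G * n * log1p(-(4.0 ** -k)))


if __name__ == "__main__":
    # Illustrative values only (assumptions, not from the lecture):
    # a 5 Mbp genome and a 1% base error rate.
    G = 5_000_000
    b = 0.01
    for k in (11, 15, 19, 23, 27):
        print(
            k,
            p_kmer_ok(k, b),
            p_random_kmer_in_genome(k, G),
            p_trusted_neighbor(k, 1, G),
        )
```

Running it for a range of k shows the trade-off directly: as k grows, P(K-mer is OK) falls, while the chance that a random K-mer (or one of its neighbors) occurs in the genome by accident also falls.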

lecture_notes/05-18-2011.1305756259.txt.gz · Last modified: 2011/05/18 22:04 by svohr