lecture_notes:04-17-2015

lecture_notes:04-17-2015 [2015/04/17 23:25]
sihussai created (incomplete)
lecture_notes:04-17-2015 [2015/04/20 00:29]
sihussai
  
=====Administrative=====
  * Lucigen mate pair data is up
    * Josh from Ed's lab worked on it; we can ask him questions
  * Presentations start next week
    * Introduces ambiguity: you lose the direction of the arrow, because you can't tell whether you are looking at the forward strand or the reverse strand.
    * **Just make k odd!** Then no k-mer can equal its own reverse complement (a perfect palindrome), because the middle base would have to be its own complement, and no base is.
  * Tip from Ed: write the reverse-complement strand as a mirror image, the way it actually sits in the DNA. That way the 5'-to-3' direction is explicit, and you can physically rotate the paper and it still reads correctly.
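A quick brute-force check of the odd-k rule, as a minimal Python sketch: it enumerates every k-mer over ACGT and counts the ones that equal their own reverse complement. The names `reverse_complement` and `count_palindromes` are hypothetical, just for illustration, and not taken from any particular assembler.

```python
from itertools import product

# Base-pairing table: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(kmer: str) -> str:
    """Complement each base, then reverse so the result reads 5' to 3'."""
    return kmer.translate(COMPLEMENT)[::-1]

def count_palindromes(k: int) -> int:
    """Count the k-mers that are their own reverse complement (brute force, 4**k k-mers)."""
    count = 0
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        if kmer == reverse_complement(kmer):
            count += 1
    return count

for k in range(1, 7):
    # prints 0 for odd k and 4**(k // 2) for even k
    print(k, count_palindromes(k))
```

For even k the count is 4^(k/2), since the first half of the k-mer fixes the second half; for every odd k it is 0.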
====Picking k====
  * k should be long enough that the k-mers from most single-copy regions of the genome are unique.
  * Let L be the read length.
  * Number of k-mers per read = L - k + 1
  * Number of arcs (links between consecutive k-mers) per read = L - k. The arcs are what give us connectivity information, so k can't be too close to L.
  * A single sequencing error affects up to k k-mers (every k-mer that overlaps the wrong base), so for error-prone reads especially you would want a smaller k.
  * So k has to balance being long enough for uniqueness against being short enough for connectivity, keeping in mind that with a bigger k, a single sequencing error corrupts more k-mers.
  * preqc gives a recommended k, but your assembler will have its own requirements, so don't just use the recommended k blindly.
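The counts in the list above can be checked on toy data with a small Python sketch. It assumes 0-based positions and, as in the notes, arcs only between consecutive k-mers of the same read; the helper names and the toy sequences are hypothetical.

```python
from collections import Counter

# Hypothetical helpers for illustration only; toy data, not a real genome.

def kmers(seq: str, k: int) -> list[str]:
    """All len(seq) - k + 1 overlapping k-mers, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def num_arcs(read_len: int, k: int) -> int:
    """Arcs join consecutive k-mers within a read, so there are L - k of them."""
    return max(read_len - k, 0)

def kmers_hit_by_error(read_len: int, k: int, pos: int) -> int:
    """How many of the read's k-mers contain the (wrong) base at 0-based position pos."""
    first = max(0, pos - k + 1)    # start of the leftmost k-mer covering pos
    last = min(pos, read_len - k)  # start of the rightmost k-mer covering pos
    return max(last - first + 1, 0)

def fraction_unique_positions(seq: str, k: int) -> float:
    """Fraction of k-mer positions in seq whose k-mer occurs exactly once."""
    all_kmers = kmers(seq, k)
    counts = Counter(all_kmers)
    return sum(counts[km] == 1 for km in all_kmers) / len(all_kmers)

read = "ACGTTGCATGCA"  # toy 12-bp read (L = 12)
k = 5
print(len(kmers(read, k)))                   # 8 = L - k + 1
print(num_arcs(len(read), k))                # 7 = L - k
print(kmers_hit_by_error(len(read), k, 6))   # 5 = k (error in the middle)
print(kmers_hit_by_error(len(read), k, 11))  # 1 (error in the last base)

toy = "AAAAACGT"  # toy sequence with a low-complexity run
for k in (1, 3, 5):
    print(k, fraction_unique_positions(toy, k))  # 0.375, then 0.5, then 1.0
```

An error in the interior of a read corrupts exactly k k-mers, while an error in the last base corrupts only one; on the toy sequence, a longer k makes more positions unique.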
====What could possibly go wrong?====
(See slides for diagrams referred to in this section.)
  * Picture A: a sequencing error at the end of a read (very likely).
  * Picture B: a sequencing error in the middle of a read.
  * Picture C: repetitive elements of the genome. There are multiple paths into and out of the repeated middle section, so how do you know which way in corresponds to which way out?
    * If reads are long enough to span the whole repeat, you can keep track of which reads go through which paths.
    * Mate pair data: if the two reads of a pair map on either side of the repeat (so the pair spans it), that disambiguates it.
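The three pictures can be reproduced on toy data with a minimal Python sketch of the graph described in these notes: k-mers are nodes, and each read adds arcs between its consecutive k-mers. The genome, reads, and function names are hypothetical, made up for illustration; real assemblers build and clean their graphs in their own ways.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; toy reads, not real data.

def build_graph(reads, k):
    """Nodes are k-mers; each read adds an arc between consecutive k-mers."""
    nodes = set()
    succ = defaultdict(set)  # k-mer -> k-mers that follow it in some read
    pred = defaultdict(set)  # k-mer -> k-mers that precede it in some read
    for read in reads:
        kms = [read[i:i + k] for i in range(len(read) - k + 1)]
        nodes.update(kms)
        for a, b in zip(kms, kms[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return nodes, succ, pred

def branch_points(nodes, succ, pred):
    """k-mers with more than one way in or out: where the path becomes ambiguous."""
    return sorted(n for n in nodes if len(succ[n]) > 1 or len(pred[n]) > 1)

def dead_ends(nodes, succ):
    """k-mers with no way out: the real end of the sequence, or an error tip."""
    return sorted(n for n in nodes if not succ[n])

# Picture A: genome ATGGCGTACCA, k = 4; the third read has an error in its last base.
g_a = build_graph(["ATGGCGTA", "GCGTACCA", "ATGGCGTT"], 4)
print(branch_points(*g_a), dead_ends(g_a[0], g_a[1]))  # ['GCGT'] ['ACCA', 'CGTT']

# Picture B: same genome; the second read has an error in the middle (C -> A at position 4).
g_b = build_graph(["ATGGCGTACCA", "ATGGAGTACCA"], 4)
print(branch_points(*g_b))  # ['ATGG', 'GTAC']: a bubble made of k = 4 wrong k-mers

# Picture C: the repeat CGTA occurs twice in TTCGTAGGCGTACC, k = 3.
g_c = build_graph(["TTCGTAGGCGTACC"], 3)
print(branch_points(*g_c))  # ['CGT', 'GTA']: two ways in, two ways out
```

Picture A shows up as a branch plus a dead end (a tip), Picture B as a bubble, and Picture C as one node with two ways in and another with two ways out. Nothing in the graph itself says which way in goes with which way out; that is the information that spanning reads or mate pairs supply.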
  
lecture_notes/04-17-2015.txt · Last modified: 2015/04/20 00:29 by sihussai