lecture_notes:05-05-2010

Differences

This shows you the differences between two versions of the page.

lecture_notes:05-05-2010 [2010/05/06 00:06]
svasili created
lecture_notes:05-05-2010 [2010/05/06 00:16]
svasili
Line 3: Line 3:
 ====== Overview ======
  
-Kevin assembled H.pylori 454 data using Newbler mapping assembly. To fix the assembly from Newbler, Kevin made use of SOLiD mate-paired data. This is done on version 16 of H.pylori and it still has bugs. Kevin has a software that checks for inversions.
+Kevin assembled the H. pylori 454 data using Newbler mapping assembly. To fix the Newbler assembly, he used SOLiD mate-paired data. This is done on version 16 of H. pylori, which still has bugs. The results are consistent, but they may not all be correct.\\
+
+Kevin has software that checks for inversions.
  
 ====== Newbler output ======
Line 19: Line 21:
   * trim9-good-length.hist has the distribution of mate-pair reads.
   * Good reads are assumed to be between 1500 and 3500, as beyond that interval the probability of noise is high.
-  * Clustered mate-paired reads as good reads and not good reads.
+  * Clustered the mate-paired reads into good reads and not good reads.
   * Not good reads are further classified into 3 bins: between chromosome and plasmid, bad reads, and within plasmids.
   * The interesting good reads are those within the chromosome.
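The binning described in the list above can be sketched in Python. This is only an illustration under stated assumptions, not Kevin's actual software: the record layout (reference name and position for each mate), the reference name "chromosome", the inclusive 1500 to 3500 interval, and the treatment of every non-chromosome reference as a plasmid are all hypothetical. The sketch also ignores wrap-around on a circular genome.

```python
from collections import Counter

# Interval from the notes; treating it as inclusive is an assumption.
GOOD_MIN, GOOD_MAX = 1500, 3500
# Hypothetical reference name; anything else is treated as a plasmid.
CHROMOSOME = "chromosome"


def classify_pair(ref1, pos1, ref2, pos2):
    """Return the bin name for one mapped mate pair."""
    on_chr1 = ref1 == CHROMOSOME
    on_chr2 = ref2 == CHROMOSOME
    if on_chr1 != on_chr2:
        # One mate on the chromosome, the other on a plasmid.
        return "chromosome-plasmid"
    if not on_chr1:
        # Both mates on plasmids.
        return "within-plasmid"
    # Both mates on the chromosome: decide by their separation.
    separation = abs(pos2 - pos1)
    if GOOD_MIN <= separation <= GOOD_MAX:
        return "good"
    return "bad"


# Made-up example pairs: (ref1, pos1, ref2, pos2).
pairs = [
    ("chromosome", 10_000, "chromosome", 12_400),
    ("chromosome", 10_000, "chromosome", 60_000),
    ("chromosome", 5_000, "plasmid1", 300),
    ("plasmid1", 100, "plasmid1", 2_000),
]
counts = Counter(classify_pair(*p) for p in pairs)
```

Checking the chromosome/plasmid split before the separation keeps the separation test meaningful, since a distance between positions on two different references has no interpretation.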
Line 30: Line 32:
   * Looking at the largest values is useful, as interesting things might be happening there.
   * Spikes are interesting to investigate. Zoom in on a spike and extract the reads that fell in that region.
-  * Showed example of how there is a 1000 base pair insert in the middle that needs to be removed. The reads were classified as good reads as they were in < 3500 interval.
+  * Showed an example of a 1000 base pair insert in the middle that needs to be removed. Although these reads are bad because of the insert, they were classified as good reads because they fell within the < 3500 interval.
   * Gaps in the histogram represent the repeat regions.
   * How to find where the 1000 base insert is? Look at the flanking regions of these repeats obtained from mapping.
   * The histogram of the 720k region shows two peaks, drawn as two dots. Zooming in on the regions around the peaks revealed gaps that could contain the 1000 base insert.
   * Rechecked this result by looking at the unused data.
-  * There were 3 more regions were found to have this 1000 base insert.
+  * 3 more regions were found to have this 1000 base insert.
   * Overall 7 peaks were observed: 2 are homologous ends, 4 have the 1000 base insert, and 1 still needs to be figured out.
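The "find the spikes, then zoom in and extract the reads" step can be sketched as follows. This is a minimal illustration, not the procedure used in class: the bin width of 1000, the "more than 5 times the median occupied bin" rule, and the function names `find_spikes` and `reads_in_region` are all assumptions made for the example, and the positions are made up.

```python
from collections import Counter
from statistics import median


def find_spikes(positions, bin_width=1000, factor=5.0):
    """Return sorted start coordinates of bins with unusually high counts.

    A bin counts as a spike when it holds more than `factor` times the
    median count over the occupied bins (empty bins are ignored).
    """
    counts = Counter(pos // bin_width for pos in positions)
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(b * bin_width for b, n in counts.items() if n > factor * typical)


def reads_in_region(positions, start, end):
    """Extract the positions that fall in the half-open region [start, end)."""
    return [p for p in positions if start <= p < end]


# Made-up data: a thin background plus a dense cluster near 720k.
positions = [100, 5_200, 9_900, 14_050] + [720_000 + 10 * i for i in range(50)]
spikes = find_spikes(positions)
zoomed = reads_in_region(positions, spikes[0], spikes[0] + 1000)
```

Using the median rather than the mean as the baseline keeps a single very tall spike from raising the threshold and hiding itself.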
  
Line 45: Line 47:
   * This region could be a misplaced piece and needs to be moved.
  
-====== Volunteers for next weeks presentations ======
+====== Volunteers for next week's presentations ======
 
   - John Kim
-  - Shyamini Vasili
+  - Shyamini
-  - Jenny Draper
+  - Jenny
-  - Jeff Long
+  - Jeff
   - Thomas
-  - Michael Cusack
+  - Michael
  
 There is no class on Friday, May 14th, due to a time conflict with the Graduate Research Symposium (2-5pm).
  
lecture_notes/05-05-2010.txt · Last modified: 2010/05/06 00:16 by svasili