
lecture_notes:04-21-2010

lecture_notes:04-21-2010 [2010/04/22 05:12]
hyjkim created
lecture_notes:04-21-2010 [2015/09/03 01:17] (current)
104.144.27.91 ↷ Links adapted because of a move operation
    * Expect 40-50 different resources (papers, books, PhD theses)
    * The bibliography must be annotated, though annotations are not required this Friday.
    * As you run assembly tools, keep the [[archive:computer_resources:assemblies|assemblies/]] wiki page updated!
  
==== Continuation of Newbler Assembler methods ====
  * Newbler-clean
    * Does not attempt to assemble the data.
    * Used to remove contamination from H. pylori, which was sequenced on the same machine in the same run.
    * If you have a reference genome for your contaminants, you can try to clean them out computationally by mapping the reads to it and removing the matches.
    * Makefile overview
      * Sets up a new mapping
      * Maps reads from the 454 Pog data to the contaminant genome
    * Interesting output: ReadStatus.txt
      * Contains the mapping status of each read (mapped, unmapped, partially mapped, too short)
      * Reads mapped to contaminants should be filtered out of the 454 data and should not be used for assembly.
    * Create a new SFF file (using the sfffile utility) containing the unmapped data, named "no_Hyp.sff"
    * It may be beneficial to hang onto "too short" reads for use in assemblers that work with shorter reads.
    * You can also use Megablast and NCBI's Taxonomy Report to identify contaminants.
  * Newbler-assembly2
    * Uses the clean data (no H. pylori) from newbler-clean1
    * One fewer small contig; otherwise mostly the same
  * Newbler-assembly3
    * Changed the expected coverage to 60x, which is closer to the real coverage based on better estimates of genome size
    * Still reported 41 contigs (or maybe one fewer?)
    * Good news: all contigs map to the reference genome
  * Map-colorspace3 (Map-colorspace directories begin indexing at 3 rather than 1)
    * The scripts run in this directory were originally intended for finding inversions from mate-pair reads.
    * New features have been added since then.
    * Now looks for reads between contigs
    * Tries to orient contigs
  * Newbler-partial3
    * Attempts to use only partially mapped/unmapped reads
    * The plan is to later map contigs from this assembly to the full contigs from before (to extend their edges?)
    * Megablast, blastn, BLAT, and find dna differences are four methods for mapping partially mapped reads onto contigs created by Newbler.
      * Megablast and blastn showed very similar results
      * Find dna differences
        * Shows differences between contigs and a reference genome in a human-readable format
  * Newbler-assembly4
    * Adds the partial3 contigs to the full search as reads
    * Didn't seem to help much
    * The resulting output had fewer total bases and more contigs than the original, so it is perhaps worse
  * Newbler-assembly5
    * Used Sanger reads produced by David Bernick to join contigs
    * 45 Sanger reads total
        * 1/3, 0.32 reads/base: a short contig, within the expected deviation in reads/base for a sequencing run
        * 0.44 reads/base occurred twice, not three times
    * Fewer contigs and more bases
  * Map-colorspace5
    * map-colorspace has many parameters.
      * You can list all the parameters with the command "map-colorspace --help"
    * This command produces lots of output files (~2 per contig), which may be a problem for assemblies with many contigs
    * Set length parameters using a histogram of reads mapped to contigs
      * Shows that the contigs are near each other but not touching.
      * Short contigs may be skipped in paired-end reads
    * QUESTION: Can we make a single connected graph using this data and all possible paths? Sometimes, if the data is particularly coherent, but it is not guaranteed.
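
The Newbler-clean filtering step keeps only the reads whose ReadStatus.txt entry says they did not map to the contaminant. A minimal Python sketch of that step follows. It makes several assumptions that are not confirmed by these notes:
  * The file is tab-separated, with one header row, the accession in column 1 and the status in column 2.
  * The status labels are "Unmapped", "TooShort", and so on.
  * The function name ''reads_to_keep'' is hypothetical.
Check these against a real ReadStatus.txt before relying on the sketch.

```python
import csv
import io

# ASSUMPTION: status labels. The real ReadStatus.txt vocabulary may differ,
# so check the values in your own file before relying on these.
UNMAPPED = "Unmapped"
TOO_SHORT = "TooShort"


def reads_to_keep(status_text: str, keep_too_short: bool = False) -> list[str]:
    """Return accessions of reads that did not map to the contaminant.

    ASSUMPTION: tab-separated text with one header row, the read accession
    in column 1 and the mapping status in column 2.
    """
    wanted = {UNMAPPED, TOO_SHORT} if keep_too_short else {UNMAPPED}
    rows = csv.reader(io.StringIO(status_text), delimiter="\t")
    next(rows, None)  # skip the header row
    return [row[0] for row in rows if len(row) >= 2 and row[1] in wanted]


# Hypothetical example table, not real run data.
example = (
    "Accno\tRead Status\n"
    "READ0001\tFull\n"
    "READ0002\tUnmapped\n"
    "READ0003\tPartial\n"
    "READ0004\tTooShort\n"
)
print(reads_to_keep(example))                       # ['READ0002']
print(reads_to_keep(example, keep_too_short=True))  # ['READ0002', 'READ0004']
```

You can write the returned accessions one per line to a file and hand that list to sfffile to build a filtered file such as no_Hyp.sff. Check sfffile's usage message for the exact include-list option. The ''keep_too_short'' switch covers the case where "too short" reads are kept for short-read assemblers.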
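
Two coverage calculations appear in these notes. The first is the expected-coverage setting for Newbler-assembly3. The second is the reads/base comparison for Newbler-assembly5, where 0.44 reads/base "occurred twice, not three times." Both rest on simple ratios:
  * Average depth is the total sequenced bases divided by the estimated genome size.
  * A contig built from k collapsed repeat copies collects roughly k times the genome-wide depth.

The Python sketch below shows both ratios. The function names and all numbers in it are hypothetical, not the Pog data.

```python
def expected_coverage(total_read_bases: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases / estimated genome size."""
    return total_read_bases / genome_size


def copy_number(contig_depth: float, genome_depth: float) -> float:
    """Rough copy number of a contig: reads from collapsed repeat copies pile
    onto one contig, so its depth scales with the number of copies."""
    return contig_depth / genome_depth


# Hypothetical numbers: 120 Mb of reads over a 2 Mb genome.
depth = expected_coverage(120_000_000, 2_000_000)
print(depth)                             # 60.0
print(round(copy_number(180.0, depth)))  # 3
```

The same ratio works with reads/base instead of bases/base, as long as the contig and the genome-wide baseline are measured the same way. Short contigs give noisy estimates, which matches the note that a short contig fell within the expected deviation.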
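
For Newbler-partial3, Megablast and blastn were used to map partially mapped reads onto Newbler contigs. Both tools can write the standard 12-column tabular format (BLAST+ ''-outfmt 6''; legacy ''blastall -m 8''). The Python sketch below keeps the highest-bitscore hit per read and reports which strand it hit. Note the following:
  * The helper names are hypothetical.
  * The sample hits are made up.
  * Choosing the best hit by bitscore alone is one simple policy, not necessarily the one used in class.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Keep the highest-bitscore hit for each query read."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        query = hit["qseqid"]
        if query not in best or float(hit["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = hit
    return best


def strand(hit):
    """Nucleotide hits on the minus strand are reported with sstart > send."""
    return "+" if int(hit["sstart"]) <= int(hit["send"]) else "-"


# Hypothetical hits of two partially mapped reads against Newbler contigs.
sample = (
    "read1\tcontig01\t98.5\t200\t3\t0\t1\t200\t5001\t5200\t1e-90\t350\n"
    "read1\tcontig07\t91.0\t120\t11\t0\t1\t120\t10\t129\t1e-30\t140\n"
    "read2\tcontig03\t99.0\t150\t1\t0\t20\t169\t900\t751\t1e-70\t270\n"
)
hits = best_hits(sample)
print({q: (h["sseqid"], strand(h)) for q, h in hits.items()})
# {'read1': ('contig01', '+'), 'read2': ('contig03', '-')}
```

Since Megablast and blastn gave very similar results, the same parser can be run on both outputs to compare their best hits read by read.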
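
For Map-colorspace5, the length parameters were set from a histogram of reads mapped to contigs. The Python sketch below assumes those parameters are a minimum and maximum mate-pair separation; this is an assumption, so check "map-colorspace --help" for the actual parameter names. It bins the observed separations and picks cutoffs that trim the tails of the distribution. The function names, the 100 bp bin width, and the 2.5/97.5 percentile cutoffs are all hypothetical choices.

```python
from collections import Counter


def histogram(lengths, bin_width=100):
    """Count lengths in fixed-width bins keyed by each bin's lower edge."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))


def length_bounds(lengths, lower_pct=2.5, upper_pct=97.5):
    """Pick min/max cutoffs that trim the tails of the length distribution.

    Uses a simple percentile: sort, then take the element at the index
    nearest to pct% of the way through the list.
    """
    ordered = sorted(lengths)
    last = len(ordered) - 1

    def at(pct):
        return ordered[round(pct / 100 * last)]

    return at(lower_pct), at(upper_pct)


# Hypothetical separations (bp) between mates mapped to the same contig.
separations = list(range(1000, 3001, 10))
print(histogram([1005, 1012, 1150]))  # {1000: 2, 1100: 1}
print(length_bounds(separations))     # (1050, 2950)
```

Looking at the histogram before choosing percentiles matters. A second peak, for example from chimeric pairs, would argue for cutting by eye rather than by a fixed percentile.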
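
The closing QUESTION asks whether paired-end links can join everything into a single connected graph. That amounts to asking whether the contig-link graph has exactly one connected component. The Python sketch below answers it with a breadth-first search, under these assumptions:
  * Each linking read pair is given as a (contig_a, contig_b) tuple.
  * A link counts only if at least ''min_support'' pairs support it. The threshold and all names are hypothetical.
  * Orientation and gap size are ignored.

```python
from collections import defaultdict, deque


def link_components(contigs, links, min_support=2):
    """Group contigs that are joined, directly or indirectly, by mate-pair links.

    contigs: list of contig names.
    links: (contig_a, contig_b) tuples, one per read pair whose mates map to
        different contigs.
    min_support: read pairs needed before a link is trusted (hypothetical).
    Orientation and gap size are ignored.
    """
    support = defaultdict(int)
    for a, b in links:
        if a != b:
            support[frozenset((a, b))] += 1

    graph = defaultdict(set)
    for pair, count in support.items():
        if count >= min_support:
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)

    seen, components = set(), []
    for start in contigs:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:  # breadth-first search from this contig
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical links: c3-c4 is supported by only one read pair.
contigs = ["c1", "c2", "c3", "c4"]
links = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c3", "c2"), ("c3", "c4")]
print(link_components(contigs, links))  # [['c1', 'c2', 'c3'], ['c4']]
```

A single component only shows that a connection exists. It does not give an ordering or a unique path through the contigs. Raising ''min_support'' trades connectivity for confidence, which is one reason a single connected graph is not guaranteed.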
lecture_notes/04-21-2010.1271913130.txt.gz · Last modified: 2010/04/22 05:12 by hyjkim