lecture_notes:04-16-2010

lecture_notes:04-16-2010 [2010/04/17 00:28]
svasili
lecture_notes:04-16-2010 [2010/04/20 03:09] (current)
karplus fixed punctuation that was messing up final list
Line 1: Line 1:
-====== Newbler assembly ​of POG ======+====== Newbler assembly ​on POG ====== 
 ====== Overview ====== ====== Overview ======
 Outlines how Kevin assembled 454 data of Pyrobaculum oguniense (POG) using Newbler version 2.3.   Outlines how Kevin assembled 454 data of Pyrobaculum oguniense (POG) using Newbler version 2.3.
Line 12: Line 13:
   * Currently, tools are installed under /​campusdata/​BME235/​bin/​old_Newbler/​.   * Currently, tools are installed under /​campusdata/​BME235/​bin/​old_Newbler/​.
   * Tools with prefix "​gs"​ are not supposed to be run directly.   * Tools with prefix "​gs"​ are not supposed to be run directly.
-  * Kevin has written several scripts in Python (version 2.6) which aid in building and analyzing genomes. Currently these scripts do not work as on Campusrocks, as the version of Python installed is 2.4 and it is under the process of being updated to version 2.6.+  * Kevin has written several scripts in Python (version 2.6) which aid in building and analyzing genomes. Currently, these scripts do not work on Campusrocks, as the installed version of Python is 2.4 and it is in the process of being updated to version 2.6. (Python 2.6.5 has now been installed in /campusdata/BME235/bin/  --- //[[karplus@soe.ucsc.edu|Kevin Karplus]] 2010/04/19 20:03//)
   * Newbler assembly tools take .sff files (flowgram and quality data) as input and convert them into .fna files (FASTA files with nucleotide sequences).   * Newbler assembly tools take .sff files (flowgram and quality data) as input and convert them into .fna files (FASTA files with nucleotide sequences).
   * Good only with 454 data, and is not good on reads with length < 50.    * Good only with 454 data, and is not good on reads with length < 50. 
Line 26: Line 27:
   * Output : Generated in a separate directory called "​assembly"​. Main outputs - .fna files and .qual files. Look at "/​campusdata/​BME235/​assemblies/​Pog/​newbler-assembly1/​assembly"​.   * Output : Generated in a separate directory called "​assembly"​. Main outputs - .fna files and .qual files. Look at "/​campusdata/​BME235/​assemblies/​Pog/​newbler-assembly1/​assembly"​.
   * make.log - keeps track of what happened.   * make.log - keeps track of what happened.
 +  * Mapping to an existing genome, an example from Kevin Karplus /​pluck/​Vc/​map23_scaffold 
 +    <​code>​ 
 +newMapping . 
 +addRun . /​projects/​lowelab/​users/​course/​karplus/​Vc/​sequencing/​sff/​*.sff 
 +setRef . Vc.scaffold 
 +runProject -e 25 -rst 0 -noace .  
 +</​code>​ 
 +  * Where -e 25 is specific to the Vibrio sequence coverage, and -noace prevents the building of an ace file (large file) which is used with CONSED. 
 +  * Slug is AT-rich, so Illumina data may be better than 454. 
 +  * rdb files were described as useful, simple-to-create relational databases.  An example of rdb file generation with a makefile is given below, as implemented by Kevin in /pluck/rachel/combined_cleaning1/Makefile . Note that this example was **not** given in class, and is intended for pulling out a subset of the contigs, not for making an rdb file for all contigs.
 +<​code>​ 
 +%.stats: %.ids  
 + echo "​name length numreads"​ > $@ 
 + echo "​S N N"​ >> $@ 
 + grep '​^>'​ < contigs_all.fa \ 
 + | grep -f $*.ids \ 
 + | sed 's/=/ /g' \ 
 + | sed '​s/>//'​ \ 
 + | awk '​{printf "​%s\t%d\t%d\n",​ $$1, $$3, $$5}' \ 
 + >> $@ 
 +</​code>​ 
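To see what the sed/awk part of the rule above does, here is a minimal shell sketch run on a single header line. It assumes the contig headers look like `>contig00001  length=1234   numreads=56`; the sample header and the function name `parse_header` are made up for illustration. Note that inside a Makefile the awk fields are written `$$1`, while in a plain shell they are `$1`.

```shell
# Turn one FASTA header into a tab-separated "name length numreads" row.
# parse_header is a hypothetical helper name, not part of Kevin's Makefile.
parse_header() {
    sed 's/=/ /g' \
    | sed 's/>//' \
    | awk '{printf "%s\t%d\t%d\n", $1, $3, $5}'
}

# Made-up sample header, assumed to follow the length=/numreads= layout.
echo '>contig00001  length=1234   numreads=56' | parse_header
```

After the `=` signs become spaces, the fields are name, "length", the length value, "numreads", and the read count, which is why the awk command prints fields 1, 3 and 5.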
 +  * If anyone finds good user-based documentation or tutorials (as opposed to feature-based documentation), please share them with the group.
 +  * Don't copy .sff files; use soft links to the data files.
 +  * Useful output can be found in /assembly/454NewblerMetrics.txt . It lists the inputs, reads, bases (to calculate coverage = bases / genome size), readAlignmentResults, inferredReadError (0.8% = OK), estimatedGenomeSize, and consensusResults (largeContigMetrics, allContigs, ...).
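The coverage formula mentioned in the list above (coverage = bases / genome size) can be sketched in shell as follows. The function name `coverage` and both numbers are placeholders for illustration; real values would be read by hand from 454NewblerMetrics.txt or taken from a known genome size.

```shell
# coverage = total sequenced bases / genome size
# $1 = total bases, $2 = genome size (both supplied by the user)
coverage() {
    awk -v bases="$1" -v size="$2" 'BEGIN { printf "%.1f\n", bases / size }'
}

# Placeholder numbers only: 50 Mbases of reads over a 2.5 Mbase genome.
coverage 50000000 2500000
```

With these placeholder numbers the result is 20.0, i.e. roughly 20x coverage.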
 +    ​
 ====== Things to remember while running assembly tools ====== ====== Things to remember while running assembly tools ======
-  ​ + 
-    * All the assemblies should be listed under /​campusdata/​BME235/​assemblies.  +  * All the assemblies should be listed under /​campusdata/​BME235/​assemblies.  
-    * Include .cshrc file in your path.  +  * Include .cshrc file in your path.  
-    * Its better to run the tool in the current working directory.  +  * It's better to run the tool in the current working directory. 
-    * Create a README file in each new directory and it should contain all the necessary stuff required to run the assembly tool.  +  * Create a README file in each new directory and it should contain all the necessary stuff required to run the assembly tool.  
-    * Create Makefile for each assembly tool. (Makefile for newbler_assembly tool is in /​campusdata/​BME235/​assemblies/​Pog//​newbler-assembly1/​). You can use it as a template and modify the data source and the expected coverage as required. Makefile should be considered as "a book for lab protocols"​.  +  * Create Makefile for each assembly tool. (Makefile for newbler_assembly tool is in /​campusdata/​BME235/​assemblies/​Pog/​newbler-assembly1/​ ). You can use it as a template and modify the data source and the expected coverage as required. Makefile should be considered as "a book for lab protocols"​. 
-    Its always better to say append to make.log in Makefile.  +  It is always better to say append to make.log in Makefile.  
-    * Wiki page for assembly tools should contain a summary of how to run the tool and other things that might be useful to look at.+  * Wiki page for assembly tools should contain a summary of how to run the tool and other things that might be useful to look at.
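Two of the habits listed above (soft links instead of copies of the .sff files, and appending to make.log rather than overwriting it) can be sketched in shell like this. The helper names `link_sff` and `log_run`, the temporary directory layout, and the `echo` standing in for the real assembly command are all assumptions made for the sketch; they are not taken from the class Makefile.

```shell
# Link every .sff file from a data directory into an assembly directory
# instead of copying it.  $1 = data directory, $2 = directory for the links.
link_sff() {
    src=$(cd "$1" && pwd)          # absolute path, so the links resolve anywhere
    mkdir -p "$2"
    for f in "$src"/*.sff; do
        [ -e "$f" ] || continue    # skip if no .sff files matched
        ln -sf "$f" "$2/"
    done
}

# Run a command and append (>>) a dated record plus its output to make.log
# in the current working directory.
log_run() {
    { echo "== $(date): $*"; "$@"; } >> make.log 2>&1
}

# Demo in a throwaway directory with an empty stand-in for a real .sff file.
demo=$(mktemp -d)
mkdir "$demo/data"
: > "$demo/data/reads1.sff"
link_sff "$demo/data" "$demo/assembly1/sff"
( cd "$demo/assembly1" && log_run echo "placeholder for the real assembly command" )
```

Because `log_run` uses `>>`, running it again adds to make.log rather than replacing the earlier record, which keeps the history of what happened in that directory.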
  
  
lecture_notes/04-16-2010.1271464098.txt.gz · Last modified: 2010/04/17 00:28 by svasili