lecture_notes:04-16-2010

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-16-2010 [2010/04/17 00:22]
svasili
lecture_notes:04-16-2010 [2010/04/20 03:09] (current)
karplus fixed punctuation that was messing up final list
Line 1: Line 1:
-====== Newbler assembly of POG ======
+====== Newbler assembly on POG ======
 ====== Overview ======
 Outlines how Kevin assembled 454 data of Pyrobaculum oguniense (POG) using Newbler version 2.3.
Line 12: Line 13:
   * Currently, tools are installed under /campusdata/BME235/bin/old_Newbler/.
   * Tools with prefix "gs" are not supposed to be run directly.
-  * Kevin has written several scripts in Python (version 2.6) which aid in building and analyzing genomes. Currently these scripts do not work as on Campusrocks, as the version of Python installed is 2.4 and it is under the process of being updated to version 2.6.
+  * Kevin has written several scripts in Python (version 2.6) which aid in building and analyzing genomes. Currently, these scripts do not work on Campusrocks, as the version of Python installed is 2.4 and it is under the process of being updated to version 2.6. (Python2.6.5 has now been installed in /campusdata/BME235/bin/  --- //[[karplus@soe.ucsc.edu|Kevin Karplus]] 2010/04/19 20:03//)
   * Newbler assembly tools take .sff (flowgram and quality data) files as input and convert them into .fna (FASTA files with nucleotide sequences) files.
   * Newbler is good only with 454 data, and is not good on reads with length < 50.
Line 26: Line 27:
   * Output : Generated in a separate directory called "assembly". Main outputs - .fna files and .qual files. Look at "/campusdata/BME235/assemblies/Pog/newbler-assembly1/assembly".
   * make.log - keeps track of what happened.
 +  * Mapping to an existing genome: an example from Kevin Karplus, in /pluck/Vc/map23_scaffold
 +    <code>
 +newMapping .
 +addRun . /projects/lowelab/users/course/karplus/Vc/sequencing/sff/*.sff
 +setRef . Vc.scaffold
 +runProject -e 25 -rst 0 -noace . 
 +</code>
 +  * Where -e 25 is specific to the Vibrio sequence coverage, and -noace prevents the building of an ace file (a large file), which is used with CONSED.
 +  * Slug is AT-rich, so Illumina data may be better than 454.
 +  * rdb files were described as useful, simple-to-create relational databases. An example of rdb file generation with a makefile is given below, as implemented by Kevin in /pluck/rachel/combined_cleaning1/Makefile . Note that this example was **not** given in class, and is intended for pulling out a subset of the contigs, not making an rdb file for all contigs.
 +<code>
 +%.stats: %.ids
 + echo "name length numreads" > $@
 + echo "S N N" >> $@
 + grep '^>' < contigs_all.fa \
 + | grep -f $*.ids \
 + | sed 's/=/ /g' \
 + | sed 's/>//' \
 + | awk '{printf "%s\t%d\t%d\n", $$1, $$3, $$5}' \
 + >> $@
 +</code>
 +  * If anyone finds good user-based documentation or tutorials (as opposed to feature-based documentation), please share them with the group.
 +  * Don't copy sff files; use soft links to the data files.
 +  * Useful output can be found in /assembly/454NewblerMetrics.txt . It reports the inputs, reads, bases (to calculate coverage = bases / genome size), readAlignmentResults, inferredReadError (0.8% = OK), estimatedGenomeSize, and consensusResults (largeContigMetrics, allContigs, ...).
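
The soft-link and coverage notes above can be sketched in shell. This is only a minimal sketch and was not shown in class: the function names ''link_sff'' and ''coverage'' are made up for illustration, and the numbers in the last line (50,000,000 bases of reads, a 2,500,000-base genome) are invented, not POG values.

```shell
#!/bin/sh
# Hypothetical helper (not from the lecture): soft-link every .sff file
# from a data directory into the current directory instead of copying it.
link_sff() {
    src_dir=$1
    for f in "$src_dir"/*.sff; do
        [ -e "$f" ] || continue          # glob matched nothing: skip
        ln -s "$f" "./$(basename "$f")"  # link, do not copy
    done
}

# Hypothetical helper: coverage = total bases in reads / genome size,
# printed with one decimal place.
coverage() {
    awk -v bases="$1" -v genome="$2" 'BEGIN { printf "%.1f\n", bases / genome }'
}

# Invented numbers for illustration only.
coverage 50000000 2500000   # prints 20.0
```

In a real run, the bases figure would be read by hand from 454NewblerMetrics.txt and the genome size taken from the estimate for the organism.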
 +    ​
 +====== Things to remember while running assembly tools ======
  
-====== Things to remember while running assembly tools======
-  
-  * All the assemblies should be listed under /campusdata/BME235/assemblies.
-  * Include .cshrc file in your path.
-  * Its better to run the tool in the current working directory.
-  * Create a README file in each new directory and it should contain all the necessary stuff required to run the assembly tool.
-  * Create Makefile for each assembly tool. (Makefile for newbler_assembly tool is in /campusdata/BME235/assemblies/Pog//newbler-assembly1/). You can use it as a template and modify the data source and the expected coverage as required. Makefile should be considered as "a book for lab protocols".
-  * Its always better to say append to make.log in Makefile.
 +  * All the assemblies should be listed under /campusdata/BME235/assemblies.
 +  * Include .cshrc file in your path.
 +  * It's better to run the tool in the current working directory.
 +  * Create a README file in each new directory; it should contain everything required to run the assembly tool.
 +  * Create a Makefile for each assembly tool. (The Makefile for the newbler_assembly tool is in /campusdata/BME235/assemblies/Pog/newbler-assembly1/ ). You can use it as a template and modify the data source and the expected coverage as required. The Makefile should be considered "a book for lab protocols".
 +  * It is always better to append to make.log in the Makefile.
   * Wiki page for assembly tools should contain a summary of how to run the tool and other things that might be useful to look at.
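
One way to append to make.log instead of overwriting it is to wrap each command so that its output is added to the end of the log. The sketch below assumes a helper named ''log_run'', which is hypothetical and is not taken from Kevin's Makefile; a Makefile recipe could do the same thing with ''>> make.log 2>&1'' on each line.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the class Makefile):
# run a command and append a dated header plus all of its output
# (stdout and stderr) to make.log in the current directory.
log_run() {
    {
        echo "== $(date) == $*"
        "$@"
    } >> make.log 2>&1
}
```

Called as ''log_run runProject -e 25 -rst 0 -noace .'', each run adds to the existing log, so make.log keeps the full history of what was done in that directory.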
  
lecture_notes/04-16-2010.1271463779.txt.gz · Last modified: 2010/04/17 00:22 by svasili