lecture_notes:04-16-2010

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-16-2010 [2010/04/16 23:09]
svasili
lecture_notes:04-16-2010 [2010/04/20 03:09] (current)
karplus fixed punctuation that was messing up final list
Line 1: Line 1:
-====== Newbler assembly of POG ======
+====== Newbler assembly on POG ======
 ====== Overview ======
 Outlines how Kevin assembled 454 data from Pyrobaculum oguniense (POG) using Newbler version 2.3.
Line 7: Line 8:
   * Kevin installed Newbler version 2.3 on the Campusrocks cluster under /campusdata/BME235/programs/DataAnalysis_2.3.
   * The Newbler GUI is not installed, as it has some issues with unpacking.
-  * All the assemblies should be listed under /campusdata/BME235/assemblies.
   * Kevin ran the assembly tool on POG 454 data under /campusdata/BME235/assemblies/Pog.
   * The README file in the directory contains important information about the assembly.
-  * Info about tools installed is listed in bioinformatic_tools [[https://banana-slug.soe.ucsc.edu/bioinformatic_tools:gs_de_novo_assembler | GS De Novo Assembler]].
+  * Info about tools installed is listed in bioinformatic_tools [[https://banana-slug.soe.ucsc.edu/bioinformatic_tools:gs_de_novo_assembler | GS De Novo Assembler]]. Info about how to run the De novo as well as Mapping assembly tools is also included there.
-  * Currently tools are installed under /campusdata/BME235/bin/old_Newbler/.
+  * Currently, tools are installed under /campusdata/BME235/bin/old_Newbler/.
   * Tools with prefix "gs" are not supposed to be run directly.
 +  * Kevin has written several scripts in Python (version 2.6) which aid in building and analyzing genomes. Currently, these scripts do not work on Campusrocks, as the installed version of Python is 2.4 and it is in the process of being updated to version 2.6. (Python 2.6.5 has now been installed in /campusdata/BME235/bin/  --- //[[karplus@soe.ucsc.edu|Kevin Karplus]] 2010/04/19 20:03//)
 +  * Newbler assembly tools take .sff files (Standard Flowgram Format, holding flowgram and quality data) as input and convert them into .fna files (fasta files with nucleotide information).
 +  * Newbler works well only with 454 data, and does not do well on reads with length < 50.
 +  * Example code to run the De novo tool on data is shown below. The code is taken from [[https://banana-slug.soe.ucsc.edu/bioinformatic_tools:gs_de_novo_assembler | GS De Novo Assembler]].
 +    <code>
 +newAssembly .
 +addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ01.sff
 +addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ02.sff
 +runProject -e 50 .
 +</code>
 +  * Here, -e 50 is an important parameter: it gives the expected coverage, which defaults to 50.
 +  * Currently, only De novo assembly has been done on POG; Mapping has not been done yet.
 +  * Output: generated in a separate directory called "assembly". The main outputs are .fna files and .qual files. Look at "/campusdata/BME235/assemblies/Pog/newbler-assembly1/assembly".
 +  * make.log keeps track of what happened.
 +  * Mapping to an existing genome: an example from Kevin Karplus (/pluck/Vc/map23_scaffold):
 +    <code>
 +newMapping .
 +addRun . /projects/lowelab/users/course/karplus/Vc/sequencing/sff/*.sff
 +setRef . Vc.scaffold
 +runProject -e 25 -rst 0 -noace .
 +</code>
 +  * Here, -e 25 is specific to the Vibrio sequence coverage, and -noace prevents the building of an ace file (a large file), which is used with CONSED.
 +  * Slug is AT-rich, so Illumina data may be better than 454.
 +  * rdb files were described as useful, simple-to-create relational databases.  An example of rdb file generation with a makefile is given below, as implemented by Kevin in /pluck/rachel/combined_cleaning1/Makefile. Note that this example was **not** given in class, and is intended for pulling out a subset of the contigs, not for making an rdb file for all contigs.
 +<code>
 +%.stats: %.ids
 + echo "name length numreads" > $@
 + echo "S N N" >> $@
 + grep '^>' < contigs_all.fa \
 + | grep -f $*.ids \
 + | sed 's/=/ /g' \
 + | sed 's/>//' \
 + | awk '{printf "%s\t%d\t%d\n", $$1, $$3, $$5}' \
 + >> $@
 +</code>
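The pipeline in the makefile rule above can also be sketched in Python. This is only an illustration, not one of Kevin's scripts: the function name ''contig_stats_rdb'' is hypothetical, the header layout (''>name  length=N   numreads=N'') is inferred from the sed/awk steps, tab-separated header lines are an assumption, and ids are matched exactly rather than as grep patterns.

```python
def contig_stats_rdb(fasta_lines, wanted_ids):
    """Return rdb-style text (name, length, numreads) for the selected contigs.

    Assumes headers such as '>contig00001  length=1234   numreads=56'.
    """
    # Two header lines: column names, then column types (as in the makefile rule).
    rows = ["name\tlength\tnumreads", "S\tN\tN"]
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line: only headers carry the statistics
        fields = line[1:].split()
        name = fields[0]
        if name not in wanted_ids:
            continue  # exact-name match (grep -f would match patterns instead)
        # Remaining fields are key=value pairs.
        attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        rows.append("%s\t%d\t%d" % (name, int(attrs["length"]), int(attrs["numreads"])))
    return "\n".join(rows) + "\n"


# Made-up example input, not real POG contigs.
example = [
    ">contig00001  length=1234   numreads=56",
    "ACGTACGT",
    ">contig00002  length=200   numreads=7",
    "GGCC",
]
print(contig_stats_rdb(example, {"contig00002"}))
```

Passing the lines of a contig fasta file and the set of names from an ids file would give the same three columns as the makefile rule, under the assumptions listed above.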
 +  * If anyone finds good user-based documentation or tutorials (as opposed to feature-based documentation), please share them with the group.
 +  * Don't copy .sff files; use soft links to the data files.
 +  * Useful output can be found in /assembly/454NewblerMetrics.txt. It lists the inputs, reads, bases (to calculate coverage = bases / genome size), readAlignmentResults, inferredReadError (0.8% = OK), estimatedGenomeSize, and consensusResults (largeContigMetrics, allContigs, ...).
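The coverage formula mentioned for 454NewblerMetrics.txt (coverage = bases / genome size) can be written as a small helper. This is a minimal sketch: the function name is hypothetical and the numbers in the example are made up, not taken from the POG run.

```python
def estimated_coverage(total_bases, genome_size):
    """Coverage = total sequenced bases / genome size (both in bases)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / float(genome_size)


# Hypothetical values: 50 Mb of reads over a 2.5 Mb genome.
print(estimated_coverage(50000000, 2500000))
```

A value computed this way is the kind of number one would pass to runProject with -e.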
 +    ​
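For the advice above about soft-linking .sff files instead of copying them, one possible approach is sketched here. The function name ''link_sff_files'' and both directory arguments are hypothetical; the sketch only uses the standard ''glob'' and ''os'' modules.

```python
import glob
import os


def link_sff_files(data_dir, work_dir):
    """Create soft links in work_dir for every .sff file found in data_dir."""
    links = []
    for src in sorted(glob.glob(os.path.join(data_dir, "*.sff"))):
        dest = os.path.join(work_dir, os.path.basename(src))
        # Leave any existing file or link with the same name alone.
        if not os.path.lexists(dest):
            os.symlink(os.path.abspath(src), dest)
        links.append(dest)
    return links
```

Calling it with the sff data directory and the assembly working directory leaves the large data files in one place while making them visible to the assembly.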
 +====== Things to remember while running assembly tools ======
 +
 +  * All the assemblies should be listed under /campusdata/BME235/assemblies.
 +  * Include .cshrc file in your path.
 +  * It's better to run the tool in the current working directory.
 +  * Create a README file in each new directory; it should contain everything needed to run the assembly tool.
 +  * Create a Makefile for each assembly tool. (The Makefile for the newbler_assembly tool is in /campusdata/BME235/assemblies/Pog/newbler-assembly1/ ). You can use it as a template and modify the data source and the expected coverage as required. A Makefile should be considered "a book for lab protocols".
 +  * It is always better to append to make.log in the Makefile.
 +  * The wiki page for assembly tools should contain a summary of how to run the tool and other things that might be useful to look at.
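The idea of reusing one template while changing only the data source and the expected coverage can be illustrated with a small command generator. This is a sketch, not the Makefile in newbler-assembly1: the function name is hypothetical, and it only emits the three commands shown in these notes (newAssembly, addRun, runProject -e).

```python
def denovo_commands(sff_paths, expected_coverage, project_dir="."):
    """Return the De novo command lines for the given inputs, in run order."""
    cmds = ["newAssembly %s" % project_dir]
    for path in sff_paths:
        cmds.append("addRun %s %s" % (project_dir, path))
    cmds.append("runProject -e %d %s" % (expected_coverage, project_dir))
    return cmds


# The two POG sff files and the coverage value used in the example in these notes.
pog_sff = [
    "/campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ01.sff",
    "/campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ02.sff",
]
for cmd in denovo_commands(pog_sff, 50):
    print(cmd)
```

The sketch only prints the commands; running them and appending their output to make.log is left to the Makefile.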
 +
  
 + 
  
  
lecture_notes/04-16-2010.1271459365.txt.gz · Last modified: 2010/04/16 23:09 by svasili