lecture_notes:04-16-2010

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-16-2010 [2010/04/17 00:16]
svasili
lecture_notes:04-16-2010 [2010/04/20 03:04]
karplus added python2.6.5
Line 1: Line 1:
-====== Newbler assembly of POG ======
+====== Newbler assembly on POG ======
 ====== Overview ======
 Outlines how Kevin assembled the 454 data for Pyrobaculum oguniense (POG) using Newbler version 2.3.
Line 10: Line 11:
   * The README file in the directory contains important information about the assembly.
   * Information about the installed tools is listed in bioinformatic_tools [[https://banana-slug.soe.ucsc.edu/bioinformatic_tools:gs_de_novo_assembler | GS De Novo Assembler]]. That page also explains how to run both the de novo and the mapping assembly tools.
-  * Currently tools are installed under /campusdata/BME235/bin/old_Newbler/.
+  * Currently, tools are installed under /campusdata/BME235/bin/old_Newbler/.
   * Tools with the prefix "gs" are not meant to be run directly.
-  * Kevin has written several scripts in Python (version 2.6) which aids in building and analyzing genomes. Currently these scripts do not work as on Campusrocks, as the version of Python installed is 2.4 and it is under the process of being updated to version 2.6.
+  * Kevin has written several scripts in Python (version 2.6) which aid in building and analyzing genomes. Currently, these scripts do not work on Campusrocks, as the installed version of Python is 2.4 and it is in the process of being updated to version 2.6. (Python 2.6.5 has now been installed in /campusdata/BME235/bin/  --- //[[karplus@soe.ucsc.edu|Kevin Karplus]] 2010/04/19 20:03//)
   * Newbler assembly tools take .sff (Standard Flowgram Format: flowgram and quality data) files as input and convert them into .fna (FASTA files with nucleotide sequences) files.
-  * Good only with 454 data, is not good on reads < 50bp.
+  * Good only with 454 data, and is not good on reads with length < 50.
-  * Example to run the De novo tool on data is shown below. The code is taken from [[https://banana-slug.soe.ucsc.edu/bioinformatic_tools:gs_de_novo_assembler | GS De Novo Assembler]].
+  * Example code to run the De novo tool on data is shown below. The code is taken from [[https://banana-slug.soe.ucsc.edu/bioinformatic_tools:gs_de_novo_assembler | GS De Novo Assembler]].
     <code>
 newAssembly .
Line 24: Line 25:
   * Here, -e 50 is an important parameter: it sets the expected coverage, which defaults to 50.
   * Currently, the de novo assembly of POG is done; mapping has not been done yet.
-  * Output : Generated in a separate directory called "assembly". Main outputs - .fna file and .qual file. Look at "/campusdata/BME235/assemblies/Pog/newbler-assembly1/assembly".
+  * Output : Generated in a separate directory called "assembly". Main outputs - .fna files and .qual files. Look at "/campusdata/BME235/assemblies/Pog/newbler-assembly1/assembly".
   * make.log - keeps track of what happened.
 +  * Mapping to an existing genome: an example from Kevin Karplus (/pluck/Vc/map23_scaffold)
 +    <code>
 +newMapping .
 +addRun . /projects/lowelab/users/course/karplus/Vc/sequencing/sff/*.sff
 +setRef . Vc.scaffold
 +runProject -e 25 -rst 0 -noace . 
 +</code>
 +  * Here, -e 25 is specific to the coverage of the Vibrio sequence data, and -noace prevents the building of an ace file, a large file that is used with CONSED.
 +  * Slug is AT-rich, so Illumina data may be better than 454.
 +  * rdb files were described as useful, simple-to-create relational databases.  An example of rdb file generation with a makefile is given below, as implemented by Kevin in /pluck/rachel/combined_cleaning1/Makefile . Note that this example was **not** given in class, and is intended for pulling out a subset of the contigs, not making an rdb file for all contigs.
 +<code>
 +%.stats: %.ids
 + echo "name length numreads" > $@
 + echo "S N N" >> $@
 + grep '^>' < contigs_all.fa \
 + | grep -f $*.ids \
 + | sed 's/=/ /g' \
 + | sed 's/>//' \
 + | awk '{printf "%s\t%d\t%d\n", $$1, $$3, $$5}' \
 + >> $@
 +</code>
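For readers more comfortable with Python than with sed and awk, the pipeline in the rule above can be approximated as follows. This is only a sketch under stated assumptions: the header layout `>name  length=N   numreads=M` is inferred from the sed/awk steps, the tab-separated rdb header is assumed, and the names `HEADER_RE`, `contig_stats`, and `to_rdb` are hypothetical. Unlike `grep -f`, which matches patterns anywhere on the line, this version keeps only exact contig-name matches.

```python
import re

# Assumed header layout, inferred from the sed/awk steps in the Makefile rule:
#   >contig00001  length=1234   numreads=56
HEADER_RE = re.compile(r"^>(\S+)\s+length=(\d+)\s+numreads=(\d+)")


def contig_stats(fasta_lines, wanted_ids):
    """Yield (name, length, numreads) for headers whose name is in wanted_ids."""
    for line in fasta_lines:
        match = HEADER_RE.match(line)
        if match and match.group(1) in wanted_ids:
            yield match.group(1), int(match.group(2)), int(match.group(3))


def to_rdb(rows):
    """Format rows as rdb text: column names, column types, then the data."""
    lines = ["name\tlength\tnumreads", "S\tN\tN"]  # tab separation is assumed
    lines += ["%s\t%d\t%d" % row for row in rows]
    return "\n".join(lines) + "\n"
```

A typical call would read the wanted names from the .ids file into a set and pass an open handle on contigs_all.fa as `fasta_lines`.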
 +  * If anyone finds good user-oriented documentation or tutorials (as opposed to feature-by-feature documentation), please share them with the group.
 +  * Don't copy sff files; use soft links to the data files.
 +  * Useful output can be found in /assembly/454NewblerMetrics.txt . It lists the inputs, reads, bases (to calculate coverage = bases / genome size), readAlignmentResults, inferredReadError (0.8% = OK), estimatedGenomeSize, consensusResults (largeContigMetrics, allContigs, ...)
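The coverage formula mentioned for 454NewblerMetrics.txt (coverage = bases / genome size) can be written out as a tiny helper. This is a minimal sketch; the function name `coverage` is hypothetical, and the numbers in the usage note are made-up round values, not results from the POG assembly.

```python
def coverage(total_bases, genome_size):
    """Return average coverage: sequenced bases divided by genome size in bp."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    # float() keeps the division exact under both Python 2 and Python 3.
    return total_bases / float(genome_size)
```

For instance, `coverage(100000000, 2500000)` gives 40.0, meaning 100 Mbp of reads over a 2.5 Mbp genome is roughly 40x coverage.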
 +    ​
 +====== Things to remember while running assembly tools ======
  
-====== Things to remember while running assembly tools======
-  
-  * All the assemblies should be listed under /campusdata/BME235/assemblies.
-  * Include .cshrc file in your path.
-  * Its better to run the tool in the current working directory.
-  * Create a README file in each new directory and it should contain all the necessary stuff required to run the assembly tool.
-  * Create Makefile for each assembly tool. (Makefile for newbler_assembly tool is in /campusdata/BME235/assemblies/Pog//newbler-assembly1/). You can use it as a template and modify the data source and the expected coverage as required. Makefile should be considered as a book for lab protocols. 
-  * It always better to say append to make.log in Makefile.
+  * All the assemblies should be listed under /campusdata/BME235/assemblies.
+  * Include .cshrc file in your path.
+  * It's better to run the tool in the current working directory.
+  * Create a README file in each new directory; it should contain everything needed to run the assembly tool.
+  * Create a Makefile for each assembly tool. (The Makefile for the newbler_assembly tool is in /campusdata/BME235/assemblies/Pog/newbler-assembly1/). You can use it as a template and modify the data source and the expected coverage as required. The Makefile should be considered "a book for lab protocols".
+  * It's always better to append to make.log (rather than overwrite it) in the Makefile.
   * Wiki page for assembly tools should contain a summary of how to run the tool and other things that might be useful to look at.   * Wiki page for assembly tools should contain a summary of how to run the tool and other things that might be useful to look at.
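
Two of the housekeeping points in these notes (soft links instead of copies of the sff files, and appending to make.log rather than overwriting it) can be illustrated in Python. This is a hedged sketch, not the course Makefile: the function names `link_sff_files` and `append_to_log` are hypothetical, and the timestamped log format is an assumption.

```python
import os
import time


def link_sff_files(sff_paths, dest_dir):
    """Create soft links in dest_dir for each sff file; return the link paths."""
    links = []
    for src in sff_paths:
        link = os.path.join(dest_dir, os.path.basename(src))
        if not os.path.lexists(link):  # leave any existing link or file alone
            os.symlink(os.path.abspath(src), link)
        links.append(link)
    return links


def append_to_log(log_path, message):
    """Append one timestamped line to the log; earlier entries are preserved."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:  # "a" appends instead of truncating
        log.write("%s %s\n" % (stamp, message))
```

In a Makefile recipe the same append behaviour comes from redirecting with `>>` instead of `>`.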
  
lecture_notes/04-16-2010.txt · Last modified: 2010/04/20 03:09 by karplus