lecture_notes:04-16-2010

Differences

This shows you the differences between two versions of the page.

lecture_notes:04-16-2010 [2010/04/17 00:30]
svasili
lecture_notes:04-16-2010 [2010/04/19 18:41]
jlong Mapping code, .rdb code for Makefile, 454NewblerMetrics
Line 13: Line 13:
   * Currently, tools are installed under /campusdata/BME235/bin/old_Newbler/.
   * Tools with prefix "gs" are not supposed to be run directly.
-  * Kevin has written several scripts in Python (version 2.6) which aid in building and analyzing genomes. Currently these scripts do not work as on Campusrocks, as the version of Python installed is 2.4 and it is under the process of being updated to version 2.6.
+  * Kevin has written several Python (version 2.6) scripts that aid in building and analyzing genomes. Currently these scripts do not work on Campusrocks, because the installed version of Python is 2.4; it is in the process of being updated to 2.6.
   * Newbler assembly tools take .sff files (Standard Flowgram Format: flowgram and quality data) as input and convert them into .fna files (FASTA files with nucleotide information).
   * Newbler works well only with 454 data and performs poorly on reads shorter than 50 bases.
Line 27: Line 27:
   * Output: generated in a separate directory called "assembly". The main outputs are .fna files and .qual files; for an example, look at "/campusdata/BME235/assemblies/Pog/newbler-assembly1/assembly".
   * make.log keeps track of what happened.
 +  * Mapping to an existing genome: an example from Kevin Karplus (/pluck/Vc/map23_scaffold):
 +    <code>
 +newMapping .
 +addRun . /projects/lowelab/users/course/karplus/Vc/sequencing/sff/*.sff
 +setRef . Vc.scaffold
 +runProject -e 25 -rst 0 -noace .
 +</code>
 +  * Here -e 25 is specific to the Vibrio sequence coverage, and -noace prevents building an ACE file (a large file used with Consed).
 +  * Slug is AT-rich, so Illumina data may be better than 454. 
 +  * rdb files were described as useful, simple-to-create relational databases. An example of generating an rdb file from a Makefile is given below, as implemented by Kevin in /pluck/rachel/combined_cleaning1/Makefile.
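 +<code>
 +# Build an rdb table (name, length, numreads) for the contigs listed in $*.ids.
 +# The first two echo lines write the rdb header: column names, then column types.
 +%.stats: %.ids
 + echo "name length numreads" > $@
 + echo "S N N" >> $@
 + grep '^>' < contigs_all.fa \
 + | grep -f $*.ids \
 + | sed 's/=/ /g' \
 + | sed 's/>//' \
 + | awk '{printf "%s\t%d\t%d\n", $$1, $$3, $$5}' \
 + >> $@
 +</code>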
 +  * If anyone finds good user-oriented documentation or tutorials (as opposed to feature-based documentation), please share them with the group.
 +  * Don't copy .sff files; use soft links to the data files instead.
 +  * Useful output can be found in /assembly/454NewblerMetrics.txt. It reports the inputs, reads, bases (to calculate coverage = bases / genome size), readAlignmentResults, inferredReadError (0.8% = OK), estimatedGenomeSize, and consensusResults (largeContigMetrics, allContigs, ...).
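The coverage estimate mentioned in the notes (bases divided by genome size) can be sketched in Python 3. This is only an illustration and was not part of the lecture: the function name `estimate_coverage` and the numbers in the usage line are hypothetical, not values read from any 454NewblerMetrics.txt file.

```python
def estimate_coverage(total_bases, genome_size):
    """Return average depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Hypothetical numbers, for illustration only.
example_coverage = estimate_coverage(100_000_000, 4_000_000)
print(f"coverage = {example_coverage:.1f}x")
```

In practice the base count would come from the metrics file and the genome size from estimatedGenomeSize or an independent estimate.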
 +    ​
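For readers more comfortable with Python than with grep/sed/awk, the Makefile rule for rdb generation shown earlier could be approximated as follows. This is a sketch under stated assumptions, not Kevin's implementation: it assumes Newbler-style headers of the form `>contig00001  length=1234   numreads=56`, the function name `contig_stats_rdb` is hypothetical, and it selects contigs by exact name, whereas `grep -f` matches patterns anywhere in the header line.

```python
def contig_stats_rdb(fasta_lines, wanted_ids):
    """Build tab-separated rdb text from FASTA header lines.

    Assumes headers look like '>contig00001  length=1234   numreads=56'.
    Only contigs whose name is in wanted_ids are reported.
    """
    # rdb header: column names, then column types.
    rows = ["name\tlength\tnumreads", "S\tN\tN"]
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        # Mirror the sed steps: drop '>' and turn '=' into whitespace.
        fields = line[1:].replace("=", " ").split()
        name = fields[0]
        if name not in wanted_ids:
            continue
        # Fields 3 and 5 (1-based) hold the length and read count.
        rows.append(f"{name}\t{int(fields[2])}\t{int(fields[4])}")
    return "\n".join(rows) + "\n"
```

A caller would pass the lines of contigs_all.fa and a set of contig names read from the .ids file, then write the returned string to the .stats file.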
 ====== Things to remember while running assembly tools ======
   ​   ​
lecture_notes/04-16-2010.txt · Last modified: 2010/04/20 03:09 by karplus