=====Discovar Team 5 Update=====

  * Assembler takes only Illumina libraries: ideally PCR-free, high coverage, and insert size ~450bp
  * Had to use FastUniq to remove duplicates: 16X coverage is being input, which is somewhat low for what Discovar wants

====Running Discovar====

  * Used the fraction option to limit the input to only a portion of the reads
  * Needed to specify threads and maximum memory for the run as well
  * The 50% UCSF run gave much better contig and scaffold N50 than the 50% original-data run, and used less memory
  * Discovar performed much better with 2x250 reads than with 2x100 reads: more scaffolds of longer length
  * Want to use the full data set when more RAM is available

====BLAST results====

  * The 8th longest scaffold, when nucleotide BLASTed, matched a transcript variant of sea hare metallothionein
    * The hit may be the result of having a cysteine-rich scaffold
  * The most common gene hit was the 28S ribosomal subunit, which is a good sign because this gene is conserved across species
  * Want to run PRICE on the viral sequences that were found with BLAST
    * This would create an assembly for the viral sequence that was found and determine whether it is integrated in the genome or is extranuclear
  * Can map contigs to scaffolds to see if any contig has a coverage different from the normal coverage

====SSPACE====

  * Using SSPACE to do scaffolding after getting contigs
    * Scaffolds and contigs had been coming out as identical sequences
  * Used the 50% UCSF contigs as input, with the SW041 and SW042 files
  * Run with old BWA 0.5; will re-run with BWA 0.7
  * SSPACE merged a few scaffolds, but only added more Ns
    * No change in scaffold N50
    * Only affected shorter contigs
    * Number of scaffolds decreased by 20-50
    * Probably due to not enough coverage of the assembly

====mitochondrion assembly====

  * Looked for a contig that might have been mitochondrial (previous class iteration)
  * Took reads that mapped to the 2012 consensus sequence
    * HiSeq SW018 and SW019 reads so far
  * Mito size estimate: 14kb
  * Used Discovar on the SW018 data that mapped to the 2012 sequence: coverage 60X
  * Want to use contigs built from read data rather than scaffolds
    * Start with one contig that maps well to the mito (use the 12kb Discovar 18+19 output)
  * The mito genome does integrate into the nuclear genome, and over time the integrated copy mutates and changes sequence, which results in a lot of ambiguity in contig construction
  * Seems like the 12kb contig is the entire mitochondrial genome
  * The 2nd largest contig (3245bp) looks like it might be missing part of the mito
  * Look at the ends of the contigs and compare; try to join Ns together
  * Sea hare is 14kb; this usually doesn't include the hypervariable region, which is very difficult to assemble
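
The "look at the ends of the contigs and compare" step could be sketched roughly as below. This is only a minimal illustration, not what the team ran: the function names (''reverse_complement'', ''longest_end_overlap'', ''join_contigs'') and the ''min_len'' threshold are hypothetical, and it only finds exact suffix/prefix matches. Real contig ends usually contain mismatches or Ns, so an alignment tool would be the more realistic choice.

```python
# Hypothetical helpers for comparing contig ends; exact matching only.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def longest_end_overlap(left, right, min_len=20):
    """Length of the longest suffix of `left` that exactly equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    left, right = left.upper(), right.upper()
    max_len = min(len(left), len(right))
    # Try the longest candidate first so the first match is the longest one.
    for k in range(max_len, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def join_contigs(left, right, min_len=20):
    """Merge two contigs on their end overlap, or return None if they do not overlap.

    Both orientations of `right` are tried, since a contig may come out
    of the assembler on either strand.
    """
    for candidate in (right, reverse_complement(right)):
        k = longest_end_overlap(left, candidate, min_len)
        if k:
            # Keep all of `left`, then the non-overlapping remainder of `candidate`.
            return left.upper() + candidate.upper()[k:]
    return None
```

Calling ''join_contigs(contig_a, contig_b)'' and then ''join_contigs(contig_b, contig_a)'' covers both possible orders of the two contigs. A result of ''None'' in both cases means the ends share no exact overlap of at least ''min_len'' bases.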