=====Discovar Team 5 Update=====

  * Assembler takes only Illumina libraries: ideally PCR-free, high coverage, and insert size ~450 bp
  * Had to use FastUniq to remove duplicates: 16X coverage being input, somewhat low for what DISCOVAR wants as input
  * Running DISCOVAR:
    * frac option to limit the input to only a portion of the reads
    * key to specify threads and max memory for the run
  * 50% UCSF run showed much better contig and scaffold N50 than the 50% run, and used less memory
  * DISCOVAR performed much better with 2x250 reads vs 2x100 reads
    * more scaffolds of longer length
  * Want to use more data when there is more RAM available
  * 8th longest scaffold, when nucleotide BLASTed, matched a transcript variant of sea hare metallothionein
    * hit may be the result of having a cysteine-rich scaffold
  * Most common gene hit was ribosomal subunit 28S; good sign because it is consistent across species
  * Look at running PRICE to find the viral sequences that were found with BLAST
    * would create an assembly for the viral sequence that was found
    * determine whether the sequence is integrated in the genome or is extranuclear
    * can map contigs to scaffolds to see if any contig has coverage different from the normal coverage
  * SSPACE to do scaffolding after getting contigs
    * their scaffolds and contigs had been coming out as identical sequences
    * 50% UCSF contigs as input, using SW041 and SW042 files
    * run with old BWA 0.5; will re-run with BWA 0.7
    * merged a few scaffolds, but only added more Ns
      * no change in scaffold N50
      * only affected shorter contigs
      * number of scaffolds decreased by 20-50
      * probably due to not enough coverage of the assembly
  * Mitochondrion assembly
    * looked for a contig that might have been mitochondrial (previous class iteration)
    * took reads that mapped to the 2012 consensus sequence
      * HiSeq SW018 and SW019 reads so far
    * mito size estimate: 14 kb
    * Used DISCOVAR
      * SW018 data that mapped to the 2012 seq -> coverage 60X
    * PRICE
      * SW018 reads that mapped to the 2012 mito seq -> wants a contig built from read data rather than a scaffold
      * start with one contig that maps well to mito (use the 12 kb DISCOVAR 18+19 output)
    * Mito genome does integrate into the nuclear genome, then mutates and changes sequence over time; this results in a lot of ambiguity in contig construction
    * Seems like the 12 kb contig is the entire mitochondrial genome
    * 2nd largest contig (3245 bp) looks like it might be missing part of the mito
    * Look at the ends of the contigs and compare; try to join Ns together
    * Sea hare mito is 14 kb; this usually doesn't include the hypervariable region (HVR), which is very difficult to assemble
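The runs above are compared by contig and scaffold N50, and input depth is quoted as fold coverage (16X input, 60X for the mito reads). As a rough sketch of how those two numbers are usually computed, assuming the common definitions (N50 is the length of the sequence at which the running total of lengths, sorted longest first, reaches half of the assembly; fold coverage is total read bases divided by target size), something like the following could be used. The function names and the example numbers are hypothetical and are not taken from the team's runs.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumes the usual definition: sort lengths longest first and return
    the length at which the running total reaches half of the sum.
    Returns 0 for an empty collection.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


def fold_coverage(read_count, read_length, target_size):
    """Estimate fold coverage as total read bases / target size."""
    return read_count * read_length / target_size


if __name__ == "__main__":
    # Hypothetical scaffold lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6]))
    # Hypothetical read count: 3360 reads of 250 bp over a 14 kb target.
    print(fold_coverage(3360, 250, 14000))
```

Comparing N50 before and after a scaffolding step is only meaningful when both values are computed over the same set of sequences (for example, with the same minimum length cutoff).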

lecture_notes/05-20-2015.1432237535.txt.gz · Last modified: 2015/05/21 19:45 by nsaremi