This shows you the differences between two versions of the page.
lecture_notes:05-20-2015 [2015/05/21 19:45] nsaremi created |
lecture_notes:05-20-2015 [2015/05/21 20:47] (current) nsaremi |
||
---|---|---|---|
Line 10: | Line 10: | ||
- | running dissovar: | + | ====Running Discovar==== |
- | frac option to limit input of files to only porition of reads | + | Used fraction option to limit input of files to only a portion of reads |
- | key to specify threads and max memory for the run | + | Needed to specify threads and maximum memory for the run as well |
- | 50% UCSF run showed much better results in N50 for contig and scafoold than 50% run and used less memory | + | 50% UCSF run showed much better results in N50 for contig and scaffold than 50% original data run and used less memory |
- | discovar performed much better with 2x250 reads vs 2x100 reads | + | Discovar performed much better with 2x250 reads vs 2x100 reads; more scaffolds of longer length |
- | more scaffolds of longer length | + | |
- | want to use more data when there is more RAM available | + | Want to use full data set when there is more RAM available |
+ | |||
+ | ====BLAST results==== | ||
8th longest scaffold when nucleotide BLASTed matched a transcript variant of sea hare | 8th longest scaffold when nucleotide BLASTed matched a transcript variant of sea hare | ||
- | metallothionein hit may be result of having cysteine rich scafoold | + | metallothionein hit may be result of having cysteine rich scaffold |
- | most common gene hit was robsomoal subunit 28S, good sign bc consistent across species | + | most common gene hit was ribosomal subunit 28S, which is a good sign because this gene is consistent across species |
+ | Want to run PRICE to find viral sequences that were found with blast | ||
- | look at runnign PRICE to find viral sequences that were found with blast | + | would create an assembly for the viral sequence that was found and determine if the sequence is integrated in the genome or is extranuclear |
- | would create an assembly for the viral sequnce that was found | + | |
- | determine if sequence was integrated in the genome or are extranuclear | + | |
- | can map contigs to scaffolds to see if any contig has a different coverage than normal coverage | + | Can map contigs to scaffolds to see if any contig has a different coverage than normal coverage |
+ | |||
+ | ====SSpace==== | ||
SSpace to do scaffolding after getting contigs | SSpace to do scaffolding after getting contigs | ||
- | their scaffolds and contigs had been coming out identical sequences | ||
- | 50% UCSF contigs as input | ||
- | using SW041 and SW042 files | ||
- | run with old BWA 0.5, will re-run with bwa 0.7 version | ||
- | merged a few scaffolds, but only added more Ns | ||
- | no schange in scaffold N50 | + | Scaffolds and contigs had been coming out as identical sequences |
- | only affected shorter contigs | + | |
- | number of scaffolds decreased by 20-50 | + | used 50% UCSF contigs as input, using SW041 and SW042 files |
+ | |||
+ | run with old BWA 0.5, will re-run with bwa 0.7 version | ||
+ | |||
+ | SSpace merged a few scaffolds, but only added more Ns | ||
+ | |||
+ | no change in scaffold N50 | ||
+ | only affected shorter contigs | ||
+ | number of scaffolds decreased by 20-50 | ||
+ | |||
+ | probably due to not enough coverage of the assembly | ||
- | probably due to not enough coverage of the assembly | ||
+ | ====mitochondrion assembly==== | ||
- | mitochondion assembly | + | Looked for contig that might have been mitochondrial (previous class iteration) |
+ | Took reads that mapped to the 2012 consensus sequence | ||
+ | Hiseq w018 and sw019 reads so far | ||
+ | mito size 14kb estimate | ||
+ | used discovar sw018 data that mapped to 2012 seq-> coverage 60X | ||
- | looked for contig that might have been mitochondrial (previous class iteration) | ||
- | took reads that mapped to the 2012 consensu ssequence | ||
- | Hiseq w018 and sw019 reads so far | ||
- | mito size 14kb estiamte | ||
- | used discovar sw018 data that mapped to 2012 seq-> coverage 60X | ||
- | price sw018 reads that mapped to 2012 mito seq-> | ||
- | wants contig built from read data rather than scaffold | + | Want to use contigs built from read data rather than scaffold |
- | statrt with one contig that maps well to mito (use 12kb discovar 18+19 output) | + | start with one contig that maps well to mito (use 12kb discovar 18+19 output) |
mito genome does integrate into nuclear genome, over time mutates and changes sequence, results in lots of ambiguity in contig construction | mito genome does integrate into nuclear genome, over time mutates and changes sequence, results in lots of ambiguity in contig construction | ||
Line 68: | Line 73: | ||
look at ends of contigs and compare, try to join Ns together | look at ends of contigs and compare, try to join Ns together | ||
- | sea hare is 14kb, usually doesnt include hvr that is very difficult to assemble | + | sea hare is 14kb, usually doesn't include hypervariable region that is very difficult to assemble |
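
The notes compare runs by contig and scaffold N50 and quote a 60X coverage figure for the mitochondrial reads. As a minimal sketch of how those two numbers are usually computed (not the scripts used in class), the Python below defines two hypothetical helpers, `n50` and `mean_coverage`. The lengths, read count, and 250 bp read length in the example are made up for illustration; only the 14kb mito size estimate comes from the notes.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_coverage(num_reads, read_length, genome_size):
    """Estimate mean depth as total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


if __name__ == "__main__":
    # Hypothetical scaffold lengths, not taken from any real run.
    example_lengths = [100, 200, 300, 400, 500]
    print("N50:", n50(example_lengths))

    # Hypothetical read count and read length; 14 kb is the mito estimate.
    print("coverage:", mean_coverage(3360, 250, 14000))
```

Because N50 depends only on the length distribution, merging a few short scaffolds (as SSpace did here) can lower the scaffold count without moving N50, which matches what the notes report.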