This shows you the differences between two versions of the page.

| | lecture_notes:05-20-2015 [2015/05/21 19:45] nsaremi created | | lecture_notes:05-20-2015 [2015/05/21 20:47] (current) nsaremi |
|---|---|---|---|
| | Line 10: | | Line 10: |
| - | running dissovar: | + | ====Running Discovar==== |
| - | frac option to limit input of files to only porition of reads | + | Used the fraction option to limit the input files to only a portion of the reads |
| - | key to specify threads and max memory for the run | + | Needed to specify the number of threads and the maximum memory for the run as well |
| - | 50% UCSF run showed much better results in N50 for contig and scafoold than 50% run and used less memory | + | The 50% UCSF run gave much better contig and scaffold N50 than the 50% original-data run and used less memory |
| - | discovar performed much better with 2x250 reads vs 2x100 reads | + | Discovar performed much better with 2x250 reads than with 2x100 reads: more scaffolds of longer length |
| - | more scaffolds of longer length | + | |
| - | want to use more data when there is more RAM available | + | Want to use the full data set when more RAM is available |
| | | + | |
| | | + | ====BLAST results==== |
| | 8th longest scaffold when nucleotide BLASTed matched a transcript variant of sea hare | | 8th longest scaffold when nucleotide BLASTed matched a transcript variant of sea hare |
| - | metallothionein hit may be result of having cysteine rich scafoold | + | Metallothionein hit may be the result of the scaffold encoding a cysteine-rich sequence |
| - | most common gene hit was robsomoal subunit 28S, good sign bc consistent across species | + | Most common gene hit was 28S ribosomal RNA, which is a good sign because this gene is highly conserved across species |
| | | + | Want to run PRICE on the viral sequences that were found with BLAST |
| - | look at runnign PRICE to find viral sequences that were found with blast | + | Would create an assembly of the viral sequence that was found and determine whether it is integrated in the genome or extranuclear |
| - | would create an assembly for the viral sequnce that was found | + | |
| - | determine if sequence was integrated in the genome or are extranuclear | + | |
| - | can map contigs to scaffolds to see if any contig has a different coverage than normal coverage | + | Can map contigs to scaffolds to see whether any contig has coverage that differs from the typical coverage |
| | | + | |
| | | + | ====SSPACE==== |
| | SSpace to do scaffolding after getting contigs | | SSpace to do scaffolding after getting contigs |
| - | their scaffolds and contigs had been coming out identical sequences | | |
| - | 50% UCSF contigs as input | | |
| - | using SW041 and SW042 files | | |
| - | run with old BWA 0.5, will re-run with bwa 0.7 version | | |
| - | merged a few scaffolds, but only added more Ns | | |
| - | no schange in scaffold N50 | + | Scaffolds and contigs had been coming out as identical sequences |
| - | only affected shorter contigs | + | |
| - | number of scaffolds decreased by 20-50 | + | Used the 50% UCSF contigs as input, with the SW041 and SW042 files |
| | | + | |
| | | + | Ran with the old BWA 0.5; will re-run with BWA 0.7 |
| | | + | |
| | | + | SSPACE merged a few scaffolds, but only added more Ns |
| | | + | |
| | | + | No change in scaffold N50 |
| | | + | Only affected shorter contigs |
| | | + | Number of scaffolds decreased by 20-50 |
| | | + | |
| | | + | Probably due to insufficient coverage of the assembly |
| - | probably due to not enough coverage of the assembly | | |
| | | + | ====Mitochondrion assembly==== |
| - | mitochondion assembly | + | Looked for a contig that might be mitochondrial (from the previous class iteration) |
| | | + | Took reads that mapped to the 2012 consensus sequence |
| | | + | HiSeq sw018 and sw019 reads so far |
| | | + | Mito size estimate: 14 kb |
| | | + | Used Discovar on the sw018 data that mapped to the 2012 sequence -> 60X coverage |
| - | looked for contig that might have been mitochondrial (previous class iteration) | | |
| - | took reads that mapped to the 2012 consensu ssequence | | |
| - | Hiseq w018 and sw019 reads so far | | |
| - | mito size 14kb estiamte | | |
| - | used discovar sw018 data that mapped to 2012 seq-> coverage 60X | | |
| - | price sw018 reads that mapped to 2012 mito seq-> | | |
| - | wants contig built from read data rather than scaffold | + | Want to use contigs built from read data rather than scaffolds |
| - | statrt with one contig that maps well to mito (use 12kb discovar 18+19 output) | + | Start with one contig that maps well to the mito genome (use the 12 kb Discovar 18+19 output) |
| | mito genome does integrate into nuclear genome, over time mutates and changes sequence, results in lots of ambiguity in contig construction | | mito genome does integrate into nuclear genome, over time mutates and changes sequence, results in lots of ambiguity in contig construction |
| | Line 68: | | Line 73: |
| | look at ends of contigs and compare, try to join Ns together | | look at ends of contigs and compare, try to join Ns together |
| - | sea hare is 14kb, usually doesnt include hvr that is very difficult to assemble | + | Sea hare mito genome is 14 kb; it usually doesn't include the hypervariable region, which is very difficult to assemble |
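
The Discovar runs above are compared by contig and scaffold N50. As a reminder of how that statistic is computed, here is a minimal Python sketch of the standard definition: sort lengths from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The function name `n50` and the example lengths are hypothetical, for illustration only; they are not results from these runs.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid a fractional half.
        if 2 * running >= total:
            return length
    # Not reached for non-empty input: the final running total equals total.
    return 0

# Hypothetical lengths for illustration only.
example = [12000, 8000, 5000, 3000, 2000]
print(n50(example))  # 8000
```

Here the two longest sequences (20,000 bp) are the first to cover half of the 30,000 bp total, so the N50 is the length of the second one.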
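
The 60X figure for the mitochondrial reads relates read count to depth through the usual estimate C = N × L / G (number of reads times read length, divided by genome size). A small sketch, assuming the 14 kb genome estimate from the notes and a hypothetical 250 bp read length (the notes do not state the read length of the sw018 data); the function names are illustrative.

```python
import math

def mean_coverage(num_reads, read_length_bp, genome_size_bp):
    """Expected mean depth C = N * L / G (Lander-Waterman estimate)."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return num_reads * read_length_bp / genome_size_bp

def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Invert C = N * L / G: number of reads N needed for a target depth."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)

# Assumed values: 14 kb mito genome (the estimate in the notes) and a
# hypothetical 250 bp read length; each mate of a pair counts as one read.
MITO_SIZE_BP = 14_000
READ_LEN_BP = 250

n = reads_needed(60, READ_LEN_BP, MITO_SIZE_BP)
print(n)                                            # 3360
print(mean_coverage(n, READ_LEN_BP, MITO_SIZE_BP))  # 60.0
```

Running it the other way round, counting the reads that mapped to the 2012 sequence and plugging them into `mean_coverage`, gives a quick sanity check on a reported depth.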
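
One way to read the note about finding contigs whose coverage differs from the typical coverage: a contig at much higher depth than the rest may be multicopy or extranuclear (for example viral or mitochondrial), while one at nuclear-like depth is more consistent with integration. The sketch below takes per-contig mean depths (however they were obtained, e.g. from a read mapping) and flags contigs far from the median. The function name, the contig IDs, the depths, and the 2-fold threshold are all hypothetical choices, not values from the class data.

```python
from statistics import median

def flag_unusual_coverage(coverage_by_contig, fold=2.0):
    """Return {contig: depth / median} for contigs far from the median depth.

    A contig is flagged when its mean depth is at least `fold` times the
    median, or at most the median divided by `fold`.
    """
    if fold <= 1:
        raise ValueError("fold must be greater than 1")
    typical = median(coverage_by_contig.values())
    if typical == 0:
        raise ValueError("median coverage is zero")
    flagged = {}
    for contig, depth in coverage_by_contig.items():
        if depth >= fold * typical or depth <= typical / fold:
            flagged[contig] = depth / typical
    return flagged

# Hypothetical per-contig mean depths.
depths = {"contig_1": 30.0, "contig_2": 32.0, "contig_3": 29.0,
          "contig_4": 95.0, "contig_5": 10.0}
print(sorted(flag_unusual_coverage(depths)))  # ['contig_4', 'contig_5']
```

The median is used instead of the mean so that a few very high-depth contigs do not shift the baseline they are compared against.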
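
For the step of looking at contig ends and comparing them, a minimal exact-match sketch: find the longest suffix of one contig that equals a prefix of the other and merge them on that overlap. This assumes both contigs are already in the same orientation and that the overlap contains no mismatches or runs of N; the 20 bp minimum overlap and the function names are hypothetical. Real assemblers also handle reverse complements and sequencing errors, which this sketch does not.

```python
def longest_overlap(left, right, min_overlap=20):
    """Length of the longest suffix of `left` that exactly equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    # Try the longest candidate first so the first hit is the longest overlap.
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def join_contigs(left, right, min_overlap=20):
    """Merge two contigs on their end overlap; return None if they do not overlap."""
    k = longest_overlap(left, right, min_overlap)
    if k == 0:
        return None
    # Keep `left` whole and append only the non-overlapping tail of `right`.
    return left + right[k:]

print(join_contigs("ACGTACGTTTGACCA", "TTGACCAGGGT", min_overlap=5))
# ACGTACGTTTGACCAGGGT
```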