lecture_notes:04-21-2010 (created 2010/04/22 05:12 by hyjkim; current revision 2015/09/03 01:17 by 104.144.27.91, links adapted because of a move operation)
(…)
  * Expect 40-50 different resources (papers, books, PhD theses).
  * The bibliography must be annotated, though annotations are not required this Friday.
  * As you run assembly tools, keep the [[archive:computer_resources:assemblies|assemblies/]] wiki page updated!
==== Continuation of Newbler Assembler methods ====
(…)
  * Newbler-clean
    * Not attempting to assemble data.
    * Used to remove contamination from H. pylori, which was sequenced on the same machine in the same run.
    * If you have a reference genome for your contaminants, you can try to clean them out computationally by mapping reads to it and removing the matches.
    * Makefile overview
      * Set up a new mapping.
      * Maps reads from the 454 Pog data to the contaminant genome.
      * Output of interest: ReadStatus.txt
      * Contains the mapping status of each read (mapped, unmapped, partially mapped, too short).
      * Reads mapped to contaminants should be filtered out of the 454 data and should not be used for assembly.
      * Create a new SFF file (using the sfffile utility) containing the unmapped data, named "no_Hyp.sff".
      * It may be beneficial to hang onto "too short" reads for use in assemblers that work with shorter reads.
    * You can also use Megablast and NCBI's Taxonomy Report to identify contaminants.
  * Newbler-assembly2
    * Uses the clean data (no H. pylori) from newbler-clean1.
    * One fewer small contig; otherwise mostly the same.
  * Newbler-assembly3
    * Changed the expected coverage to 60x; this number is closer to the real coverage, based on better estimates of the genome size.
    * Still reported 41 contigs (or possibly one fewer?).
    * Good news: all contigs map to the reference genome.
  * Map-colorspace3 (Map-colorspace directories begin indexing at 3 rather than 1)
    * The scripts run in this directory were originally intended for finding inversions from mate-pair reads.
    * New features have been added since then:
      * Now looks for reads between contigs.
      * Tries to orient contigs.
  * Newbler-partial3
    * Attempts to use only partially mapped and unmapped reads.
    * The plan is to later map contigs from this assembly onto the earlier full contigs (to extend their edges?).
  * Megablast, blastn, BLAT, and find dna differences are four methods for mapping partially mapped reads onto the contigs created by Newbler.
    * Megablast and blastn showed very similar results.
(…)
    * Find dna differences
      * Shows differences between contigs and a reference genome in a human-readable format.
  * Newbler-assembly4
    * Adds the Newbler-partial3 contigs to the full search as reads.
    * Didn't seem to help much.
    * The output had fewer total bases and more contigs than the original assembly, so it is possibly worse.
  * Newbler-assembly5
    * Used Sanger reads produced by David Bernick to join contigs.
    * 45 Sanger reads total.
(…)
    * 1/3 0.32 reads/base was a short contig and was within the expected deviation in reads/base for a sequencing run.
    * 0.44 reads/base occurred twice, not three times.
    * Fewer contigs and more bases.
  * Map-colorspace5
    * map-colorspace has many parameters.
    * You can list all the parameters by issuing the command "map-colorspace --help".
    * This command produces many output files (~2 per contig), which may prove problematic for assemblies that contain many contigs.
    * Set the length parameters using a histogram of reads mapped to contigs.
(…)
    * Shows that contigs are near each other but not touching.
    * Short contigs may be skipped in paired-end reads (a read pair can span a short contig entirely).
    * QUESTION: Can we make a single connected graph using this data and all possible paths? Sometimes, if the data is particularly coherent; it is not guaranteed.
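The Newbler-clean filtering step keeps the reads that did not map to the contaminant genome, and optionally keeps the "too short" ones. The sketch below shows that step in Python. It is a minimal sketch, not the class Makefile, and it rests on several assumptions:
  * ReadStatus.txt is tab-separated with a header row.
  * The read accession is in the first column and the status is in the second.
  * The status labels (`Unmapped`, `TooShort`, `Full`, `Partial`) need to be checked against your own file.
  * The function names are hypothetical.

The resulting accession list is the kind of input you would give sfffile when building no_Hyp.sff. Check sfffile's own usage message for the exact option.

```python
import csv
import io
import sys
from typing import Iterable, TextIO

# Assumed status labels; check the values that actually appear in your ReadStatus.txt.
DEFAULT_KEEP = {"Unmapped"}


def reads_to_keep(status_file: TextIO, keep: Iterable[str] = DEFAULT_KEEP) -> list[str]:
    """Return accessions of reads whose mapping status is in `keep`.

    Assumes a tab-separated file with a header row, the read accession in the
    first column and the mapping status in the second.
    """
    keep_set = set(keep)
    reader = csv.reader(status_file, delimiter="\t")
    next(reader, None)  # skip the header row
    return [row[0] for row in reader if len(row) >= 2 and row[1] in keep_set]


def write_accno_list(accessions: Iterable[str], out: TextIO) -> None:
    """Write one read accession per line."""
    for acc in accessions:
        out.write(acc + "\n")


if __name__ == "__main__":
    # Hypothetical file contents in the assumed layout.
    sample = io.StringIO(
        "Accno\tRead Status\n"
        "READ001\tFull\n"
        "READ002\tUnmapped\n"
        "READ003\tPartial\n"
        "READ004\tTooShort\n"
    )
    # Keep unmapped reads plus "too short" ones, as suggested for short-read assemblers.
    kept = reads_to_keep(sample, keep={"Unmapped", "TooShort"})
    write_accno_list(kept, sys.stdout)  # prints READ002 and READ004
```

With a real run you would open ReadStatus.txt instead of the in-memory sample and write the list to a file rather than to stdout.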
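The change of the expected coverage to 60x comes down to simple arithmetic. Mean coverage is total sequenced bases divided by genome size, C = (sum of read lengths) / G, so a better genome-size estimate changes the figure even when the reads stay the same. Below is a minimal Python sketch. The function name and the read counts are illustrative, not taken from the class data.

```python
from typing import Iterable


def expected_coverage(read_lengths: Iterable[int], genome_size: int) -> float:
    """Mean depth of coverage: total read bases divided by genome size (C = sum(L_i) / G)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


if __name__ == "__main__":
    # Illustrative numbers only: 300,000 reads of 400 bp = 120 Mb of sequence.
    reads = [400] * 300_000
    # The same reads look like lower coverage if the genome-size estimate goes up.
    print(expected_coverage(reads, 2_000_000))  # 60.0
    print(expected_coverage(reads, 2_400_000))  # 50.0
```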
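The Map-colorspace5 length parameters are set from a histogram. One way to read that histogram is to bin the separations between mapped mates and pick cutoffs that drop the extreme tails. This Python sketch is an assumption about how such cutoffs could be chosen, not a description of what map-colorspace does internally. The bin width, the percentiles and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def length_histogram(lengths: Sequence[int], bin_width: int = 100) -> dict[int, int]:
    """Count lengths per bin; each key is the lower edge of a bin."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))


def nearest_rank(sorted_vals: Sequence[int], pct: float) -> int:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def suggest_length_bounds(lengths: Sequence[int], low_pct: float = 1.0,
                          high_pct: float = 99.0) -> tuple[int, int]:
    """Suggest (min, max) cutoffs that drop the extreme tails of the distribution."""
    vals = sorted(lengths)
    if not vals:
        raise ValueError("no lengths given")
    return nearest_rank(vals, low_pct), nearest_rank(vals, high_pct)


if __name__ == "__main__":
    # Hypothetical separations (bp) between mates mapped to the same contig.
    seps = [150, 1900, 1950, 2000, 2000, 2025, 2050, 2100, 9000]
    for start, n in length_histogram(seps, bin_width=500).items():
        print(f"{start:>6} {'#' * n}")  # crude text histogram
    print(suggest_length_bounds(seps, 20, 80))  # (1900, 2100)
```

Reading the histogram by eye is still worthwhile. A percentile rule does not notice a second peak, such as a population of chimeric pairs.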
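The QUESTION in the notes asks whether the read-pair links can form a single connected graph. That can be checked mechanically:
  * Treat contigs as nodes.
  * Join two contigs when enough read pairs link them.
  * Count connected components with union-find.

This Python sketch assumes you already have, for each read pair, the two contigs its mates mapped to. `min_support` and the function names are hypothetical, and the sketch ignores orientation and distance.

```python
from collections import Counter
from typing import Iterable


def link_counts(pairs: Iterable[tuple[str, str]]) -> Counter:
    """Count read pairs whose mates land on two different contigs (unordered)."""
    counts: Counter = Counter()
    for a, b in pairs:
        if a != b:  # pairs inside a single contig add no link
            counts[tuple(sorted((a, b)))] += 1
    return counts


def components(contigs: Iterable[str], links: Counter, min_support: int = 2) -> list[set[str]]:
    """Group contigs joined by at least `min_support` links, using union-find."""
    parent = {c: c for c in contigs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in links.items():
        if n >= min_support and a in parent and b in parent:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for c in parent:
        groups.setdefault(find(c), set()).add(c)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical (contig of mate 1, contig of mate 2) assignments.
    pairs = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c3", "c2"), ("c4", "c4")]
    groups = components(["c1", "c2", "c3", "c4"], link_counts(pairs))
    print(len(groups) == 1, groups)  # False: c4 has no supported link
```

A single component only shows that the contigs are linked. It does not guarantee a unique ordering or path through them, which is consistent with the answer in the notes that success is not guaranteed.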