====== Lecture Notes for May 28, 2010 ======

===== Topics =====
- High-level polishing of genome with Newbler \\
- Description of scripts in ~karplus/pluck/ dir

===== Notes =====

=== ~karplus/pluck/Hp/ ===
Map separate dirs \\
New mapping instead of new assembly \\
-rst (repeat score threshold): when a read could map to multiple places, this decides whether the mapping counts as unique or as a repeat you cannot handle \\
-rst 0 was used, which is not the default \\
Played around with different ways of mapping \\
Used in makefile for postmapping: \\
  * megablast, blastn: never look at it now
  * blat, blat-strict-match: do use and look at

Mapping subdirectory with a lot of stuff in it: \\
Look at: ./454NewblerMetrics \\
  * Tells how many reads and bases there were and how many reads managed to map
  * In this specific example, not many managed to map; same sort of statistics as for the //de novo// assembly
  * Look at the bottom line first for allContigMetrics to see how the mapping performed
  * Sometimes works well, sometimes doesn't work well at all; depends on whether the organism has a high mutation rate

=== ~karplus/pluck/Hp/map_separate_is607/ ===
More recent mapping \\
Plasmid cleaned up but not the transposon \\
Transposon typical length 2.5k \\
About 2,000 of the reads mapped to the transposon \\
4% read error means there is a 4% difference between the NCBI version of the transposon and what is being mapped to it, not necessarily a 4% read error in the sequencing \\
consensusAccuracy is 93%; it is usually higher, around 98%-99%, so it may be that the transposon is highly variable \\
./454ReadStatus.txt \\
Tells you how each read was handled \\
Status of reads: Unmapped, Full, Partial, Repeat, Chimeric (across different sequences) \\
Some mapped partially to the transposon (e.g.
at end of transposon and on chromosome) \\
  * Could have been partially mapped in the middle, which would have been an incorrect mapping

./454AllContigs.fna \\
  * Actual completed mapping

./454AllContigs.qual \\
  * Check to see the quality of the mapping

=== ~karplus/pluck/Hp/map_separate_is607_2/ ===
Second attempt at mapping \\
Similar error rate to the first attempt \\
Looks like things are mapping properly \\
Low error rate, as 454 provides cleaner data for Newbler to map \\
./is607.sff \\
  * Mapped to the assembly, at least 30% match

./nois607.sff \\
  * Reads that did not map

=== ~karplus/pluck/Hp/chrom_assembly24/ ===
Started over, removing the plasmids and transposon \\
Mapping contigs of v24 to v19 assembly \\
Worried that the previous assembly may have bugs in it (e.g. may miss a path because a previous path from the old assembly is still there) \\
Did it without looking at previous work that Jenny had done, to be confident of having independently arrived at the same conclusion \\
Went from 54 to 51 contigs \\
Did a more thorough job of cleaning out the plasmid in this assembly \\
One thing to notice is that there is a large variation in reads per base \\
  * Some are really short, so a lot overlaps there
  * Should be reads per base + an amount to compensate

If there are big spikes, suspect that they may be parasites \\
  * e.g. transposases, integrases, etc.
  * In this case, just repeat mappings

=== ~karplus/pluck/Hp/chrom_assembly24/ ===
trim0 = have to match all bases in reads \\
trimX = have to match all but X bases \\
trim9 = good compromise this time between too many matches in mapping vs. too few \\
./trim9.out \\
  * Trim to 15-mers from 24-mers
  * ./Makefile
  * --min_SNP = report SNP if it is found more than X times
  * --min_length = min length for
  * --peak_length = peak length for
  * --merge_cross = put in trim9_cross.rdb if
  * --supress_reads = suppress reads that have

Look at your data; don't just accept the program output.
See if it matches your expectations and investigate anything that looks suspicious. \\

=== ~karplus/pluck/Hp/chrom_assembly26/ ===
./v26.sticher \\
  * Semi-automatically generated file that joins the contigs together
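
One way to act on the advice to look at the data is to check the reads-per-base variation noted under chrom_assembly24. The sketch below is only an illustration, not one of the scripts in ~karplus/pluck/. The function names, the cutoff of 3x the median, and the example numbers are all hypothetical, and it does not model the extra amount needed to compensate for short contigs.

```python
from statistics import median


def reads_per_base(contigs):
    """Return {contig name: number of reads / contig length}."""
    return {name: n_reads / length for name, (length, n_reads) in contigs.items()}


def flag_spikes(contigs, factor=3.0):
    """Return sorted names of contigs whose reads per base exceed
    factor times the median reads per base over all contigs.

    The factor of 3.0 is an arbitrary assumption, not a value from the lecture.
    """
    depth = reads_per_base(contigs)
    typical = median(depth.values())
    return sorted(name for name, d in depth.items() if d > factor * typical)


# Hypothetical (length in bases, number of reads) per contig; not real data.
example = {
    "contig01": (120000, 6000),
    "contig02": (80000, 4400),
    "contig03": (2500, 1000),   # short contig with many reads
    "contig04": (50000, 2400),
}

print(flag_spikes(example))
```

Contigs flagged this way are only candidates for a closer look (e.g. transposases, integrases, or plain repeat mappings); the cutoff should be tuned by inspecting the actual distribution.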