Lecture Notes for May 28, 2010

Topics

  1. High-level polishing of the genome with Newbler
  2. Description of scripts in the ~karplus/pluck/ directory

Notes

~karplus/pluck/Hp/

Map separate dirs
New mapping instead of new assembly
-rst (repeat score threshold): when a read could map to multiple places, this sets when the placement is considered unique and when it is treated as a repeat that cannot be handled
-rst 0 was used, which is not the default
Played around with different ways of mapping
Used in makefile for postmapping:

Mapping subdirectory with a lot of stuff in it:
Look at: ./454NewblerMetrics
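
The -rst note above can be illustrated with a small sketch. It is not Newbler's code: the rule used here (a read is unique when its best alignment score beats the runner-up by at least the threshold) is an assumption made for illustration, and the function name and score values are hypothetical.

```python
def classify_placement(scores, repeat_score_threshold):
    """Classify a read from the alignment scores of its candidate placements.

    Assumed rule (not taken from Newbler itself): the read is "unique" when
    the best score exceeds the second-best score by at least the threshold,
    otherwise it is a "repeat".
    """
    if not scores:
        return "unmapped"
    ranked = sorted(scores, reverse=True)
    if len(ranked) == 1:
        # Only one candidate placement, so nothing competes with it.
        return "unique"
    margin = ranked[0] - ranked[1]
    return "unique" if margin >= repeat_score_threshold else "repeat"
```

Under this assumed rule, a threshold of 0 places every read at its best-scoring location, while a larger threshold sends near-ties to the repeat category.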

~karplus/pluck/Hp/map_separate_is607/

More recent mapping
Plasmid cleaned up but not the transposon
Transposon typical length 2.5k
About 2,000 of the reads mapped to the transposon
The 4% read error means a 4% difference between the NCBI version of the transposon and the reads mapped to it, not necessarily a 4% error rate in the sequencing
consensusAccuracy is 93%; it is usually higher, around 98%-99%, so the transposon may be highly variable
./454ReadStatus.txt
Tells you how each read was handled
Status of reads: Unmapped, Full, Partial, Repeat, Chimeric (across different sequences)
Some reads mapped only partially to the transposon (e.g. reads spanning the end of the transposon and the chromosome)
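
A quick way to see how the reads were handled is to tally the status values. The sketch below assumes the status file is tab-separated, has one header line, and keeps the status in the second column; check the actual file before relying on that layout. The function name and the sample read names are made up for illustration.

```python
from collections import Counter


def count_read_statuses(lines, status_column=1):
    """Count how many reads fall into each status category.

    Assumes tab-separated lines with a single header line and the status
    in column `status_column` (0-based). These are assumptions about the
    file layout, not something stated in the notes.
    """
    counts = Counter()
    for index, line in enumerate(lines):
        if index == 0:
            continue  # skip the header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) > status_column:
            counts[fields[status_column]] += 1
    return counts


# Made-up example lines, only to show the expected shape of the input.
sample = [
    "Accno\tStatus\n",
    "read0001\tFull\n",
    "read0002\tPartial\n",
    "read0003\tFull\n",
    "read0004\tUnmapped\n",
]
status_counts = count_read_statuses(sample)
```

For the real file, pass an open file handle (for example one opened on ./454ReadStatus.txt) instead of the sample list.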

./454AllContigs.fna

./454AllContigs.qual

~karplus/pluck/Hp/map_separate_is607_2/

Second attempt at mapping
Similar error rate to the first attempt
Looks like things are mapping properly
Low error rate, since 454 provides cleaner data for Newbler to map
./is607.sff

./nois607.sff

~karplus/pluck/Hp/chrom_assembly24/

Started over, removing the plasmids and the transposon
Mapping contigs of v24 to the v19 assembly
Worried that the previous assembly may have bugs in it (e.g. a path kept from the old assembly may cause another path to be missed)
Did it without looking at the previous work that Jenny had done, to be confident of having independently arrived at the same conclusion
Went from 54 to 51 contigs
Did a more thorough job of cleaning out the plasmid in this assembly
One thing to notice is that there is a large variation in reads per base

If there are big spikes, suspect that they may be parasites
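
One way to look for such spikes is to flag stretches where the reads per base are far above the median. This is a minimal sketch, not one of the scripts in the directory: the function name, the factor of 3, and the example depths are all assumptions chosen for illustration.

```python
from statistics import median


def find_depth_spikes(depths, factor=3.0):
    """Return (start, end) index ranges, end exclusive, where depth is high.

    A position counts as part of a spike when its depth is greater than
    `factor` times the median depth. The factor is an arbitrary choice.
    """
    if not depths:
        return []
    cutoff = factor * median(depths)
    spikes = []
    start = None
    for position, depth in enumerate(depths):
        if depth > cutoff:
            if start is None:
                start = position  # a new spike begins here
        elif start is not None:
            spikes.append((start, position))
            start = None
    if start is not None:
        # The spike runs to the end of the sequence.
        spikes.append((start, len(depths)))
    return spikes


# Made-up per-base depths with one obvious spike at positions 4 and 5.
example_depths = [20, 22, 19, 21, 95, 110, 23, 20]
example_spikes = find_depth_spikes(example_depths)
```

The flagged ranges are only candidates to inspect by eye; the median is used because a few large spikes barely move it.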

~karplus/pluck/Hp/chrom_assembly24/

trim0 = all bases in the read have to match
trimX = all but X bases have to match
trim9 = a good compromise this time between too many matches in the mapping and too few
./trim9.out
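
The trimX rule above can be read as a simple acceptance test. The sketch below is one interpretation of the notes, not the actual script: it assumes a read is accepted when at most X of its bases fail to match, and the function name is hypothetical.

```python
def passes_trim(read_length, matched_bases, x):
    """Return True if at most `x` bases of the read fail to match.

    Interpretation assumed here: trim0 requires every base to match,
    trimX allows up to X unmatched bases.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if matched_bases < 0 or matched_bases > read_length:
        raise ValueError("matched_bases must be between 0 and read_length")
    return matched_bases >= read_length - x
```

With this reading, a 100-base read with 95 matching bases fails trim0 but passes trim9, while one with only 90 matching bases fails both.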

Look at your data, don't just accept the program output. See if it matches your expectations and investigate anything that looks suspicious.

~karplus/pluck/Hp/chrom_assembly26/

./v26.sticher