Lecture Notes for May 28, 2010

Pcap Pog Assembly

Location: assemblies/Pog/pcap-assembly/

./contigs_in_ref.blat-strict-match

Looking at the T start and T end (where it is in the chromosome)
Notice there is a problem with contig 8; there is a large hole

./Pog454/fofn.pcap.scaffold.info

Parameters point to overlap consensus
Coverage looks right
A small number of reads have deep coverage; they may in fact be bogus reads or pieces of the virus

./contigs_in_ref.psl

Contig 8 looks like it is problematic
After contig 17, we are getting down to 1k pieces, with no clear distinction between the ones that are needed and the ones that are not
Doesn't look too bad as an assembly, but needs some cleaning up
Not many chimeric reads
The virus is probably somewhere in there as well (search for ece, in contigs 11, 14, etc.)
Curious why it produces overlapping contigs (didn't build any supercontigs, maybe supporting multiple paths)
~2 hr run time, not that different from Newbler

Lecture

Talk about the internals of colorspace mapping (not necessarily the correct way to do it)

Have dicts:
ref: the sequences that are read in from FASTA
ref_color: the same sequences, but translated to color space
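
A minimal sketch of how ref_color could be derived from ref, not necessarily how the class script does it. It assumes the usual di-base encoding, where each color is the XOR of the 2-bit codes of two adjacent bases. The names BASE_CODE and to_color_space and the example sequences are hypothetical.

```python
# Assumed 2-bit base codes. With these codes, the color of a pair of
# adjacent bases is the XOR of the two codes (standard di-base encoding).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_color_space(seq):
    """Return the color string for seq: one color per adjacent base pair."""
    # Note: any base outside ACGT (e.g. N) raises KeyError in this sketch.
    codes = [BASE_CODE[base] for base in seq.upper()]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


# Hypothetical stand-in for the sequences read in from FASTA.
ref = {"chr1": "ACGTTGCA", "chr2": "AAAACCCC"}

# Same keys as ref, each sequence translated to color space.
ref_color = {name: to_color_space(seq) for name, seq in ref.items()}
```

A sequence of n bases gives n - 1 colors, so offsets in ref_color are shifted by one relative to ref.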

ref_start: stores locations; a triplet is packed into a single integer to save space (each chromosome is given a number, done in megabases)
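
One way the packing could work, as a sketch only. The notes do not say what the three fields of the triplet are, so this assumes (chromosome number, position, strand). It also assumes each chromosome number gets a fixed block of megabases. CHROM_SPAN, pack_location and unpack_location are hypothetical names.

```python
MEGABASE = 1_000_000
# Hypothetical block size: 1000 megabases reserved per chromosome number,
# assumed to be larger than any chromosome being mapped.
CHROM_SPAN = 1000 * MEGABASE


def pack_location(chrom_num, pos, reverse):
    """Pack (chromosome number, position, strand flag) into one integer."""
    if not 0 <= pos < CHROM_SPAN:
        raise ValueError("position does not fit in the chromosome block")
    return (chrom_num * CHROM_SPAN + pos) * 2 + int(reverse)


def unpack_location(packed):
    """Invert pack_location."""
    rest, reverse = divmod(packed, 2)
    chrom_num, pos = divmod(rest, CHROM_SPAN)
    return chrom_num, pos, bool(reverse)
```

Storing one int per location instead of a tuple of three objects is where the space saving comes from when there are millions of locations.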

Then there are a bunch of RDB files; this holds the names of the files that are open
cross_rdb: The ones that didn't map well

print_forward_backward()
Used to check if there is a bias of freads and rreads mapping to the forward or backward strand
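
A sketch of the kind of tally such a check needs. It assumes each mapping has already been reduced to a (read kind, strand) pair. The function names and the output layout are hypothetical, not the actual print_forward_backward().

```python
from collections import Counter


def count_strands(mappings):
    """Count mappings by (read kind, strand).

    mappings: iterable of pairs such as ("fread", "+") or ("rread", "-").
    """
    return Counter(mappings)


def format_forward_backward(counts):
    """Return one summary line per read kind with the forward fraction."""
    lines = []
    for kind in ("fread", "rread"):
        fwd = counts[(kind, "+")]
        bwd = counts[(kind, "-")]
        total = fwd + bwd
        frac = fwd / total if total else float("nan")
        lines.append(
            f"{kind}\tforward={fwd}\tbackward={bwd}\tfraction_forward={frac:.3f}"
        )
    return lines
```

A forward fraction far from 0.5 for either read kind would suggest a strand bias worth looking into.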

A big chunk of the code is to check for inversions
print_possible_transversions()
Where are the places that show homologous recombination in reverse orientation?

check: builds the possible inversion lists
Uses a hash table of k-mers

windows: a hash table keyed by a color space k-mer (taken as an int rather than a string to save space); returns a list of locations where that k-mer exists
short_windows: does the same thing but with k-mers of a shorter length
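
A sketch of how such tables could be built, assuming 2 bits per color and locations kept as (name, offset) pairs. The real script may pack locations into ints instead. kmer_to_int and build_windows are hypothetical names.

```python
from collections import defaultdict


def kmer_to_int(colors):
    """Pack a string of colors ('0'..'3') into an int, 2 bits per color."""
    value = 0
    for c in colors:
        value = (value << 2) | int(c)
    return value


def build_windows(ref_color, width):
    """Map each int-encoded color k-mer to the list of places it occurs."""
    windows = defaultdict(list)
    for name, colors in ref_color.items():
        for i in range(len(colors) - width + 1):
            windows[kmer_to_int(colors[i:i + width])].append((name, i))
    return dict(windows)


# Hypothetical toy reference in color space.
ref_color = {"chr1": "1310131"}
windows = build_windows(ref_color, width=3)        # longer k-mers
short_windows = build_windows(ref_color, width=2)  # same idea, shorter k-mers
```

All keys in one table come from k-mers of the same width, so the int encoding is unambiguous within that table. Shorter k-mers hit more locations per key, which trades specificity for sensitivity.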

lookup_read_once()
The simplest test to return a list of where the read maps
If it is in the windows hash table, return the list
lookup_read_twice()
The fixup here for the trimming is different than before
The hack is to do the width + trim lookup, and then the trim-first + width lookup
lookup_read_many()
This one is the aggressive mapping
Taking the union of all the places that it could map of at least 2
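
A sketch of the simplest and the most aggressive lookups, under stated assumptions. The k-mer table maps int-encoded color k-mers to (name, offset) pairs, and "at least 2" is read here as at least 2 k-mers from the read agreeing on the same read start. That reading is an assumption, as are the names lookup_once, lookup_many and min_hits. The trimming fixups of lookup_read_twice() are not modeled.

```python
from collections import Counter


def kmer_to_int(colors):
    """Pack a string of colors ('0'..'3') into an int, 2 bits per color."""
    value = 0
    for c in colors:
        value = (value << 2) | int(c)
    return value


def lookup_once(windows, read_colors, width):
    """Look up only the first `width` colors of the read."""
    if len(read_colors) < width:
        return []
    return windows.get(kmer_to_int(read_colors[:width]), [])


def lookup_many(windows, read_colors, width, min_hits=2):
    """Look up every k-mer in the read and pool the implied read starts.

    Keeps only starts supported by at least min_hits k-mers (assumed
    interpretation of "at least 2" in the notes).
    """
    votes = Counter()
    for offset in range(len(read_colors) - width + 1):
        key = kmer_to_int(read_colors[offset:offset + width])
        for name, pos in windows.get(key, []):
            # A k-mer at read offset `offset` hitting the reference at `pos`
            # implies the read starts at pos - offset.
            votes[(name, pos - offset)] += 1
    return sorted(loc for loc, n in votes.items() if n >= min_hits)
```

Because lookup_many does not depend on the first k-mer being error free, it can place reads that lookup_once misses, at the cost of more table lookups per read.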

narrow_locs()
Takes a list of locations and narrows it down to exact matches
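
A sketch of the narrowing step, assuming candidate locations are (name, start) pairs into a dict of color space reference strings. The signature is hypothetical.

```python
def narrow_locs(ref_color, read_colors, locs):
    """Keep only the candidate locations where the whole read matches exactly."""
    exact = []
    for name, start in locs:
        if start < 0:
            # A start before the beginning of the reference cannot match.
            continue
        # startswith with a start index compares in place, without slicing.
        if ref_color[name].startswith(read_colors, start):
            exact.append((name, start))
    return exact
```

For example, with the reference colors "1310131" and the read "1310", the candidate at offset 0 is kept and the candidate at offset 4 is dropped because the read runs off the end of the reference.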

Next Week

Clean up the files on campusrocks, finish README files, add anything to wiki
A week from today freeze files on campusrocks and do backup