====== Lecture Notes for May 28, 2010 ======

===== Pcap Pog Assembly =====
Location: assemblies/Pog/pcap-assembly/ \\

==== ./contigs_in_ref.blat-strict-match ====
Look at the T start and T end columns (where each contig lands in the chromosome). \\
Notice there is a problem with contig 8: there is a large hole. \\

==== ./Pog454/fofn.pcap.scaffold.info ====
Parameters point to overlap consensus. \\
Coverage looks right. \\
A small number of reads have very deep coverage; they may in fact be bogus reads or pieces of the virus. \\

==== ./contigs_in_ref.psl ====
Contig 8 looks problematic. \\
After contig 17, the contigs drop to ~1 kb pieces, with no clear distinction between the ones that are needed and the ones that are not. \\
Doesn't look too bad as an assembly, but it needs some cleaning up. \\
Not many chimeric reads. \\
The virus is probably somewhere in there as well (search for ece, in contigs 11, 14, etc.). \\
Curious why it produces overlapping contigs (it didn't build any supercontigs; maybe it is supporting multiple paths). \\
~2 hr run time, not that different from Newbler. \\

===== Lecture =====
Talked about the internals of color-space mapping (not necessarily the correct way to do it). \\
\\
The code keeps several dicts: \\
ref: the list of sequences read in from FASTA \\
ref_color: the same sequences, translated to color space
  * Only two bits of information are lost (the identity of the first base)
  * Each color encodes the transition from the previous base to the current base
  * Also stores the length of each reference sequence
ref_start: stores locations compactly (a triplet packed into a single integer to save space; each chromosome is given a number, and positions are done in megabases) \\
\\
Then there is a set of RDB files; the code keeps their file names and the open file handles. \\
cross_rdb: the reads that didn't map well \\
\\
print_forward_backward() \\
Checks whether there is a bias in forward reads (freads) and reverse reads (rreads) mapping to the forward or backward strand. \\
\\
A big chunk of the code checks for inversions. \\
print_possible_transversions() \\
Where are there places with homologous recombination in reverse orientation? \\
\\
check: builds the possible inversion lists \\
Uses a hash table of k-mers \\
\\
windows: a hash table that, given a color-space k-mer, returns a list of locations where that k-mer occurs; the k-mer is stored as an int rather than a string to save space \\
short_windows: does the same thing, but with shorter k-mers \\
\\
lookup_read_once() \\
The simplest lookup: returns a list of where the read maps \\
If the read is in the windows hash table, return that list \\
lookup_read_twice() \\
The fix-up for trimming here is different than before \\
The hack: first look up width + trim, then look up trim first + width \\
lookup_read_many() \\
This is the aggressive mapping \\
Takes the union of all the places the read could map, keeping places hit at least 2 times \\
\\
narrow_locs() \\
Takes a list of locations and narrows it down to exact matches \\

===== Next Week =====
Clean up the files on campusrocks, finish the README files, and add anything missing to the wiki. \\
A week from today, freeze the files on campusrocks and do a backup. \\
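The ref_color translation described in the Lecture section can be illustrated with a minimal sketch, assuming the standard SOLiD two-base encoding (A=0, C=1, G=2, T=3, where each color is the XOR of two adjacent bases). The names to_color and from_color are hypothetical, not the names used in the course code. Decoding needs the first base, which is exactly the two bits of information that color space loses:

```python
# Sketch of color-space (two-base) encoding, assuming the SOLiD convention
# A=0, C=1, G=2, T=3 with color = previous base XOR current base.
BASES = "ACGT"
BASE_TO_INT = {b: i for i, b in enumerate(BASES)}

def to_color(seq):
    """Translate a base-space string into a list of colors (each 0-3)."""
    codes = [BASE_TO_INT[b] for b in seq.upper()]
    # Each color encodes the transition from the previous base to this one.
    return [a ^ b for a, b in zip(codes, codes[1:])]

def from_color(first_base, colors):
    """Recover base space; the first base (the 2 lost bits) must be supplied."""
    code = BASE_TO_INT[first_base.upper()]
    out = [first_base.upper()]
    for c in colors:
        code ^= c  # undo the XOR to get the next base
        out.append(BASES[code])
    return "".join(out)
```

With this encoding, ACGT and CATG both translate to the colors 1, 3, 1; only the first base tells them apart.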
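The notes say ref_start packs a triplet into a single integer, with chromosomes numbered and positions done in megabases, but they don't list the three fields. The sketch below assumes the triplet is (chromosome number, megabase index, offset within that megabase) and uses a hypothetical MAX_MB bound on chromosome length; pack_location, unpack_location and position_to_triplet are illustrative names only:

```python
# Hypothetical layout: the notes don't name the three fields of the triplet.
MB = 1_000_000
MAX_MB = 300  # assumed upper bound on chromosome length, in megabases

def pack_location(chrom, megabase, offset):
    """Pack (chromosome number, megabase index, offset in that megabase) into one int."""
    if chrom < 0 or not 0 <= megabase < MAX_MB or not 0 <= offset < MB:
        raise ValueError("location out of range")
    return (chrom * MAX_MB + megabase) * MB + offset

def unpack_location(key):
    """Inverse of pack_location: recover the (chrom, megabase, offset) triplet."""
    chrom_mb, offset = divmod(key, MB)
    chrom, megabase = divmod(chrom_mb, MAX_MB)
    return chrom, megabase, offset

def position_to_triplet(chrom, pos):
    """Split a 0-based chromosome position into the (chrom, megabase, offset) triplet."""
    megabase, offset = divmod(pos, MB)
    return chrom, megabase, offset
```

In CPython a single int typically takes less memory than a tuple of three ints, which is one reason to prefer a packed key when storing millions of locations.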
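The windows index and the lookup functions from the Lecture section can be sketched as follows. This is a simplified sketch under several assumptions: a single reference color sequence with plain offsets (no packed ref_start keys), hypothetical helpers kmer_to_int and build_windows, and an interpretation of lookup_read_many's "at least 2" as "a candidate start must be supported by at least 2 of the read's k-mers". The signatures are illustrative, not the course code's, and lookup_read_twice is left out because the notes don't describe its trimming hack in enough detail:

```python
from collections import Counter, defaultdict

def kmer_to_int(colors):
    """Pack colors (each 0-3) into one int, 2 bits per color, instead of a string."""
    key = 0
    for c in colors:
        key = (key << 2) | c
    return key

def build_windows(ref_colors, k):
    """windows-style index: k-mer int -> list of start offsets in the reference."""
    windows = defaultdict(list)
    for i in range(len(ref_colors) - k + 1):
        windows[kmer_to_int(ref_colors[i:i + k])].append(i)
    return windows

def lookup_read_once(read_colors, windows, k):
    """Simplest lookup: where does the read's first k-mer occur?"""
    if len(read_colors) < k:
        return []
    return list(windows.get(kmer_to_int(read_colors[:k]), []))

def lookup_read_many(read_colors, windows, k, min_hits=2):
    """Aggressive lookup (assumed voting scheme): try every k-mer in the read.

    A hit at reference offset p for read offset j implies the read starts at
    p - j; starts supported by at least min_hits k-mers are kept.
    """
    votes = Counter()
    for j in range(len(read_colors) - k + 1):
        for p in windows.get(kmer_to_int(read_colors[j:j + k]), []):
            if p - j >= 0:
                votes[p - j] += 1
    return sorted(start for start, n in votes.items() if n >= min_hits)

def narrow_locs(locs, read_colors, ref_colors):
    """Keep only candidate starts where the whole read matches exactly."""
    n = len(read_colors)
    return [s for s in locs if ref_colors[s:s + n] == read_colors]
```

For example, with the reference colors [1, 3, 1, 0, 1, 3, 1, 2, 0], k = 3 and the read [1, 3, 1, 2], lookup_read_once returns both places the first k-mer occurs (offsets 0 and 4), while lookup_read_many and narrow_locs both settle on offset 4.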