lecture_notes:05-15-2015

SGA update

SGA (String Graph Assembler) is a memory-efficient assembler
Because of this, the data could be processed in a more compressed form
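SGA's memory efficiency comes from building an FM-index over the reads, and the FM-index is based on the Burrows-Wheeler transform (BWT). The toy sketch below only illustrates the idea. It is not SGA's construction algorithm, which never materializes all rotations. It computes the BWT of a short repetitive sequence by sorting rotations and counts runs of identical characters. Fewer runs is what makes the transformed text compress well. The helper names `bwt` and `run_count` are hypothetical.

```python
def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    s = text + "$"  # '$' is a sentinel that sorts before A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # the BWT is the last column of the sorted rotation matrix
    return "".join(rot[-1] for rot in rotations)


def run_count(s: str) -> int:
    """Number of runs of identical consecutive characters; fewer runs compress better."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


seq = "ACGTACGTACGT"
transformed = bwt(seq)
print(transformed, run_count(seq), run_count(transformed))  # TTT$AAACCCGGG 12 5
```

The repeats in the input become long runs (`TTT`, `AAA`, `CCC`, `GGG`) in the output, which is why highly redundant sequencing data can be indexed compactly.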
The pipeline was changed, since it was not easy to figure out how to run it
It was necessary to make sure the parameters were working correctly
The group assembled one dataset, merged together from the inputs
SGA indexed each dataset separately
Merging is complicated because it is done in a pairwise fashion: two indexes were merged at a time
All three distinct submissions were indexed
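Because indexes are merged two at a time, combining three separately indexed submissions takes two merge steps. A minimal sketch of such a pairwise merge schedule, assuming only that each step takes two inputs and yields one output. The input names (`sub1`, `sub2`, `sub3`) and generated output names are hypothetical, and no SGA command is invoked.

```python
from typing import List, Tuple


def pairwise_merge_plan(indexes: List[str]) -> List[Tuple[str, str, str]]:
    """Plan merges of indexes two at a time until a single merged index remains.

    Returns (left, right, merged_name) steps in execution order.
    """
    if not indexes:
        raise ValueError("need at least one index")
    current = list(indexes)
    steps: List[Tuple[str, str, str]] = []
    round_no = 0
    while len(current) > 1:
        round_no += 1
        next_round = []
        # pair up neighbours; each pair becomes one merged index
        for i in range(0, len(current) - 1, 2):
            merged = f"merged_r{round_no}_{i // 2}"
            steps.append((current[i], current[i + 1], merged))
            next_round.append(merged)
        # an odd index out is carried unchanged into the next round
        if len(current) % 2 == 1:
            next_round.append(current[-1])
        current = next_round
    return steps


for left, right, out in pairwise_merge_plan(["sub1", "sub2", "sub3"]):
    print(f"merge {left} + {right} -> {out}")
```

Any schedule of this kind needs exactly N - 1 merges for N inputs; pairing within rounds just keeps the intermediate indexes balanced in size.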
Preprocessing: adapter trimming
Duplicate removal happens after indexing
One issue the group found: SW018 and SW019 come from the same library and contain optical/PCR duplicates that should be removed
The overall duplication level is a problem
Each dataset was generated independently
Therefore, duplicates should be removed separately for each dataset
The data are very complicated: there is a high duplication rate across the datasets the group has
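Since each dataset was generated independently, duplicates are removed within each dataset rather than across all of them. A minimal sketch of that idea, assuming reads are plain sequence strings and that a duplicate means an exact sequence match within the same dataset. Real optical/PCR duplicate detection, including SGA's own duplicate-removal step, works differently and typically considers both mates of a pair. The dataset names and reads are hypothetical.

```python
from typing import Dict, List, Tuple


def dedup_per_dataset(datasets: Dict[str, List[str]]) -> Dict[str, Tuple[List[str], float]]:
    """Remove exact duplicate reads within each dataset independently.

    Returns, per dataset, the deduplicated reads (first occurrence kept,
    order preserved) and the duplication rate = 1 - unique / total.
    """
    result: Dict[str, Tuple[List[str], float]] = {}
    for name, reads in datasets.items():
        seen = set()  # reset per dataset: no cross-dataset deduplication
        unique = []
        for read in reads:
            if read not in seen:
                seen.add(read)
                unique.append(read)
        rate = 1 - len(unique) / len(reads) if reads else 0.0
        result[name] = (unique, rate)
    return result


# hypothetical toy data; "ACGT" occurs in both datasets but is kept in each
data = {
    "dataset_A": ["ACGT", "ACGT", "TTGA", "ACGT"],
    "dataset_B": ["ACGT", "GGCC"],
}
for name, (reads, rate) in dedup_per_dataset(data).items():
    print(name, len(reads), f"{rate:.2f}")
```

Reporting the per-dataset duplication rate alongside the deduplicated reads makes it easier to see which dataset drives the overall duplication level.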
Merging indexes: the group is planning to pull some statistics from the grin engine
The wall-clock time is long
A variants file was produced by bubble popping, and the contigs were counted
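Counting contigs in a FASTA output comes down to counting header lines, since each record starts with `>`. A minimal sketch, assuming the contigs are in standard FASTA format. The in-memory example content is hypothetical; for a real file, pass an open file handle instead.

```python
import io
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records: each record begins with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


# hypothetical contig content; sequences may wrap over several lines
example = io.StringIO(">contig-0\nACGTACGT\n>contig-1\nGGCC\nTTAA\n>contig-2\nA\n")
print(count_fasta_records(example))  # 3
```

Counting headers rather than sequence lines keeps the result correct when sequences are wrapped over multiple lines, as with `contig-1` above.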
The group is planning to use the mate-pair data
Adapter removal and index removal are done with Skewer
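To show what adapter trimming does during preprocessing, here is a naive 3' trimming sketch. This is not Skewer's algorithm: Skewer tolerates mismatches and indels and handles paired-end reads, while this sketch uses exact matching on a single read. It cuts at a full adapter occurrence, or removes a partial adapter prefix hanging off the read end if it is at least `min_overlap` bases long. The function name, the `min_overlap` default, and the example reads are hypothetical. The adapter string `AGATCGGAAGAGC` is used as an example of a common Illumina adapter prefix.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Trim an exact 3' adapter match, including a partial adapter at the read end.

    Naive exact matching only (hypothetical helper, not Skewer's method).
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # full adapter occurrence anywhere in the read: cut at its start
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # partial adapter: the read ends with a prefix of the adapter (longest first)
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


ADAPTER = "AGATCGGAAGAGC"
print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTT", ADAPTER))  # ACGTACGT
print(trim_3prime_adapter("ACGTACGTAGATC", ADAPTER))            # ACGTACGT
print(trim_3prime_adapter("ACGTACGTAG", ADAPTER))               # ACGTACGTAG
```

The `min_overlap` threshold avoids trimming very short end matches (such as the trailing `AG` in the last example), which would otherwise occur by chance in many reads.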