=====SGA update=====
  * SGA is a memory-efficient assembler
  * It was possible to process more data in compressed form
  * The pipeline changed, since it was not easy to figure out how to run it
  * It was necessary to make sure the pipeline runs with the chosen parameters
  * The group assembled one dataset, merged together from the separate submissions
  * SGA indexed each dataset separately
  * Merging is complicated: it is done in a pairwise fashion, so the indexes were merged two at a time
  * Indexing covered all three distinct submissions
  * Pre-processing: adapter trimming
  * Duplicate removal comes later than indexing
  * One issue the group found: SW018 and 19, from the same library, contain optical or PCR duplicates that should be removed
  * The overall duplication level is a problem
  * Each dataset was generated independently
  * Duplicate removal should therefore be done separately for each dataset
  * The dataset is very complicated: there is a high duplication rate across the data the group has
  * Merging indexes: the group plans to pull some stats from the grid engine
  * The wall time is long
  * A variant file from bubble popping counted the contigs
  * The group is planning on using the mate-pair data
  * Do adapter removal and index removal using skewer
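To make the pairwise merge order concrete, here is a minimal sketch of how three separately indexed submissions could be folded into one index, two at a time. It only plans the order and does not call SGA. The file names and the `mergedN` output names are hypothetical placeholders, not the group's actual files.

```python
from typing import List, Tuple


def plan_pairwise_merges(indexes: List[str]) -> List[Tuple[str, str, str]]:
    """Return (left, right, merged) steps that fold the indexes two at a time."""
    if not indexes:
        return []
    steps = []
    current = indexes[0]
    for n, nxt in enumerate(indexes[1:], start=1):
        merged = f"merged{n}"  # hypothetical name for the intermediate index
        steps.append((current, nxt, merged))
        current = merged  # the merged result feeds the next pairwise merge
    return steps


if __name__ == "__main__":
    # Hypothetical names for the three submissions.
    for left, right, merged in plan_pairwise_merges(["sub1", "sub2", "sub3"]):
        print(f"merge {left} + {right} -> {merged}")
```

With three submissions this gives two merge steps: the first two indexes are merged, then that result is merged with the third. Each step would correspond to one merge job, which is part of why the wall time adds up.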