SGA update

  • SGA is a memory-efficient assembler
  • It made it possible to process more data in compressed form
  • The pipeline was changed, because it was not easy to figure out how to run it
  • It was necessary to make sure it runs with the chosen parameters
  • The group assembled one merged dataset
  • SGA indexed each dataset separately
  • Merging is complicated because it is done pairwise: two indexes were merged at a time
  • Indexing all three distinct submissions
  • Pre-processing: adapter trimming
  • Duplicate removal happens after indexing
  • One issue the group found: SW018 and 19, from the same library, contain optical/PCR duplicates that should be removed
  • The overall duplication level is a problem
  • Each dataset was generated independently
  • So duplicate removal should be done separately for each dataset
  • The dataset is very complicated: there is a high duplication rate across the data the group has
  • Merging indexes: planning to pull some stats from the Grid Engine
  • The wall time is large
  • The bubble-popping step produced a variant file; the contigs were counted
  • The group is planning on using the mate-pair data
  • To do: adapter removal and index removal, using skewer
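
The pairwise merging noted above (two indexes merged at a time until a single one remains) can be sketched as a small planning helper. This is only an illustration under stated assumptions: the function name, the dataset labels, and the naming scheme for merged indexes are hypothetical and not taken from the group's pipeline. It works out the order of merges and does not call SGA itself.

```python
from typing import List, Tuple


def plan_pairwise_merges(datasets: List[str]) -> List[Tuple[str, str, str]]:
    """Return (left, right, merged_name) steps that reduce the given
    datasets to one, merging exactly two indexes per step.

    The merged names ("merge<round>_<n>") are a hypothetical convention.
    """
    if not datasets:
        raise ValueError("need at least one dataset")

    steps: List[Tuple[str, str, str]] = []
    current = list(datasets)
    round_no = 1
    while len(current) > 1:
        next_round: List[str] = []
        # Pair up neighbours: (0, 1), (2, 3), ...
        for i in range(0, len(current) - 1, 2):
            merged = f"merge{round_no}_{i // 2}"
            steps.append((current[i], current[i + 1], merged))
            next_round.append(merged)
        # An unpaired last index is carried over to the next round.
        if len(current) % 2 == 1:
            next_round.append(current[-1])
        current = next_round
        round_no += 1
    return steps


if __name__ == "__main__":
    # Hypothetical labels for three separately indexed datasets.
    for left, right, merged in plan_pairwise_merges(["set1", "set2", "set3"]):
        print(f"{left} + {right} -> {merged}")
```

For three datasets this gives two merge steps: the first two indexes are merged, and the result is then merged with the third. In general, n separately indexed datasets need n - 1 pairwise merges.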
