Differences

This shows you the differences between two versions of the page.

--- lecture_notes:05-15-2015 [2015/05/18 14:48]
gepoliano
+++ lecture_notes:05-15-2015 [2015/05/18 15:11] (current)
gepoliano
@@ Line 1: / Line 1: @@
 =====SGA update=====
+  * SGA is a memory-efficient assembler
+  * It was possible to compute more compressed data
+  * The pipeline was changed, because it was not easy to figure out how to run it
+  * It was necessary to make sure the parameters were working
+  * The group assembled one dataset, merged together
+  * SGA indexed each dataset separately
+  * Merging is complicated because it works in a pairwise fashion, so indexes were merged two at a time
+  * Indexing all three distinct submissions
+  * Pre-processing: adapter trimming
+  * Duplicate removal happens after indexing
+  * One issue the group found: SW018 and 19 (same library) contain optical/PCR duplicates that should be removed
+  * The overall duplication level is a problem
+  * Each dataset was generated independently
+  * So duplicate removal should be done separately for each dataset
+  * The dataset is very complicated: there is a high duplication rate across the data the group has
+  * Merging indexes: the group plans to pull some stats from the grid engine
+  * The wall time is large
+  * A variant file with the bubble pop counted the contigs
+  * The group is planning on using the mate-pair data
+  * Do adapter removal and index removal using skewer
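Two ideas from the notes above can be sketched in a few lines of Python: removing duplicates inside each dataset on its own, and folding several indexes together two at a time. This is only an illustration under stated assumptions, not the group's pipeline and not how SGA works internally. The function names `dedupe_per_dataset` and `pairwise_merge_plan` are hypothetical, reads are treated as plain strings, and "duplicate" here means an exact repeated sequence, which is simpler than what a real assembler checks.

```python
from typing import Dict, List, Tuple


def dedupe_per_dataset(datasets: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Drop exact duplicate reads within each dataset, never across datasets.

    Hypothetical helper: a read is just a sequence string, and only exact
    repeats count as duplicates.
    """
    deduped: Dict[str, List[str]] = {}
    for name, reads in datasets.items():
        seen = set()   # reset for every dataset, so datasets stay independent
        kept = []
        for read in reads:
            if read not in seen:
                seen.add(read)
                kept.append(read)  # keep the first occurrence, preserve order
        deduped[name] = kept
    return deduped


def pairwise_merge_plan(names: List[str]) -> List[Tuple[str, str, str]]:
    """Return (left, right, merged) steps that combine indexes two at a time.

    Hypothetical helper: it only plans the order of merges and runs no tool.
    """
    if not names:
        return []
    steps: List[Tuple[str, str, str]] = []
    current = names[0]
    for nxt in names[1:]:
        merged = f"{current}+{nxt}"   # label for the merged result
        steps.append((current, nxt, merged))
        current = merged              # the merged result feeds the next step
    return steps
```

With three placeholder datasets named A, B and C, `pairwise_merge_plan(["A", "B", "C"])` gives two steps: A with B, then the result with C. A read that appears in both A and B survives `dedupe_per_dataset` in each of them, because the datasets are handled separately.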

Banana Slug Genomics