This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
lecture_notes:05-15-2015 [2015/05/18 15:02] gepoliano |
lecture_notes:05-15-2015 [2015/05/18 15:11] (current) gepoliano |
||
---|---|---|---|
Line 1: | Line 1: | ||
=====SGA update===== | =====SGA update===== | ||
- | * SGA is a memory efficient assembler | + | * SGA is a memory efficient assembler |
- | *It was possible to compute more compressed data | + | * It was possible to compute more compressed data |
- | *The pipeline changed, since it was not easy to figure out how to run it | + | * The pipeline changed, since it was not easy to figure out how to run it |
- | *It was necessary to make sure the parameters are running | + | * It was necessary to make sure the parameters are running |
- | -The group assembled one dataset, merged together | + | * The group assembled one dataset, merged together |
- | -SGA indexed each dataset separately | + | * SGA indexed each dataset separately |
- | -Merging is complicated: it is done in a pairwise fashion, so two pairs were merged at a time | + | * Merging is complicated: it is done in a pairwise fashion, so two pairs were merged at a time |
- | -Indexing all three distinct submissions | + | * Indexing all three distinct submissions |
- | -Pre-processed adapter trimming | + | * Pre-processed adapter trimming |
- | -Duplicate-removal is later than indexing | + | * Duplicate-removal is later than indexing |
- | -One issue the group found: SW018 and 19, from the same library, are optical PCR duplicates that should be removed | + | * One issue the group found: SW018 and 19, from the same library, are optical PCR duplicates that should be removed |
+ | * The overall duplication level is a problem | ||
+ | * Each dataset was generated independently | ||
+ | * Duplicate removal should therefore be done separately for each dataset | ||
+ | * The dataset is very complicated - there is a high duplication rate across the group's data | ||
+ | * Merging indexes - the group plans to pull some stats from the Grid Engine | ||
+ | * The wall time is large | ||
+ | * A variant file with the bubble pop counted the contigs | ||
+ | * The group is planning on using the mate-pair data | ||
+ | * Do adapter removal and index removal - using skewer | ||
+ | * |
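
The pairwise merging noted above (each dataset indexed separately, then indexes merged two at a time) can be sketched as a merge schedule. This is only an illustrative planner, not the group's actual pipeline: the function name `plan_pairwise_merges`, the dataset labels `sub1`, `sub2`, `sub3`, and the `left+right` naming of merged indexes are all hypothetical, and no SGA command is invoked.

```python
from typing import List, Tuple

# (left index, right index, name of the merged result); all names are hypothetical labels
MergeStep = Tuple[str, str, str]


def plan_pairwise_merges(datasets: List[str]) -> List[List[MergeStep]]:
    """Plan rounds of pairwise merges until a single index remains.

    Each round pairs up neighbouring indexes; an unpaired last index
    is carried forward unchanged to the next round.
    """
    rounds: List[List[MergeStep]] = []
    current = list(datasets)
    while len(current) > 1:
        this_round: List[MergeStep] = []
        next_level: List[str] = []
        for i in range(0, len(current) - 1, 2):
            left, right = current[i], current[i + 1]
            merged = f"{left}+{right}"  # assumed naming scheme, for illustration only
            this_round.append((left, right, merged))
            next_level.append(merged)
        if len(current) % 2 == 1:
            next_level.append(current[-1])  # odd index waits for the next round
        rounds.append(this_round)
        current = next_level
    return rounds


if __name__ == "__main__":
    # Three separately indexed submissions, as in the notes (labels are made up).
    for number, steps in enumerate(plan_pairwise_merges(["sub1", "sub2", "sub3"]), start=1):
        for left, right, merged in steps:
            print(f"round {number}: merge {left} with {right} -> {merged}")
```

With three submissions the plan needs two rounds, which is one reason the wall time grows as more datasets are added: each round has to finish before the next merge can start.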