lecture_notes:05-15-2015

lecture_notes:05-15-2015 [2015/05/18 15:01]
gepoliano
lecture_notes:05-15-2015 [2015/05/18 15:11] (current)
gepoliano
=====SGA update=====
  * SGA is a memory-efficient assembler
  * It was possible to compute on more compressed data
  * The pipeline changed, since it was not easy to figure out how to run it
  * It was necessary to make sure the parameters work
  * The group assembled one merged dataset
  * SGA indexed each dataset separately
  * Merging is complicated because it is done in a pairwise fashion, so the indexes were merged two at a time
  * Indexing covered all three distinct submissions
  * Pre-processing included adapter trimming
  * Duplicate removal comes later than indexing
  * One issue the group found: SW018 and SW019, from the same library, contain optical/PCR duplicates that should be removed
  * The overall duplication level is a problem
  * Each dataset was generated independently
  * So duplicates should be removed separately for each dataset
  * The dataset is very complicated: there is a high duplication rate across the data the group has
  * Merging indexes: the group plans to pull some stats from the grid engine
  * The wall time is large
  * A variant file from the bubble-popping step was used to count the contigs
  * The group is planning on using the mate-pair data
  * Do adapter removal and index removal using skewer
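
The two ideas in the notes above (remove duplicates separately for each dataset, then merge two at a time) can be sketched in a few lines of Python. This is only a toy illustration, not the group's pipeline and not how SGA is implemented: the function names and the dataset names ''set_a'', ''set_b'', ''set_c'' are hypothetical, and "duplicate" here means an exact repeated sequence only.

```python
def remove_exact_duplicates(reads):
    """Keep the first read for each distinct sequence within ONE dataset.

    `reads` is a list of (name, sequence) tuples. Assumption: only exact
    sequence matches count as duplicates; real tools use richer criteria.
    """
    seen = set()
    unique = []
    for name, seq in reads:
        if seq not in seen:
            seen.add(seq)
            unique.append((name, seq))
    return unique


def pairwise_merge_plan(names):
    """Return the merge steps needed to combine all datasets two at a time.

    Each step is (left, right, merged_name). A dataset left without a
    partner in a round is carried over to the next round unchanged.
    """
    steps = []
    current = list(names)
    while len(current) > 1:
        next_round = []
        for i in range(0, len(current) - 1, 2):
            merged = f"{current[i]}+{current[i + 1]}"
            steps.append((current[i], current[i + 1], merged))
            next_round.append(merged)
        if len(current) % 2 == 1:
            next_round.append(current[-1])
        current = next_round
    return steps


if __name__ == "__main__":
    # Hypothetical toy data: each dataset is deduplicated on its own.
    datasets = {
        "set_a": [("r1", "ACGT"), ("r2", "ACGT"), ("r3", "TTGA")],
        "set_b": [("r4", "ACGT"), ("r5", "GGCC")],
        "set_c": [("r6", "TTGA")],
    }
    cleaned = {k: remove_exact_duplicates(v) for k, v in datasets.items()}
    for name, reads in cleaned.items():
        print(name, len(reads), "reads after duplicate removal")
    for left, right, merged in pairwise_merge_plan(list(cleaned)):
        print("merge", left, "with", right, "->", merged)
```

With three datasets the plan has two steps: the first two are merged, then the result is merged with the third. This is why the number of merge steps, and with it the wall time, grows with every additional dataset.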
lecture_notes/05-15-2015.1431961305.txt.gz · Last modified: 2015/05/18 15:01 by gepoliano