post-assembly_analysis:2015:rna_scaffolding (revisions of 2015/08/31 20:00 and 21:10 by ceisenhart)
=====RNA scaffolding=====
The RNA scaffolding work is currently in progress. Please contact [[ceisenhart@soe.ucsc.edu|Chris Eisenhart]] with questions.
The current pipeline can be broken down into three major steps:
**data processing**, **transcriptome assembly**, and **genome scaffolding**.
The corresponding RNA-seq data is documented online [[https://banana-slug.soe.ucsc.edu/data_overview:2015:rna-seq|here]].
====Data processing====
====Current progress====
I am running a small subset of the RNA-seq data through the pipeline to tune its options and system usage. So far, one full run has been completed (through L_RNA_scaffolder, producing a new FASTA assembly file) and seven partial runs have been completed (through transcriptome assembly). I am still deciding what data processing is needed; in particular, I am considering a RAM-expensive deduplication step to ensure that all duplicate reads are removed. The partial runs have used more than 130 GB of RAM at their peak, which means that without optimization the full run will crash even our terabyte-RAM machines.
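The memory cost of deduplication comes from having to remember every distinct read seen so far. The sketch below is a minimal illustration of exact-duplicate removal, not the pipeline's actual step. It assumes single-end reads in plain 4-line FASTQ, treats two reads as duplicates only when their sequences are identical (quality strings are ignored), and keeps the first occurrence. To keep per-read memory independent of read length, it stores a fixed-size BLAKE2b digest of each sequence instead of the sequence itself. The function names `read_fastq` and `dedup_fastq` are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, TextIO, Tuple

# (header, sequence, plus line, quality)
FastqRecord = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a plain 4-line-per-record FASTQ stream (assumption: no line wrapping)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header, seq, plus, qual


def dedup_fastq(records: Iterable[FastqRecord], digest_size: int = 8) -> Iterator[FastqRecord]:
    """Drop reads whose sequence has already been seen, keeping the first occurrence.

    Only a fixed-size digest of each distinct sequence is kept in memory, so memory
    grows with the number of distinct reads, not with read length. A digest collision
    (unlikely, but possible with small digest_size) would wrongly drop a distinct read.
    """
    seen = set()
    for record in records:
        key = hashlib.blake2b(record[1].encode("ascii"), digest_size=digest_size).digest()
        if key in seen:
            continue  # exact sequence duplicate
        seen.add(key)
        yield record


def write_fastq(records: Iterable[FastqRecord], handle: TextIO) -> None:
    """Write records back out in 4-line FASTQ format."""
    for header, seq, plus, qual in records:
        handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
```

Because the functions are generators, a call such as `write_fastq(dedup_fastq(read_fastq(infile)), outfile)` streams the reads, so only the digest set stays resident. For paired-end data, one option would be to hash the concatenated sequences of both mates so that a pair is removed only when both reads match.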
One undergraduate from UC Berkeley, [[darenliu@berkley.edu|Daren Liu]], is currently working on the pipeline. Daren has been assisting me by writing a program for FASTQ deduplication and a program for generating FASTA statistics.
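As an illustration of the kind of FASTA statistics that are useful for comparing assemblies before and after scaffolding, the sketch below computes the sequence count, total length, longest sequence, and N50. It is not Daren's program; the function names are hypothetical. It uses one common definition of N50: the length L such that sequences of length at least L together cover at least half of the total assembly length. It assumes ordinary FASTA input, where sequences may be wrapped across several lines.

```python
from typing import Dict, List, TextIO


def sequence_lengths(handle: TextIO) -> List[int]:
    """Return the length of each sequence in a FASTA stream (wrapped lines are summed)."""
    lengths: List[int] = []
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        else:
            if current is None:
                raise ValueError("sequence data found before the first FASTA header")
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Smallest length L such that sequences >= L cover at least half the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def fasta_stats(handle: TextIO) -> Dict[str, int]:
    """Summarize a FASTA file: count, total length, longest sequence, and N50."""
    lengths = sequence_lengths(handle)
    return {
        "num_sequences": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

For example, sequences of lengths 8, 6, 4, and 2 total 20 bases; the two longest together reach 14, which is at least half of 20, so the N50 is 6.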