This shows you the differences between two versions of the page.
|
post-assembly_analysis:2015:rna_scaffolding [2015/08/31 20:00] ceisenhart |
post-assembly_analysis:2015:rna_scaffolding [2015/08/31 21:11] (current) ceisenhart |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| =====RNA scaffolding===== | =====RNA scaffolding===== | ||
| - | The RNA scaffolding is currently in progress. Please contact Chris Eisenhart (ceisenhart@soe.ucsc.edu) with questions. | + | The RNA scaffolding is currently in progress. Please contact [[ceisenhart@soe.ucsc.edu | Chris Eisenhart]] with questions. |
| The current pipeline can be broken down into three major steps: | The current pipeline can be broken down into three major steps: | ||
| **data processing**, **transcriptome assembly**, and **genome scaffolding**. | **data processing**, **transcriptome assembly**, and **genome scaffolding**. | ||
| + | The corresponding data files and wet lab procedures are documented online [[https://banana-slug.soe.ucsc.edu/data_overview:2015:rna-seq | here]]. | ||
| ====Data processing==== | ====Data processing==== | ||
| Line 20: | Line 21: | ||
| ====Current progress==== | ====Current progress==== | ||
| - | I am running a small subset of the RNA-seq data through the pipeline to optimize the options and system usage. So far one full run has completed (through L_RNA_scaffolder, generating a new fasta assembly file) and seven partial runs have completed (through the transcriptome assembly). I am still deciding what data processing is needed; I am considering a RAM-expensive deduplication to ensure that all duplicates are removed. These partial runs have been using 130+ GB of RAM at it's peak, which means that without optimization the full run will crash even our terabyte-RAM machines. | + | I am running a small subset of the RNA-seq data through the pipeline to optimize the options and system usage. So far one full run has completed (through L_RNA_scaffolder, generating a new fasta assembly file) and seven partial runs have completed (through the transcriptome assembly). I am still deciding what data processing is needed; I am considering a RAM-expensive deduplication to ensure that all duplicates are removed. These partial runs have been using 130+ GB of RAM at their peak, which means that without optimization the full run will crash even our terabyte-RAM machines. |
| Currently I have one undergraduate from UC Berkeley working on the pipeline, [[darenliu@berkley.edu | Daren Liu ]]. Daren has been assisting me by writing a program for fastq deduplication and a program for generating fasta statistics. | Currently I have one undergraduate from UC Berkeley working on the pipeline, [[darenliu@berkley.edu | Daren Liu ]]. Daren has been assisting me by writing a program for fastq deduplication and a program for generating fasta statistics. | ||
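
The fastq deduplication step mentioned above could be sketched roughly as follows. This is not Daren's program, whose design is not described on this page; it is a minimal illustration that assumes plain 4-line fastq records and treats two reads as duplicates only when their sequences are identical. The names `read_fastq` and `dedup_fastq` are hypothetical. To keep memory down, it stores a 16-byte digest per distinct sequence instead of the sequence itself.

```python
import hashlib
from typing import Iterable, Iterator, Tuple

# (header, sequence, separator, quality) for one 4-line fastq record
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from an iterable of fastq lines (assumes 4 lines per record)."""
    it = iter(lines)
    while True:
        header = next(it, None)
        if header is None:
            return  # end of input
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            sep = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated fastq record: " + header) from None
        yield header, seq, sep, qual


def dedup_fastq(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Yield the first record seen for each distinct sequence.

    Only a 16-byte BLAKE2b digest is kept per distinct sequence, so memory
    grows with the number of unique reads rather than their total length.
    """
    seen = set()
    for record in records:
        digest = hashlib.blake2b(record[1].encode("ascii"), digest_size=16).digest()
        if digest in seen:
            continue  # exact duplicate sequence, drop it
        seen.add(digest)
        yield record


if __name__ == "__main__":
    import sys

    # Stream fastq from stdin to stdout, dropping exact duplicate sequences.
    for rec in dedup_fastq(read_fastq(sys.stdin)):
        sys.stdout.write("\n".join(rec) + "\n")
```

Run as a filter, this reads fastq on standard input and writes the deduplicated records to standard output. The set of digests still grows with the number of unique reads, so a very large data set may need an on-disk or sort-based approach instead.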