Differences

This shows you the differences between two versions of the page.

--- post-assembly_analysis:2015:rna_scaffolding [2015/08/31 20:00]
ceisenhart
+++ post-assembly_analysis:2015:rna_scaffolding [2015/08/31 21:10]
ceisenhart
@@ Line 1: / Line 1: @@
 =====RNA scaffolding=====
-The RNA scaffolding is currently in progress. Please contact Chris Eisenhart (ceisenhart@soe.ucsc.edu) with questions.
+The RNA scaffolding is currently in progress. Please contact [[ceisenhart@soe.ucsc.edu | Chris Eisenhart]] with questions.
 The current pipeline can be broken down into three major steps:
 **data processing**, **transcriptome assembly**, and **genome scaffolding**.
+The corresponding data files are documented online [[https://banana-slug.soe.ucsc.edu/data_overview:2015:rna-seq | here]].
 ====Data processing====
@@ Line 20: / Line 21: @@
 ====Current progress====
-I am running a small subset of the RNA-seq data through the pipeline to optimize its options and system usage. Currently one full run has been completed (running through L_RNA_scaffolder and generating a new FASTA assembly file), and seven partial runs have been completed (through the transcriptome assembly step). I am still deciding what data processing is needed; I am debating whether to run a RAM-expensive deduplication to ensure that all duplicates are removed. These partial runs have been using 130+ GB of RAM at it's peak, which means that without optimization the full run will crash even our terabyte-RAM machines.
+I am running a small subset of the RNA-seq data through the pipeline to optimize its options and system usage. Currently one full run has been completed (running through L_RNA_scaffolder and generating a new FASTA assembly file), and seven partial runs have been completed (through the transcriptome assembly step). I am still deciding what data processing is needed; I am debating whether to run a RAM-expensive deduplication to ensure that all duplicates are removed. These partial runs have been using 130+ GB of RAM at their peak, which means that without optimization the full run will crash even our terabyte-RAM machines.
 Currently one undergraduate from UC Berkeley, [[darenliu@berkley.edu | Daren Liu]], is working on the pipeline. Daren has been assisting me by writing a program for FASTQ deduplication and a program for generating FASTA statistics.
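
The FASTQ deduplication program mentioned above is not reproduced on this page. As a minimal sketch only, assuming that a "duplicate" means a read whose sequence exactly matches an earlier read (a hypothetical criterion; paired-end handling and reverse-complement duplicates are ignored), the filter below keeps the first occurrence of each sequence. It also shows why exact deduplication is RAM-expensive: the set of seen sequences grows with the number of unique reads. The function names `read_fastq` and `dedup_fastq` are illustrative, not the pipeline's.

```python
import hashlib
from typing import Iterable, Iterator, Tuple

# A FASTQ record: (header, sequence, plus line, quality string).
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record at {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at {header!r}")
        yield header, seq, plus, qual


def dedup_fastq(records: Iterable[Record]) -> Iterator[Record]:
    """Keep the first record for each distinct sequence (hypothetical rule).

    One MD5 digest (16 bytes of data plus Python object overhead) is kept
    per unique sequence, so memory grows with the number of unique reads.
    """
    seen = set()
    for rec in records:
        digest = hashlib.md5(rec[1].encode()).digest()
        if digest not in seen:
            seen.add(digest)
            yield rec


# Usage: r2 repeats r1's sequence, so it is dropped.
reads = [
    "@r1\n", "ACGT\n", "+\n", "IIII\n",
    "@r2\n", "ACGT\n", "+\n", "IIII\n",
    "@r3\n", "GGCC\n", "+\n", "IIII\n",
]
unique = list(dedup_fastq(read_fastq(reads)))
print([h for h, *_ in unique])  # ['@r1', '@r3']
```

Hashing the sequence rather than storing it keeps the per-read cost fixed, but the set still has to live in memory for the whole run, which is where the RAM cost comes from.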

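The FASTA statistics program is not reproduced on this page. As a hedged sketch, assuming "statistics" means common assembly metrics (sequence count, total length, longest sequence, and N50; the actual program's outputs are not documented here), the functions below compute them from FASTA text. The names `fasta_lengths`, `n50`, and `fasta_stats` are illustrative.

```python
from typing import Dict, Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA-formatted lines.

    Multi-line sequences are supported: all lines between two '>'
    headers are concatenated.
    """
    lengths: List[int] = []
    current: Optional[int] = None  # None until the first header is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        else:
            if current is None:
                raise ValueError("sequence data before first FASTA header")
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that sequences of length >= L cover at
    least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def fasta_stats(lines: Iterable[str]) -> Dict[str, int]:
    lengths = fasta_lengths(lines)
    return {
        "sequences": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


# Usage: four sequences of lengths 10, 8 (split over two lines), 3 and 2.
assembly = [">c1\n", "ACGTACGTAC\n", ">c2\n", "ACGT\n", "ACGT\n",
            ">c3\n", "ACG\n", ">c4\n", "AC\n"]
print(fasta_stats(assembly))
# {'sequences': 4, 'total_length': 23, 'longest': 10, 'n50': 8}
```

Here the total is 23, and the two longest sequences (10 + 8 = 18) are the first to cover at least half of it, so N50 is 8.
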
Banana Slug Genomics