post-assembly_analysis:2015:rna_scaffolding

post-assembly_analysis:2015:rna_scaffolding [2015/08/31 20:00]
ceisenhart
post-assembly_analysis:2015:rna_scaffolding [2015/08/31 21:10]
ceisenhart
Line 1: Line 1:
 =====RNA scaffolding=====
-The RNA Scaffolding is currently in process.  Please contact Chris Eisenhart (ceisenhart@soe.ucsc.edu) with questions.
+The RNA Scaffolding is currently in process.  Please contact [[ceisenhart@soe.ucsc.edu | Chris Eisenhart]] with questions.
  
 The current pipeline can be broken down into three major steps;
 **data processing**, **transcriptome assembly**, and **genome scaffolding**
  
+The corresponding data files are documented online [[https://banana-slug.soe.ucsc.edu/data_overview:2015:rna-seq | here]].
 ====Data processing====
  
Line 20: Line 21:
  
 ====Current progress====
-I am working with a small subset of the RNA seq data running it through the pipeline to optimize the options and system usage.   Currently one full run has been done (completing L_RNA_scaffolder and generating a new fasta assembly file) while seven partial runs have been done (completing the transcriptome assembly).  I am still deciding what data processing is needed, I am debating running a RAM expensive de duplication to ensure that all duplicates are removed.  These partial runs has been using 130+ Gigs of RAM at it's peak, which means that without optimization the full run will crash even our Terrabyte RAM machines.
+I am working with a small subset of the RNA seq data running it through the pipeline to optimize the options and system usage.   Currently one full run has been done (completing L_RNA_scaffolder and generating a new fasta assembly file) while seven partial runs have been done (completing the transcriptome assembly).  I am still deciding what data processing is needed, I am debating running a RAM expensive de duplication to ensure that all duplicates are removed.  These partial runs has been using 130+ Gigs of RAM at their peak, which means that without optimization the full run will crash even our Terrabyte RAM machines.
  
 Currently I have one undergraduate from UC Berkley working on the pipeline, [[darenliu@berkley.edu | Daren Liu ]].  Daren has been assisting me by writing a program for fastq de duplication, and a program for generating fasta statistics.
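
For readers unfamiliar with the two helper programs mentioned above, the following is a minimal sketch of what exact-sequence fastq de duplication and basic fasta statistics might look like. It is not the code written for this pipeline. All function names (''read_fastq'', ''dedup_fastq'', ''fasta_stats'') are hypothetical, and it assumes plain four-line fastq records with no blank quality lines. To limit RAM, it stores a 16-byte MD5 digest per distinct sequence instead of the sequence itself; this reduces, but does not eliminate, the memory cost on very large read sets.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List, Tuple

# (header, sequence, separator, quality) -- hypothetical record layout
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield four-line fastq records; blank lines are skipped (assumption)."""
    stripped = (line.rstrip("\n") for line in lines if line.strip())
    # zip over the same iterator four times groups lines into records of 4;
    # an incomplete trailing record is silently dropped.
    return zip(stripped, stripped, stripped, stripped)


def dedup_fastq(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Keep the first record seen for each distinct read sequence."""
    seen = set()
    for record in records:
        # Store a fixed-size digest rather than the full sequence.
        digest = hashlib.md5(record[1].encode("ascii")).digest()
        if digest in seen:
            continue
        seen.add(digest)
        yield record


def fasta_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Return sequence count, total bases, and N50 for fasta-formatted lines."""
    lengths: List[int] = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)

    total = sum(lengths)
    running = 0
    n50 = 0
    # N50: length of the sequence at which the running sum of lengths,
    # taken longest first, reaches at least half of the total bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {"sequences": len(lengths), "total_bases": total, "n50": n50}
```

Both functions accept any iterable of lines, so an open file handle can be passed directly and the reads are streamed rather than loaded at once. Only exact duplicate sequences are removed; reverse complements and near-duplicates are not handled in this sketch.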
post-assembly_analysis/2015/rna_scaffolding.txt · Last modified: 2015/08/31 21:11 by ceisenhart