contributors:team_3:merging_indexes



[code]
Distributed construction of an FM index from multiple input files

jts edited this page on May 26, 2011 · 1 revision

If your data set consists of multiple files, you can construct the FM-index for each file separately, then merge the indices together to obtain an index of the entire data set. This requires much less memory than constructing an index from a single file containing the entire data set.

For example, suppose your data consists of four files:

    s_1_1.fastq
    s_1_2.fastq
    s_2_1.fastq
    s_2_2.fastq

We begin by constructing an index of each file individually:

    sga index s_1_1.fastq
    sga index s_1_2.fastq
    sga index s_2_1.fastq
    sga index s_2_2.fastq

Then we merge the indices together in pairs until we obtain a single index:

    sga merge -p merged1 s_1_1.fastq s_1_2.fastq
    sga merge -p merged2 s_2_1.fastq s_2_2.fastq
    sga merge -p final merged1.fa merged2.fa

The final index can then be used in other steps of the pipeline, for instance to error correct the original sequence files:

    sga correct -p final s_1_1.fastq
    sga correct -p final s_1_2.fastq
    sga correct -p final s_2_1.fastq
    sga correct -p final s_2_2.fastq
[/code]
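For more than four files, writing out the index-then-merge steps by hand gets tedious. The sketch below is one possible way to generate them: it only prints the `sga index` and `sga merge` commands for a pairwise merge and does not run anything. The function name `plan_sga_merge` and the `roundN_M` prefixes are hypothetical, chosen for illustration. It assumes, as the example above implies, that `sga merge -p PREFIX` writes a merged read file named `PREFIX.fa`, and that a file left over from an odd-sized round can be merged in a later round.

```shell
#!/bin/sh
# Dry-run planner: prints the sga commands, does not execute them.
# Assumption: `sga merge -p PREFIX a b` produces PREFIX.fa (as in the
# merged1.fa / merged2.fa example above).
plan_sga_merge() {
    # Step 1: index every input file on its own.
    for f in "$@"; do
        echo "sga index $f"
    done

    # Step 2: merge in pairs, round by round, until one file remains.
    round=1
    while [ "$#" -gt 1 ]; do
        count=$#   # files still to be consumed in this round
        n=0        # merges emitted in this round
        while [ "$count" -gt 0 ]; do
            if [ "$count" -ge 2 ]; then
                n=$((n + 1))
                prefix="round${round}_${n}"
                echo "sga merge -p $prefix $1 $2"
                # Queue the merged file for the next round, drop the pair.
                set -- "$@" "$prefix.fa"
                shift 2
                count=$((count - 2))
            else
                # Odd file out: carry it over to the next round unchanged.
                set -- "$@" "$1"
                shift
                count=$((count - 1))
            fi
        done
        round=$((round + 1))
    done

    # Report the file whose index covers all inputs.
    echo "# final merged file: $1"
}
```

Calling `plan_sga_merge s_1_1.fastq s_1_2.fastq s_2_1.fastq s_2_2.fastq` prints four `sga index` lines followed by three `sga merge` lines, mirroring the manual steps above with different prefix names. Review the printed plan before piping it to a shell.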

contributors/team_3/merging_indexes.1429558529.txt.gz · Last modified: 2015/04/20 19:35 by chkan