The following data was produced by a program called Kmergenie.
Kmergenie is a program that examines the multiplicity of k-mers of various sizes in a set of reads. It uses this information to predict the best k-mer size to use for a de novo assembly of the dataset. The information below was generated by this program.
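To make "multiplicity of k-mers" concrete, here is a minimal Python sketch of an abundance histogram: for a given k, how many distinct k-mers occur once, twice, and so on. It is an illustration only, not Kmergenie's implementation. The function names and the toy reads are made up for this example, and the sketch ignores reverse complements and any sampling a real tool may apply.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


if __name__ == "__main__":
    # Toy reads (hypothetical); real input would be parsed from FASTQ files.
    reads = ["ACGTACGT", "CGTACGTA", "ACGTTCGT"]
    hist = abundance_histogram(kmer_counts(reads, 4))
    for multiplicity in sorted(hist):
        print(multiplicity, hist[multiplicity])
```

A histogram like this, computed for each candidate k, is the kind of curve shown in the per-k graphs of the reports linked below.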
DATA USED
The data used in the following sections are modifications of these data sets: MiSeq data SW019_S1_L001, HiSeq data SW018_S1_L007, and HiSeq data SW019_S2_L008. The inputs to Kmergenie are the 6 files from these data sets (note: the undetermined files are not included). In the first run, the adapters were removed with Skewer. In the second run listed below, the adapters were removed with Skewer and Musket error correction was then applied to the adapter-free reads.
The following Kmergenie output was generated not from the raw data, but from the result of running the Skewer adapter removal program on the raw data:
Best k : 61mer
Here is a PDF containing the full Kmergenie report for this run. It contains not only the graph above but also the graphs showing multiplicity for k-mers of size 21, 31, 41, 51, 61, 71, 81, and 91 (other k-mer sizes are not checked). It totals 7 pages: kmergenie_output_adapter_trimming_only.pdf
Seqprep Kmergenie Results
Running Kmergenie on the reads that were adapter-trimmed with Seqprep:
/campusdata/BME235/S15_assemblies/SOAPdenovo2/Kmergenie/SW018_S1_L007_R1_001_trimmed.fastq
/campusdata/BME235/S15_assemblies/SOAPdenovo2/Kmergenie/SW018_S1_L007_R2_001_trimmed.fastq
/campusdata/BME235/S15_assemblies/SOAPdenovo2/Kmergenie/SW019_S1_L001_R1_001_trimmed.fastq
/campusdata/BME235/S15_assemblies/SOAPdenovo2/Kmergenie/SW019_S1_L001_R2_001_trimmed.fastq
/campusdata/BME235/S15_assemblies/SOAPdenovo2/Kmergenie/SW019_S2_L008_R1_001_trimmed.fastq
/campusdata/BME235/S15_assemblies/SOAPdenovo2/Kmergenie/SW019_S2_L008_R2_001_trimmed.fastq
gives the same optimal k of 61.
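The exact command used for these runs is not recorded on this page. As a hedged sketch, the Python below builds the list of the six trimmed files above and assembles a Kmergenie command line for it. It assumes Kmergenie accepts a text file listing one read file per line, and that `-l`, `-k`, `-s`, and `-o` set the smallest k, largest k, step, and output prefix (check the usage message of your installed version). The list file name `reads.txt` and the prefix `seqprep_trimmed` are hypothetical.

```python
from pathlib import Path

READ_DIR = "/campusdata/BME235/S15_assemblies/SOAPdenovo2/Kmergenie"
LIBRARIES = ["SW018_S1_L007", "SW019_S1_L001", "SW019_S2_L008"]


def trimmed_read_paths(read_dir=READ_DIR, libraries=LIBRARIES):
    """Return the R1/R2 trimmed FASTQ paths for each library."""
    return [
        f"{read_dir}/{lib}_R{mate}_001_trimmed.fastq"
        for lib in libraries
        for mate in (1, 2)
    ]


def write_read_list(paths, list_file):
    """Write one read-file path per line and return the list file name."""
    Path(list_file).write_text("\n".join(paths) + "\n")
    return str(list_file)


def kmergenie_command(list_file, prefix, smallest=21, largest=91, step=10):
    """Assemble the argument list; the flag meanings are assumptions (see above)."""
    return [
        "kmergenie", str(list_file),
        "-l", str(smallest),
        "-k", str(largest),
        "-s", str(step),
        "-o", prefix,
    ]


if __name__ == "__main__":
    # Only prints the command; pass it to subprocess.run(cmd, check=True) to run it.
    cmd = kmergenie_command("reads.txt", "seqprep_trimmed")
    print(" ".join(cmd))
```

The defaults of 21, 91, and 10 are chosen to match the k-mer sizes shown in the reports on this page (21 through 91 in steps of 10).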
Kmergenie was also run on adapter trimmed and merged files:
/campusdata/BME235/S15_assemblies/SOAPdenovo2/Kmergenie/SW018_S1_L007_001_merged.fastq
/campusdata/BME235/S15_assemblies/SOAPdenovo2/Kmergenie/SW019_S1_L001_001_merged.fastq
/campusdata/BME235/S15_assemblies/SOAPdenovo2/Kmergenie/SW019_S2_L008_001_merged.fastq
These results show an optimal k of 31 on these files.
Whether using merged reads is more promising for assembly may be a topic to discuss.
This section contains the Kmergenie output after running Musket (error correction) on the Skewer dataset (the same dataset used for the Skewer-only Kmergenie output above):
Best k : 61mer
Here is a PDF containing the full Kmergenie report for this run. It contains not only the graph above but also the graphs showing multiplicity for k-mers of size 21, 31, 41, 51, 61, 71, 81, and 91 (other k-mer sizes are not checked). It totals 7 pages: ec_merged_data_kmergenie.pdf