For both of these files, the k-mer content looks a bit concerning.
Sat May 23
The preqc report for the ucsf_sw018 reads looks similar to previous reports. However, the genome size estimate is low (1.9 Gb compared with 2.2 Gb). The ucsf_sw018 reads are not represented in the k-de Bruijn graphs, indicating insufficient coverage to make these predictions. This report provides a useful baseline for comparison with other preprocessing efforts.
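To put the gap between the two genome size figures in perspective, here is a small arithmetic sketch. It assumes the 2.2 Gb figure is the reference value and the 1.9 Gb figure is the new estimate; the function name is only illustrative.

```python
def relative_shortfall(estimate_gb: float, reference_gb: float) -> float:
    """Return how far the estimate falls below the reference, as a fraction."""
    return (reference_gb - estimate_gb) / reference_gb


# Assumption: 2.2 Gb is the comparison value, 1.9 Gb is the ucsf_sw018 estimate.
shortfall = relative_shortfall(1.9, 2.2)
print(f"{shortfall:.1%}")  # prints 13.6%
```

Under that assumption the new estimate is roughly 14% below the earlier figure.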
The raw fastq files were put through a preprocessing pipeline. First, adaptor sequences were removed from the fastq files using Skewer. The adaptor-free files were then processed with FastUniq to remove PCR duplicates.
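As a rough illustration of what the duplicate-removal step does conceptually, the sketch below drops read pairs whose R1 and R2 sequences have both been seen before. This is not FastUniq's actual implementation; it assumes duplicates are exact sequence matches and ignores read names and quality scores. The function name and the example reads are made up.

```python
from typing import Iterable, Iterator, Tuple

# A read pair reduced to its two sequences; names and qualities are ignored.
ReadPair = Tuple[str, str]


def drop_duplicate_pairs(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield each (R1, R2) sequence pair only the first time it is seen."""
    seen = set()
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            yield pair


# Made-up example reads, not taken from the SW018 data.
pairs = [
    ("ACGTACGT", "TTGACCAA"),
    ("ACGTACGT", "TTGACCAA"),  # exact duplicate of the first pair: dropped
    ("ACGTACGT", "GGGACCAA"),  # same R1 but different R2: kept
]
unique_pairs = list(drop_duplicate_pairs(pairs))
print(len(unique_pairs))  # prints 2
```

A pair is only treated as a duplicate when both mates match, so reads that share one mate by chance are kept.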
Running the merged and trimmed files
predicted best k: 61
The data files were trimmed using SeqPrep, both with and without merging. The output for the run without merging is in /campusdata/BME235/Spring2015Data/adapter_trimming/SeqPrep_newData and the output for the run with merging is in /campusdata/BME235/Spring2015Data/merging/SeqPrep_newData. The trimmed R1 and R2 files for the run with merging are significantly smaller than those from the non-merging run.
The adapters used for both runs were AGATCGGAAGAGCACACGTCTGAACTCCAG (-A option) and AGATCGGAAGAGCGTCGTGTAGGGAAAGAG (-B option).
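The two runs can be sketched as SeqPrep command lines built in Python. The adapter sequences are the ones given above; all file names are hypothetical placeholders, and the sketch only builds the argument list without running anything. It assumes that the merging run differs from the non-merging run only by the addition of SeqPrep's `-s` option (merged-read output).

```python
from typing import List, Optional

ADAPTER_A = "AGATCGGAAGAGCACACGTCTGAACTCCAG"  # passed with -A
ADAPTER_B = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG"  # passed with -B


def seqprep_command(r1_in: str, r2_in: str, r1_out: str, r2_out: str,
                    merged_out: Optional[str] = None) -> List[str]:
    """Build a SeqPrep argument list; add -s only when merging is wanted."""
    cmd = ["SeqPrep",
           "-f", r1_in, "-r", r2_in,    # input R1 and R2 fastq files
           "-1", r1_out, "-2", r2_out,  # trimmed R1 and R2 outputs
           "-A", ADAPTER_A, "-B", ADAPTER_B]
    if merged_out is not None:
        cmd += ["-s", merged_out]       # overlapping pairs are written here
    return cmd


# Hypothetical file names, for illustration only.
trim_only = seqprep_command("R1.fastq.gz", "R2.fastq.gz",
                            "R1_trimmed.fastq.gz", "R2_trimmed.fastq.gz")
with_merge = seqprep_command("R1.fastq.gz", "R2.fastq.gz",
                             "R1_trimmed.fastq.gz", "R2_trimmed.fastq.gz",
                             merged_out="merged.fastq.gz")
print(" ".join(with_merge))
```

When merging is enabled, pairs that overlap are written to the merged file instead of the R1 and R2 outputs, which would be consistent with the smaller R1 and R2 files seen in the merging run.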
All SW018 data sets that had been adapter trimmed using SeqPrep were merged with FastUniq to remove duplicates and then error corrected using Musket.