====== SW019_S1 ======

===== Sequencing data =====

| Library | Run | Location | Notes |
| SW019_S1 | MiSeq 150311_M00160_0068_000000000-AAHYE | /campusdata/BME235/Spring2015Data/ | 2x300 |

===== Files =====

| File | Size | Reads |
| SW019_S1_L001_R1_001.fastq | 17G | 27,268,723 |
| SW019_S1_L001_R2_001.fastq | 17G | 27,268,723 |
| bams/SW019_S1_L001_001.bam | 12G | |
| adapterAndPCRFreeFiles/SW019_adapterTrimmed_dupRemoved_150424_R1.fastq | 40G | |
| adapterAndPCRFreeFiles/SW019_adapterTrimmed_dupRemoved_150424_R2.fastq | 40G | |
| ErrorCorrected/SW019_seqprep_dupRemoved_ec_R1.fastq | 37G | 124,898,033 |
| ErrorCorrected/SW019_seqprep_dupRemoved_ec_R2.fastq | 37G | 124,898,033 |

===== FastQC results =====

Wed Apr 8

Forward reads {{:sw019_s1_l001_r1_001.fastq_fastqc_report.pdf|}}

Reverse reads {{:sw019_s1_l001_r2_001.fastq_fastqc_report.pdf|}}

FastQC detected two issues with the reads: a) the per-base sequence quality decreases toward the end of the reads, and b) there are several over-represented k-mers in the data (possibly adapter sequences).

===== Preqc (SGA preprocessing) results =====

Fri Apr 10

{{:MiSeq_preqc_report.pdf|}}

==== Comments ====

Chris Eisenhart - Looking at the Preqc results, I do not see the banana slug data in the first three slides. The index indicates that the banana slug is shown in blue, but I do not see it on these graphs. Could this be due to hard-coded X and Y ranges that do not include the banana slug data?

The Preqc output is slightly strange, likely because of the low coverage. Preqc estimated a genome size of ~1600 Mbp.
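The low coverage mentioned above can be roughly quantified from the numbers already on this page. The sketch below is only a back-of-the-envelope estimate: it assumes every read is the full 300 bp implied by the "2x300" run note (trimmed or shorter reads would lower the result) and takes the Preqc genome size estimate of ~1600 Mbp at face value. The function name ''estimate_coverage'' is made up for this illustration.

```python
# Rough upper-bound coverage estimate for the raw SW019_S1 MiSeq run.
# All inputs come from the tables and comments on this page.

READ_PAIRS = 27_268_723          # reads in each of the R1 and R2 fastq files
READ_LENGTH = 300                # assumed full-length reads ("2x300" run)
GENOME_SIZE_BP = 1_600_000_000   # Preqc estimate (~1600 Mbp), itself uncertain


def estimate_coverage(read_pairs, read_length, genome_size_bp):
    """Return (total sequenced bases, mean coverage) for a paired-end run."""
    total_bases = 2 * read_pairs * read_length  # two reads per pair
    return total_bases, total_bases / genome_size_bp


total_bases, coverage = estimate_coverage(READ_PAIRS, READ_LENGTH, GENOME_SIZE_BP)
print(f"total bases: {total_bases:,}")
print(f"approximate coverage: {coverage:.1f}x")
```

Under these assumptions the run yields about 16.4 Gbp, or roughly 10x coverage at most, which is consistent with Preqc having little data to work with.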
===== Skewer adapter removal =====

Fri Apr 17

The data with the adapters removed are located at:

  /campusdata/BME235/S15_assemblies/SOAPdenovo2/adapterRemovalTask/skewer_run/SW019_S1_L001_better/SW019_S1_L001-trimmed-pair1.fastq
  /campusdata/BME235/S15_assemblies/SOAPdenovo2/adapterRemovalTask/skewer_run/SW019_S1_L001_better/SW019_S1_L001-trimmed-pair2.fastq

===== Fastq to bam =====

The fastq to bam conversion was performed using the Picard toolset. Specifically, FastqToSam.jar was used to prepare the bam files.

[[contributors:team_5:fastqtosamcommands| FastqToSam commands]]

===== Raw fastq adapter presence analysis =====

This section contains notes I made during a second pass at analyzing the presence of potential adapter sequences in the raw .fastq datasets.

For forward (R1) reads:

  - No overrepresented sequences were detected by FastQC in the reads of SW019_S1_L001_R1_001.fastq.
  - For k-mer content, all biased k-mers followed certain sequence patterns when spliced together in the order they appear in the FastQC plot:
    - AGATCGGAAGAGC: Resembles multiple adapter sequences: Oligo ID IS3_adapter.P5+P7, the beginning of BO2.P5.R, or the beginning of BO3.P7.part1.F.
    - TCTTCCGATCT: Resembles multiple adapter sequences: the end of Oligo ID IS1_adapter.P5, the end of IS2_adapter.P7, the end of BO1.P5.F, or the end of BO4.P7.part1.R.

For reverse (R2) reads:

  - No other overrepresented sequences were detected by FastQC in the reads of SW019_S1_L001_R2_001.fastq.
  - For k-mer content, all biased k-mers followed certain sequence patterns when spliced together in the order they appear in the FastQC plot:
    - AGATCGGAAGAGCGT: Resembles an adapter sequence: the start of Oligo ID BO2.P5.R.
    - CTTCCGATCT: Resembles multiple adapter sequences: the end of Oligo ID IS1_adapter.P5, the end of IS2_adapter.P7, the end of BO1.P5.F, or the end of BO4.P7.part1.R.
    - ATCGGAAG: Resembles part of IS3_adapter.P5+P7, part of BO2.P5.R, or part of BO3.P7.part1.F.

===== SeqPrep results =====

The data files were trimmed using SeqPrep, both with and without merging. The output for the run without merging is in /campusdata/BME235/Spring2015Data/adapter_trimming/SeqPrep and the output for the run with merging is in /campusdata/BME235/Spring2015Data/merging/SeqPrep.

The trimmed R1 and R2 files for the run with merging are smaller than those from the non-merging run, but the difference is much smaller than with SW018_1 and the merged file is much larger.

The adapters used for both runs were AGATCGGAAGAGCACACGTCTGAACTCCAG (-A option) and AGATCGGAAGAGCGTCGTGTAGGGAAAGAG (-B option).

===== Merged SW019 Libraries =====

All SW019 data sets that had been adapter trimmed using SeqPrep were merged, processed with FastUniq to remove duplicates, and then error corrected using Musket.
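
As a cross-check on the raw fastq adapter presence analysis, the biased k-mers reported there can be compared against the two adapter strings passed to SeqPrep. The sketch below is illustrative only: it tests each k-mer, and its reverse complement, for an exact substring match against the -A and -B sequences quoted on this page. It does not look at the Oligo ID sequences (IS3_adapter.P5+P7, BO2.P5.R, and so on), which are not listed here, and the helper names are made up for this example.

```python
# Compare FastQC-biased k-mers with the SeqPrep adapter strings from this page.

SEQPREP_ADAPTERS = {
    "-A": "AGATCGGAAGAGCACACGTCTGAACTCCAG",
    "-B": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG",
}

# K-mers listed in the R1 and R2 notes of the adapter presence analysis.
KMERS = [
    "AGATCGGAAGAGC",
    "TCTTCCGATCT",
    "AGATCGGAAGAGCGT",
    "CTTCCGATCT",
    "ATCGGAAG",
]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_matches(kmer, adapters):
    """List (option, orientation) pairs where the k-mer occurs in an adapter."""
    hits = []
    for option, adapter in adapters.items():
        if kmer in adapter:
            hits.append((option, "forward"))
        elif reverse_complement(kmer) in adapter:
            hits.append((option, "reverse complement"))
    return hits


for kmer in KMERS:
    print(kmer, find_matches(kmer, SEQPREP_ADAPTERS))
```

With these inputs every listed k-mer matches at least one of the two adapters, either directly or as a reverse complement, which supports the reading that the biased k-mers are adapter-derived.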