======SW019_S1======

===== Sequencing data =====
| Library | Run | Location | Notes | 
| SW019_S1 | MiSeq 150311_M00160_0068_000000000-AAHYE | /campusdata/BME235/Spring2015Data/ | 2x300 |

===== Files =====
| File | Size | Reads |
| SW019_S1_L001_R1_001.fastq | 17G | 27,268,723 |
| SW019_S1_L001_R2_001.fastq | 17G | 27,268,723 |
| bams/SW019_S1_L001_001.bam | 12G | |
| adapterAndPCRFreeFiles/SW019_adapterTrimmed_dupRemoved_150424_R1.fastq | 40G | |
| adapterAndPCRFreeFiles/SW019_adapterTrimmed_dupRemoved_150424_R2.fastq | 40G | |
| ErrorCorrected/SW019_seqprep_dupRemoved_ec_R1.fastq | 37G | 124,898,033 |
| ErrorCorrected/SW019_seqprep_dupRemoved_ec_R2.fastq | 37G | 124,898,033 |
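
The read counts in the table can be re-checked by counting lines, since each FASTQ record spans four lines. The following is a minimal Python sketch, assuming uncompressed FASTQ files with unwrapped sequence lines; the function name is illustrative and not part of any tool used on this page.

```python
def count_fastq_reads(path):
    """Return the number of records in an uncompressed 4-line FASTQ file."""
    # Read in binary mode so that large files are streamed without decoding.
    with open(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(
            f"{path}: {n_lines} lines is not a multiple of 4 "
            "(truncated file or wrapped sequence lines?)"
        )
    return n_lines // 4


# Example call (path relative to the data directory listed above):
# count_fastq_reads("SW019_S1_L001_R1_001.fastq")
```

The R1 and R2 files of a pair should give the same count; a mismatch usually means one file is truncated.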

===== Fastqc results =====

Wed Apr 8 

Forward reads {{:sw019_s1_l001_r1_001.fastq_fastqc_report.pdf|}}

Reverse reads {{:sw019_s1_l001_r2_001.fastq_fastqc_report.pdf|}}

FastQC flagged two issues with the reads: a) per-base sequence quality decreases toward the end of the reads, and b) several k-mers are over-represented in the data (possibly adapter sequences).

===== Preqc (SGA preprocessing) results =====

Fri Apr 10

{{:MiSeq_preqc_report.pdf|}}


==== Comments ====
Chris Eisenhart - Looking at the Preqc results, I do not see the banana slug data in the first three slides. The legend indicates that the banana slug is plotted in blue, but I do not see it on these graphs. Could this be due to hard-coded X and Y ranges that do not include the banana slug data?

The preqc output is slightly strange, likely because of the low coverage. Preqc estimated a genome size of ~1600 Mbp.
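
To put the low coverage in numbers, a rough upper bound follows from the read count in the Files table, the 2x300 run format, and the preqc genome size estimate. The sketch below assumes every read is a full 300 bp, which overestimates the usable coverage because adapter trimming and the quality drop at read ends both shorten the reads.

```python
READ_PAIRS = 27_268_723   # read pairs, from the Files table
READ_LENGTH = 300         # bp per read, from the 2x300 run
GENOME_SIZE = 1600e6      # bp, the preqc estimate of ~1600 Mbp


def estimated_coverage(read_pairs, read_length, genome_size):
    """Upper-bound coverage: total sequenced bases divided by genome size."""
    total_bases = read_pairs * 2 * read_length
    return total_bases / genome_size


coverage = estimated_coverage(READ_PAIRS, READ_LENGTH, GENOME_SIZE)
print(f"Estimated coverage: {coverage:.1f}x")
```

Under these assumptions this library alone gives roughly 10x coverage.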

===== Skewer adapter removal =====

Fri Apr 17

The adapter-trimmed reads are located at:

/campusdata/BME235/S15_assemblies/SOAPdenovo2/adapterRemovalTask/skewer_run/SW019_S1_L001_better/SW019_S1_L001-trimmed-pair1.fastq

/campusdata/BME235/S15_assemblies/SOAPdenovo2/adapterRemovalTask/skewer_run/SW019_S1_L001_better/SW019_S1_L001-trimmed-pair2.fastq

===== Fastq to bam ===== 
The FASTQ-to-BAM conversion was performed with the Picard toolset. Specifically, FastqToSam.jar was used to prepare the BAM files.

[[contributors:team_5:fastqtosamcommands| FastqToSam commands]]

===== Raw fastq adapter presence analysis =====

This section contains notes I made during a second pass at analyzing the presence of potential adapter sequences in the raw .fastq datasets.

For forward (R1) strands:
     - No overrepresented sequences were detected by FastQC in SW019_S1_L001_R1_001.fastq.
     - For k-mer content, all biased k-mers follow certain sequence patterns when spliced together in the order they appear in the FastQC plot:
          - AGATCGGAAGAGC: Resembles several adapter oligos: IS3_adapter.P5+P7, the beginning of BO2.P5.R, or the beginning of BO3.P7.part1.F.
          - TCTTCCGATCT: Resembles the end of several adapter oligos: IS1_adapter.P5, IS2_adapter.P7, BO1.P5.F, or BO4.P7.part1.R.

For reverse (R2) strands:
     - No other overrepresented sequences were detected by FastQC in SW019_S1_L001_R2_001.fastq.
     - For k-mer content, all biased k-mers follow certain sequence patterns when spliced together in the order they appear in the FastQC plot:
          - AGATCGGAAGAGCGT: Resembles the start of the adapter oligo BO2.P5.R.
          - CTTCCGATCT: Resembles the end of several adapter oligos: IS1_adapter.P5, IS2_adapter.P7, BO1.P5.F, or BO4.P7.part1.R.
          - ATCGGAAG: Resembles part of IS3_adapter.P5+P7, part of BO2.P5.R, or part of BO3.P7.part1.F.
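
The k-mer comparisons above can be reproduced with a simple substring check in both orientations. The sketch below is illustrative only: the oligo sequences (IS1, BO2, etc.) are not reproduced on this page, so it checks against the two adapter sequences listed under SeqPrep results instead, and the dictionary keys and function names are made up for this example.

```python
# Adapter sequences taken from the SeqPrep results section of this page.
ADAPTERS = {
    "seqprep_A": "AGATCGGAAGAGCACACGTCTGAACTCCAG",
    "seqprep_B": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def match_kmer(kmer, adapters=ADAPTERS):
    """List (adapter name, orientation) pairs in which the k-mer occurs."""
    hits = []
    for name, adapter in adapters.items():
        if kmer in adapter:
            hits.append((name, "forward"))
        if reverse_complement(kmer) in adapter:
            hits.append((name, "reverse-complement"))
    return hits


for kmer in ("AGATCGGAAGAGC", "TCTTCCGATCT", "AGATCGGAAGAGCGT"):
    print(kmer, match_kmer(kmer))
```

AGATCGGAAGAGC is a prefix of both adapters, while TCTTCCGATCT is the reverse complement of the first 11 bases (AGATCGGAAGA), which is consistent with it matching the ends of the complementary oligos.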

=====SeqPrep results=====
The data files were trimmed using SeqPrep, both with and without merging. The output for the run without merging is in /campusdata/BME235/Spring2015Data/adapter_trimming/SeqPrep and the output for the run with merging is in /campusdata/BME235/Spring2015Data/merging/SeqPrep. The trimmed R1 and R2 files for the run with merging are smaller than those from the non-merging run, but the difference is much smaller than with SW018_1 and the merged file is much larger. 

The adapters used for both runs were AGATCGGAAGAGCACACGTCTGAACTCCAG (-A option) and AGATCGGAAGAGCGTCGTGTAGGGAAAGAG (-B option). 
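
For reference, the two kinds of run differ only in whether a merged-output file is requested. The sketch below builds a SeqPrep command line in Python; the exact commands used for these runs are not recorded on this page, so all file names here are hypothetical. The flags (-f, -r, -1, -2, -A, -B, -s) are the ones listed in SeqPrep's usage text, and the adapters are those given above.

```python
import shlex

ADAPTER_A = "AGATCGGAAGAGCACACGTCTGAACTCCAG"  # -A option, from this page
ADAPTER_B = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG"  # -B option, from this page


def seqprep_command(r1_in, r2_in, r1_out, r2_out, merged_out=None):
    """Build a SeqPrep argument list; merging is enabled only if merged_out is set."""
    cmd = [
        "SeqPrep",
        "-f", r1_in, "-r", r2_in,    # input forward / reverse FASTQ
        "-1", r1_out, "-2", r2_out,  # trimmed output files
        "-A", ADAPTER_A, "-B", ADAPTER_B,
    ]
    if merged_out is not None:
        cmd += ["-s", merged_out]    # also write merged overlapping pairs
    return cmd


# Hypothetical output names; SeqPrep writes gzip-compressed output.
cmd = seqprep_command(
    "SW019_S1_L001_R1_001.fastq", "SW019_S1_L001_R2_001.fastq",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
    merged_out="merged.fastq.gz",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

With merging enabled, overlapping pairs go to the merged file rather than to the trimmed R1 and R2 outputs, which is why those files are smaller in the merging run.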

===== Merged SW019 Libraries =====
All SW019 data sets that had been adapter trimmed using SeqPrep were merged, de-duplicated with FastUniq, and then error corrected using Musket.