SW019_S1

Sequencing data

Library Run Location Notes
SW019_S1 MiSeq 150311_M00160_0068_000000000-AAHYE /campusdata/BME235/Spring2015Data/ 2×300

Files

File Size Reads
SW019_S1_L001_R1_001.fastq 17G 27,268,723
SW019_S1_L001_R2_001.fastq 17G 27,268,723
bams/SW019_S1_L001_001.bam 12G
adapterAndPCRFreeFiles/SW019_adapterTrimmed_dupRemoved_150424_R1.fastq 40G
adapterAndPCRFreeFiles/SW019_adapterTrimmed_dupRemoved_150424_R2.fastq 40G
ErrorCorrected/SW019_seqprep_dupRemoved_ec_R1.fastq 37G 124,898,033
ErrorCorrected/SW019_seqprep_dupRemoved_ec_R2.fastq 37G 124,898,033
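
The read counts in the table can be re-checked by counting FASTQ records. The sketch below is only an illustration, not the method used to fill in the table; it assumes plain four-line FASTQ records (no wrapped sequence lines), and the function name count_fastq_reads is made up for this example.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    # Transparently handle gzip-compressed files by extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return line_count // 4
```

For a properly paired data set, calling this on the R1 and R2 files should give the same number, as the table shows for each pair listed with a count.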

Fastqc results

Wed Apr 8

Forward reads sw019_s1_l001_r1_001.fastq_fastqc_report.pdf

Reverse reads sw019_s1_l001_r2_001.fastq_fastqc_report.pdf

Fastqc detected two issues with the reads: a) the per-base sequence quality drops toward the 3' end of the reads, and b) there are several over-represented k-mers in the data (possibly adapter sequences).
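
To look at the quality drop outside of fastqc, the mean Phred score at each read position can be computed directly from the quality lines. This is a rough sketch rather than what fastqc does internally; it assumes four-line FASTQ records and a Phred+33 quality encoding, and the function name is invented for this example.

```python
from collections import defaultdict


def mean_quality_by_position(fastq_lines, phred_offset=33):
    """Return the mean Phred score at each read position.

    fastq_lines is any iterable of FASTQ lines (four lines per record).
    phred_offset=33 is an assumption about the quality encoding.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # only the fourth line of each record holds qualities
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            totals[pos] += ord(ch) - phred_offset
            counts[pos] += 1
    return [totals[pos] / counts[pos] for pos in sorted(counts)]
```

Passing an open file handle as fastq_lines works, since the function only iterates over lines. A falling tail in the returned list corresponds to issue a) above.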

Preqc (SGA preprocessing) results

Comments

Chris Eisenhart - Looking at the Preqc results, I do not see the banana slug data in the first three slides. The index indicates that the banana slug is plotted in blue, but I do not see it on these graphs. Could this be due to hard-coded X and Y ranges that do not include the banana slug data?

The preqc output is slightly strange, likely because of the low coverage. Preqc estimated a genome size of ~1600 Mbp.
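
As a back-of-the-envelope check on the low-coverage remark, the nominal coverage of this library can be worked out from numbers on this page: 27,268,723 read pairs, 2×300 reads, and the ~1600 Mbp preqc estimate. This assumes every read is a full 300 bp, so it is an upper bound for this library alone, before any trimming or duplicate removal.

```python
def nominal_coverage(read_pairs, read_length, genome_size_bp):
    """Total sequenced bases divided by genome size (paired-end, full-length reads)."""
    return read_pairs * 2 * read_length / genome_size_bp


# Values taken from the Files table, the 2x300 run note, and the preqc estimate.
cov = nominal_coverage(27_268_723, 300, 1_600_000_000)
print(f"{cov:.1f}x")  # prints "10.2x"
```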

Skewer adapter removal

Fri Apr 17

These data with the adapters removed are located at

/campusdata/BME235/S15_assemblies/SOAPdenovo2/adapterRemovalTask/skewer_run/SW019_S1_L001_better/SW019_S1_L001-trimmed-pair1.fastq

/campusdata/BME235/S15_assemblies/SOAPdenovo2/adapterRemovalTask/skewer_run/SW019_S1_L001_better/SW019_S1_L001-trimmed-pair2.fastq

Fastq to bam

The fastq to bam conversion was performed using the Picard toolset. Specifically, FastqToSam.jar was used to prepare the bam files.

FastqToSam commands
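
The exact commands are not reproduced on this page. As a hedged sketch only, a paired-end FastqToSam invocation in Picard's KEY=VALUE style could be assembled as below. The file names come from the Files table; the sample name, the jar location, and the helper function are assumptions, and the command is only printed, not run.

```python
import shlex


def fastq_to_sam_command(fastq1, fastq2, output_bam, sample_name,
                         jar="FastqToSam.jar"):
    """Build (but do not run) a Picard FastqToSam command for paired reads."""
    return [
        "java", "-jar", jar,
        f"FASTQ={fastq1}",        # forward reads
        f"FASTQ2={fastq2}",       # reverse reads
        f"OUTPUT={output_bam}",   # unaligned bam to write
        f"SAMPLE_NAME={sample_name}",
    ]


cmd = fastq_to_sam_command(
    "SW019_S1_L001_R1_001.fastq",
    "SW019_S1_L001_R2_001.fastq",
    "bams/SW019_S1_L001_001.bam",
    "SW019_S1",  # assumed sample name
)
print(shlex.join(cmd))
```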

Raw fastq adapter presence analysis

This section contains notes I made during a second pass at analyzing the presence of potential adapter sequences in the raw .fastq datasets.

For forward (R1) strands:

  1. No overrepresented sequences were detected by fastqc on the strands of SW019_S1_L001_R1_001.fastq.
  2. For kmer content, all biased kmers followed certain sequence patterns when spliced together as they are ordered in the fastqc plot:
    1. AGATCGGAAGAGC: Resembles multiple adapter sequences of Oligo IDs IS3_adapter.P5+P7, beginning of BO2.P5.R or beginning of BO3.P7.part1.F.
    2. TCTTCCGATCT: Resembles multiple adapter sequences of Oligo IDs at the end of IS1_adapter.P5, the end of IS2_adapter.P7, the end of BO1.P5.F or at the end of BO4.P7.part1.R.

For reverse (R2) strands:

  1. No other overrepresented sequences were detected by fastqc on the strands of SW019_S1_L001_R2_001.fastq.
  2. For kmer content, all biased kmers followed certain sequence patterns when spliced together as they are ordered in the fastqc plot:
    1. AGATCGGAAGAGCGT: Resembles an adapter sequence of Oligo ID at the start of BO2.P5.R.
    2. CTTCCGATCT: Resembles multiple adapter sequences of Oligo IDs at the end of IS1_adapter.P5, the end of IS2_adapter.P7, the end of BO1.P5.F or at the end of BO4.P7.part1.R.
    3. ATCGGAAG: Resembles part of IS3_adapter.P5+P7 or part of BO2.P5.R or part of BO3.P7.part1.F.
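
The spliced kmers above can also be compared against the two adapter sequences passed to SeqPrep (listed under SeqPrep results on this page), on both strands. This is a small sketch, not the procedure used for the notes above; the oligo sequences behind the IDs (IS1_adapter.P5, BO2.P5.R, and so on) are not given on this page, so only the two SeqPrep adapters are checked. The helper names are invented for this example.

```python
# Adapter sequences as given for the SeqPrep -A and -B options on this page.
ADAPTERS = {
    "SeqPrep -A": "AGATCGGAAGAGCACACGTCTGAACTCCAG",
    "SeqPrep -B": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate(kmer, adapters=ADAPTERS):
    """List (adapter name, strand, offset) for each exact occurrence of kmer.

    Strand "+" means the kmer itself occurs in the adapter; "-" means its
    reverse complement does. Only the first occurrence per strand is reported.
    """
    hits = []
    for name, adapter in adapters.items():
        for strand, query in (("+", kmer), ("-", reverse_complement(kmer))):
            pos = adapter.find(query)
            if pos != -1:
                hits.append((name, strand, pos))
    return hits


for kmer in ("AGATCGGAAGAGC", "TCTTCCGATCT", "AGATCGGAAGAGCGT",
             "CTTCCGATCT", "ATCGGAAG"):
    print(kmer, locate(kmer))
```

With these inputs, TCTTCCGATCT and CTTCCGATCT match only as reverse complements of the start of both adapters, while AGATCGGAAGAGCGT matches only the -B adapter.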

SeqPrep results

The data files were trimmed using SeqPrep, both with and without merging. The output for the run without merging is in /campusdata/BME235/Spring2015Data/adapter_trimming/SeqPrep and the output for the run with merging is in /campusdata/BME235/Spring2015Data/merging/SeqPrep. The trimmed R1 and R2 files for the run with merging are smaller than those from the non-merging run, but the difference is much smaller than with SW018_1 and the merged file is much larger.

The adapters used for both runs were AGATCGGAAGAGCACACGTCTGAACTCCAG (-A option) and AGATCGGAAGAGCGTCGTGTAGGGAAAGAG (-B option).
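
For reference, a SeqPrep call with these adapters might be assembled roughly as follows. This is a sketch under assumptions: the output file names are placeholders, any other options used in the actual runs are unknown, and the helper only builds the argument list without running anything. The -s option (merged output) is included only for the merging run.

```python
ADAPTER_A = "AGATCGGAAGAGCACACGTCTGAACTCCAG"  # passed with -A
ADAPTER_B = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG"  # passed with -B


def seqprep_command(r1, r2, out1, out2, merged=None):
    """Build (but do not run) a SeqPrep command; merged=None means no merging."""
    cmd = [
        "SeqPrep",
        "-f", r1, "-r", r2,      # input forward and reverse reads
        "-1", out1, "-2", out2,  # trimmed outputs
        "-A", ADAPTER_A, "-B", ADAPTER_B,
    ]
    if merged is not None:
        cmd += ["-s", merged]    # also write merged overlapping pairs
    return cmd


# Placeholder output names, chosen only for this example.
trim_only = seqprep_command(
    "SW019_S1_L001_R1_001.fastq", "SW019_S1_L001_R2_001.fastq",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
)
with_merge = seqprep_command(
    "SW019_S1_L001_R1_001.fastq", "SW019_S1_L001_R2_001.fastq",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
    merged="merged.fastq.gz",
)
print(" ".join(with_merge))
```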

Merged SW019 Libraries

All SW019 data sets that had been adapter-trimmed with SeqPrep were merged with FastUniq to remove duplicates and then error-corrected with Musket.
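
The idea behind removing duplicate read pairs can be illustrated with a few lines of Python. This is a deliberately simplified, exact-match sketch and not FastUniq's actual algorithm: a pair is treated as a duplicate only when both its R1 and R2 sequences are identical to those of an earlier pair. The record layout (header, sequence, quality) and the function name are assumptions for this example.

```python
def drop_duplicate_pairs(pairs):
    """Keep the first occurrence of each (R1 sequence, R2 sequence) combination.

    pairs is an iterable of (r1, r2), where each read is a
    (header, sequence, quality) tuple.
    """
    seen = set()
    unique = []
    for r1, r2 in pairs:
        key = (r1[1], r2[1])  # compare sequences only, not headers or qualities
        if key not in seen:
            seen.add(key)
            unique.append((r1, r2))
    return unique


example = [
    (("@a/1", "ACGT", "IIII"), ("@a/2", "TTGA", "IIII")),
    (("@b/1", "ACGT", "IIII"), ("@b/2", "TTGA", "5555")),  # duplicate of pair a
    (("@c/1", "ACGT", "IIII"), ("@c/2", "GGGA", "IIII")),  # same R1, different R2
]
kept = drop_duplicate_pairs(example)
print([r1[0] for r1, _ in kept])  # prints ['@a/1', '@c/1']
```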

data_overview/2015/miseq_data.txt · Last modified: 2015/09/11 18:39 by 5.9.83.211