BS-MK

Sequencing data

Library Run Location Notes
/campusdata/BME235/Spring2015Data/UCSF_BS-MK/

Files

File Size
BS-MK_CATCCGG_R1.fastq 83G
BS-MK_CATCCGG_R2.fastq 83G
../preqc/ucsf_bs-mk/ucsf_bs-mk.fastq 172G
../adapter_trimming/UCSF_reads_skewer_trimmed/BS_MK_noAdap_R1.fastq 83G
../adapter_trimming/UCSF_reads_skewer_trimmed/BS_MK_noAdap_R2.fastq 83G
../merging/SeqPrep_newData/UCSF_BS-MK_CATCCGG_merged.fastq.gz 11G
../merging/SeqPrep_newData/UCSF_BS-MK_CATCCGG_R1_trimmed.fastq.gz 22G
../merging/SeqPrep_newData/UCSF_BS-MK_CATCCGG_R2_trimmed.fastq.gz 22G

/campusdata/BME235/S15_assemblies/SOAPdenovo2/errorCorrectionTask/musket_run/pairedEndEC_k31_minmult3/BS-MK_seqprep_dupRemoved_ec_R1.fastq

/campusdata/BME235/S15_assemblies/SOAPdenovo2/errorCorrectionTask/musket_run/pairedEndEC_k31_minmult3/BS-MK_seqprep_dupRemoved_ec_R2.fastq

A grep search shows that within the raw data files only 0.06% of reads contain the full adapter sequence.
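The percentage above can be reproduced without grep by scanning the sequence line of each FASTQ record. The following is a minimal sketch, assuming plain four-line FASTQ records. The page does not record which adapter sequence was searched for, so the `ADAPTER` value (the -B adapter quoted later on this page) and the in-memory demo reads are illustrative assumptions only.

```python
import io

# Assumption: the adapter searched for is not recorded on this page;
# this value is the -B adapter quoted further down, used as a placeholder.
ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG"


def adapter_fraction(fastq_handle, adapter):
    """Return (reads_containing_adapter, total_reads) for a FASTQ stream."""
    total = hits = 0
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 == 1:  # second line of each 4-line record is the sequence
            total += 1
            if adapter in line:
                hits += 1
    return hits, total


def record(name, seq):
    """Build one FASTQ record with a dummy quality string (demo helper)."""
    return f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n"


# Hypothetical demo reads; real use would open the raw R1/R2 files instead.
DEMO_FASTQ = (
    record("read1", "ACGT" + ADAPTER + "TT")
    + record("read2", "TTGACCATGCAAGTCCGATAGGCA")
    + record("read3", "GGCATTACCGATAGCATTACGGAT")
)

hits, total = adapter_fraction(io.StringIO(DEMO_FASTQ), ADAPTER)
print(f"{hits} of {total} reads contain the full adapter")
```

For the real files, the same function can be given `open(path)` for a plain FASTQ file or `gzip.open(path, "rt")` for a gzipped one.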

FastQC results

Results of the run are located here on the campusrocks2 server:

/campusdata/BME235/S15_assemblies/SOAPdenovo2/Fastqc/UCSF_BS-MK_fastqc

fastqc_bs-mk_catccgg_r1.pdf

fastqc_bs-mk_catccgg_r2.pdf

Skewer Adapter trimmed fastqc results

Results of FastQC analysis on the adapter-trimmed (using Skewer) and PCR-duplicate-removed (using FastUniq) files:

forward adapter used: ACACTCTTTCCCTACACGACGCTCTTCCGATCT

reverse adapter used: GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

fastqc_bs_mk_noadap_r1.pdf

fastqc_bs_mk_noadap_r2.pdf

The “no-adap” sequences still appear to have a high-frequency k-mer at the beginning: GCTCTTCCGATCTA. This resembles the claimed -B option AGATCGGAAGAGCGTCGTGTAGGGAAAGAG, which reverse-complements to CTCTTTCCCTACACGACGCTCTTCCGATCT.
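The relationship between the k-mer and the adapters can be checked directly. This short sketch uses only sequences quoted on this page; the function and variable names are illustrative. It reverse-complements the -B adapter, tests whether the first 13 bases of the overrepresented k-mer (the trailing A falls outside the adapter) match its 3' end, and tests whether the forward adapter given to the first Skewer run ends with that same reverse complement.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


B_ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG"             # claimed -B option
KMER = "GCTCTTCCGATCT"                                   # k-mer without its trailing A
FIRST_RUN_FORWARD = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # forward adapter used first

rc = reverse_complement(B_ADAPTER)
print("reverse complement of -B adapter:", rc)
print("k-mer is the 3' end of it:", rc.endswith(KMER))
print("first-run forward adapter ends with it:", FIRST_RUN_FORWARD.endswith(rc))
```

Both checks come out true, which fits the k-mer being leftover adapter sequence rather than genomic sequence.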

Was adapter trimming done correctly? Why did Skewer not remove the adapter sequence?

Skewer was not run properly the first time. It was rerun with the following adapters:

Forward: AGATCGGAAGAGCACACGTCTGAACTCCAG

Reverse: AGATCGGAAGAGCGTCGTGTAGGGAAAGAG

Here are the new fastqc results:

fastqc report for v2 BS-MK R1 trimmed reads

fastqc report for v2 BS-MK R2 trimmed reads

The k-mer content check still fails.

Seqprep Adapter trimmed fastqc results

Results of fastqc analysis on the seqprep adapter trimmed files:

fastqc_ucsf_bs-mk_catccgg_r1_tr....fastq.gz_fastqc_report.pdf

fastqc_ucsf_bs-mk_catccgg_r2_tr....fastq.gz_fastqc_report.pdf

These fastqc analyses show a huge amount of adapter at the beginnings of the reads. Was SeqPrep told about the adapters? What parameters was it run with?
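One way to quantify adapter at the beginnings of reads, independent of the FastQC plots, is to tally the most common leading k-mers. The sketch below assumes four-line FASTQ records; the function name, the choice of k = 13, and the demo reads are illustrative and not taken from the original analysis.

```python
import io
from collections import Counter


def leading_kmers(fastq_handle, k=13, top=5):
    """Return the `top` most common k-mers found at the start of reads."""
    counts = Counter()
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 == 1:  # sequence line of each 4-line record
            seq = line.strip()
            if len(seq) >= k:
                counts[seq[:k]] += 1
    return counts.most_common(top)


def record(name, seq):
    """Build one FASTQ record with a dummy quality string (demo helper)."""
    return f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n"


# Hypothetical demo reads: two begin with the adapter-like k-mer.
demo = (
    record("read1", "GCTCTTCCGATCTACGTACGT")
    + record("read2", "GCTCTTCCGATCTAGGTTCCA")
    + record("read3", "TTGACCATGCAAGTCCGATAG")
)

for kmer, count in leading_kmers(io.StringIO(demo)):
    print(kmer, count)
```

For the trimmed `.fastq.gz` files, `gzip.open(path, "rt")` yields a text handle that can be passed to the same function.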

Preqc (SGA preprocessing) results

Tues May 26

ucsf_bs-mk_preqc_report.pdf

Preqc report of the UCSF BS-MK and BS-tag data (pooled).

ucsf_new_lib_preqc_report.pdf

Comments

The preqc report for the ucsf_bs-mk reads looks similar to previous reports. This report provides a useful baseline for comparison with other pre-processing efforts.

SeqPrep results

The data files were trimmed using SeqPrep, both with and without merging. The output for the run without merging is in /campusdata/BME235/Spring2015Data/adapter_trimming/SeqPrep_newData and the output for the run with merging is in /campusdata/BME235/Spring2015Data/merging/SeqPrep_newData. The trimmed R1 and R2 files for the run with merging are somewhat smaller than those from the non-merging run.

The adapters used for both runs were AGATCGGAAGAGCACACGTCTGAACTCCAG (-A option) and AGATCGGAAGAGCGTCGTGTAGGGAAAGAG (-B option).
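The exact command lines are not recorded on this page. As a hedged sketch of what the two runs might have looked like, the snippet below assembles SeqPrep argument lists using the adapters above. The -f/-r (inputs), -1/-2 (outputs), and -s (merged output) flags follow SeqPrep's usage message; the file names are taken from the file table as an assumption, and any other parameters that may have been used are unknown.

```python
import shlex

ADAPTER_A = "AGATCGGAAGAGCACACGTCTGAACTCCAG"  # -A option from this page
ADAPTER_B = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG"  # -B option from this page


def seqprep_command(r1, r2, out1, out2, merged=None):
    """Build a SeqPrep argument list; adds -s only when merging is wanted."""
    cmd = [
        "SeqPrep",
        "-f", r1, "-r", r2,      # input read files
        "-1", out1, "-2", out2,  # trimmed output files
        "-A", ADAPTER_A, "-B", ADAPTER_B,
    ]
    if merged is not None:
        cmd += ["-s", merged]    # enable merging and name the merged output
    return cmd


# Assumed file names, based on the file table; not the recorded command.
inputs = ("BS-MK_CATCCGG_R1.fastq", "BS-MK_CATCCGG_R2.fastq")
outputs = (
    "UCSF_BS-MK_CATCCGG_R1_trimmed.fastq.gz",
    "UCSF_BS-MK_CATCCGG_R2_trimmed.fastq.gz",
)

trim_only = seqprep_command(*inputs, *outputs)
with_merge = seqprep_command(
    *inputs, *outputs, merged="UCSF_BS-MK_CATCCGG_merged.fastq.gz"
)
print(shlex.join(trim_only))
print(shlex.join(with_merge))
```

The lists can be passed to `subprocess.run(cmd, check=True)` on a machine where SeqPrep is installed. Read pairs that merge are written to the -s file, which would be consistent with the smaller R1 and R2 outputs noted for the merging run.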

FastQC on Seqprep, Fastuniq, Musket files

SeqPrep adapter-removed files were run through FastUniq to remove PCR duplicates, then through Musket for error correction, and lastly through FastQC for analysis.
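To illustrate what the duplicate-removal step does conceptually, here is a minimal sketch that keeps the first occurrence of each read pair whose two sequences are both identical to an earlier pair. This is only an illustration of the idea under that simple definition of a duplicate; it is not FastUniq's actual algorithm, and the function name and demo pairs are hypothetical.

```python
def dedup_pairs(pairs):
    """Yield each (r1_seq, r2_seq) pair once, in first-seen order."""
    seen = set()
    for r1_seq, r2_seq in pairs:
        key = (r1_seq, r2_seq)  # a pair is a duplicate only if both mates match
        if key not in seen:
            seen.add(key)
            yield r1_seq, r2_seq


# Hypothetical demo pairs: the second pair duplicates the first.
demo_pairs = [
    ("ACGTACGT", "TTGACCAT"),
    ("ACGTACGT", "TTGACCAT"),
    ("ACGTACGT", "CCGATAGG"),  # same R1 but different R2, so it is kept
]

unique_pairs = list(dedup_pairs(demo_pairs))
print(len(demo_pairs), "pairs in,", len(unique_pairs), "pairs out")
```

Holding every pair in a set is memory-hungry at the scale of these libraries, which is one reason a dedicated tool is used in practice.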

bs-mk_seqprep_dupremoved_ec_r1.pdf

bs-mk_seqprep_dupremoved_ec_r2.pdf

Files are located here:

/campusdata/BME235/S15_assemblies/SOAPdenovo2/errorCorrectionTask/musket_run/pairedEndEC_k31_minmult3/BS-MK_seqprep_dupRemoved_ec_R1.fastq

/campusdata/BME235/S15_assemblies/SOAPdenovo2/errorCorrectionTask/musket_run/pairedEndEC_k31_minmult3/BS-MK_seqprep_dupRemoved_ec_R2.fastq