Library | Run | Location | Notes |
/campusdata/BME235/Spring2015Data/UCSF_BS-MK/ |
File | Size |
BS-MK_CATCCGG_R1.fastq | 83G |
BS-MK_CATCCGG_R2.fastq | 83G |
../preqc/ucsf_bs-mk/ucsf_bs-mk.fastq | 172G |
../adapter_trimming/UCSF_reads_skewer_trimmed/BS_MK_noAdap_R1.fastq | 83G |
../adapter_trimming/UCSF_reads_skewer_trimmed/BS_MK_noAdap_R2.fastq | 83G |
../merging/SeqPrep_newData/UCSF_BS-MK_CATCCGG_merged.fastq.gz | 11G |
../merging/SeqPrep_newData/UCSF_BS-MK_CATCCGG_R1_trimmed.fastq.gz | 22G |
../merging/SeqPrep_newData/UCSF_BS-MK_CATCCGG_R2_trimmed.fastq.gz | 22G |
/campusdata/BME235/S15_assemblies/SOAPdenovo2/errorCorrectionTask/musket_run/pairedEndEC_k31_minmult3/BS-MK_seqprep_dupRemoved_ec_R1.fastq
/campusdata/BME235/S15_assemblies/SOAPdenovo2/errorCorrectionTask/musket_run/pairedEndEC_k31_minmult3/BS-MK_seqprep_dupRemoved_ec_R2.fastq
A grep search shows that within the raw data files only 0.06% of reads contain the full adapter sequence.
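The grep count above can be reproduced in a few lines of Python. This is a minimal sketch, not the command that was actually run: it assumes plain 4-line FASTQ records and only looks for the adapter on the forward strand. The adapter string and the two reads in the example are placeholders.

```python
import io

def adapter_fraction(fastq_handle, adapter):
    """Return (hits, total): reads containing `adapter`, and all reads seen."""
    hits = total = 0
    for line_number, line in enumerate(fastq_handle):
        # In 4-line FASTQ, the sequence is the second line of each record.
        if line_number % 4 == 1:
            total += 1
            if adapter in line:
                hits += 1
    return hits, total

# Tiny in-memory example with made-up reads (not the real data).
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAG"  # assumed adapter, for illustration
records = [
    "@r1", "ACGT" + ADAPTER, "+", "I" * (4 + len(ADAPTER)),
    "@r2", "ACGTACGTACGT", "+", "I" * 12,
]
example = io.StringIO("\n".join(records) + "\n")

hits, total = adapter_fraction(example, ADAPTER)
print(f"{hits}/{total} reads ({100 * hits / total:.2f}%) contain the adapter")
```

For the real files, pass an open file handle (for example `open(path)`) instead of the in-memory example.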
Results of the FastQC run are located here on the campusrocks2 server:
/campusdata/BME235/S15_assemblies/SOAPdenovo2/Fastqc/UCSF_BS-MK_fastqc
Skewer Adapter trimmed fastqc results
Results of FastQC analysis on the adapter-trimmed (using Skewer) and PCR-duplicate-removed (using FastUniq) files:
forward adapter used: ACACTCTTTCCCTACACGACGCTCTTCCGATCT
reverse adapter used: GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
The “no-adap” sequences still appear to have a high-frequency k-mer at the beginning: GCTCTTCCGATCTA. All but its final A match the end of the reverse complement of the claimed -B option: AGATCGGAAGAGCGTCGTGTAGGGAAAGAG reverse-complements to CTCTTTCCCTACACGACGCTCTTCCGATCT.
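The orientation check can be done mechanically. The sketch below uses only the sequences quoted on this page. The helper name `reverse_complement` is ours and does not come from any of the tools.

```python
# Translation table for DNA complement (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

b_adapter = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG"          # claimed -B option
forward_used = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"    # forward adapter given to skewer
kmer = "GCTCTTCCGATCTA"                               # over-represented k-mer

rc = reverse_complement(b_adapter)
print(rc)                          # CTCTTTCCCTACACGACGCTCTTCCGATCT
print(rc.endswith(kmer[:-1]))      # True: k-mer minus its last base is the tail of rc
print(forward_used.endswith(rc))   # True: the adapter used contains rc at its end
```

Under these sequences, the forward adapter used in the first skewer run ends with the reverse complement of the -B adapter, which is consistent with the adapter having been supplied in the wrong orientation.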
Was adapter trimming done correctly? Why did Skewer not remove the adapter sequence?
Skewer was not run properly. It was redone with the adapters: Forward: AGATCGGAAGAGCACACGTCTGAACTCCAG Reverse: AGATCGGAAGAGCGTCGTGTAGGGAAAGAG
Here are the new fastqc results:
fastqc report for v2 BS-MK R1 trimmed reads
fastqc report for v2 BS-MK R2 trimmed reads
The k-mer content check still fails.
Seqprep Adapter trimmed fastqc results
Results of fastqc analysis on the seqprep adapter trimmed files:
fastqc_ucsf_bs-mk_catccgg_r1_tr....fastq.gz_fastqc_report.pdf
fastqc_ucsf_bs-mk_catccgg_r2_tr....fastq.gz_fastqc_report.pdf
These fastqc analyses show a huge amount of adapter at the beginnings of the reads. Was SeqPrep told about the adapters? What parameters was it run with?
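One way to see what sits at the beginnings of the reads, independently of FastQC, is to tally the first k bases of every read. This is a rough sketch assuming 4-line FASTQ records. The reads in the example are invented, and k = 14 is chosen only because it matches the length of the k-mer reported for the skewer output.

```python
import io
from collections import Counter

def leading_kmer_counts(fastq_handle, k):
    """Count the first k bases of each read in a 4-line FASTQ stream."""
    counts = Counter()
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 == 1:          # sequence line of each record
            seq = line.strip()
            if len(seq) >= k:
                counts[seq[:k]] += 1
    return counts

# Made-up reads for illustration only.
records = [
    "@r1", "GCTCTTCCGATCTAACGTAC", "+", "I" * 20,
    "@r2", "GCTCTTCCGATCTATTGCAA", "+", "I" * 20,
    "@r3", "TTTTACGTACGTACGTACGT", "+", "I" * 20,
]
counts = leading_kmer_counts(io.StringIO("\n".join(records) + "\n"), k=14)
for kmer, n in counts.most_common(5):
    print(kmer, n)
```

For the gzipped SeqPrep output, a handle from `gzip.open(path, "rt")` can be passed in place of the in-memory example.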
Tues May 26
Preqc report of the UCSF BS-MK and BS-tag data (pooled).
The preqc report for the ucsf_bs-mk reads looks similar to previous reports. This report provides a useful baseline for comparison with other pre-processing efforts.
The data files were trimmed using SeqPrep, both with and without merging. The output for the run without merging is in /campusdata/BME235/Spring2015Data/adapter_trimming/SeqPrep_newData and the output for the run with merging is in /campusdata/BME235/Spring2015Data/merging/SeqPrep_newData. The trimmed R1 and R2 files for the run with merging are somewhat smaller than those from the non-merging run.
The adapters used for both runs were AGATCGGAAGAGCACACGTCTGAACTCCAG (-A option) and AGATCGGAAGAGCGTCGTGTAGGGAAAGAG (-B option).
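For the record, the two runs might have been invoked roughly as below. This is a sketch, not the command line that was actually used. The -A and -B values come from this page. The -f, -r, -1, -2 and -s flags are our recollection of SeqPrep's usage message and should be checked against the installed version. All file names are hypothetical.

```python
import shlex

ADAPTER_A = "AGATCGGAAGAGCACACGTCTGAACTCCAG"  # -A option, from this page
ADAPTER_B = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAG"  # -B option, from this page

def seqprep_command(r1_in, r2_in, r1_out, r2_out, merged_out=None):
    """Build a SeqPrep argument list; flags other than -A/-B are assumed."""
    cmd = [
        "SeqPrep",
        "-f", r1_in, "-r", r2_in,     # input read pairs (assumed flags)
        "-1", r1_out, "-2", r2_out,   # trimmed outputs (assumed flags)
        "-A", ADAPTER_A, "-B", ADAPTER_B,
    ]
    if merged_out is not None:
        cmd += ["-s", merged_out]     # merged-read output (assumed flag)
    return cmd

# Hypothetical file names, for illustration only.
no_merge = seqprep_command("R1.fastq", "R2.fastq",
                           "R1_trimmed.fastq.gz", "R2_trimmed.fastq.gz")
with_merge = seqprep_command("R1.fastq", "R2.fastq",
                             "R1_trimmed.fastq.gz", "R2_trimmed.fastq.gz",
                             merged_out="merged.fastq.gz")
print(shlex.join(no_merge))
print(shlex.join(with_merge))
```

Recording the exact command next to each output directory would answer the question of which parameters a given run used.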
SeqPrep adapter-removed files were run through FastUniq to remove PCR duplicates, then through Musket for error correction, and lastly through FastQC for analysis.
bs-mk_seqprep_dupremoved_ec_r1.pdf
bs-mk_seqprep_dupremoved_ec_r2.pdf
Files are located here:
/campusdata/BME235/S15_assemblies/SOAPdenovo2/errorCorrectionTask/musket_run/pairedEndEC_k31_minmult3/BS-MK_seqprep_dupRemoved_ec_R1.fastq
/campusdata/BME235/S15_assemblies/SOAPdenovo2/errorCorrectionTask/musket_run/pairedEndEC_k31_minmult3/BS-MK_seqprep_dupRemoved_ec_R2.fastq