Sequencing data

Library Run Location Notes
SW042 /campusdata/BME235/Spring2015Data/ Mate pair library. Expected insert size is 5-6kb.


File                                                       Size  Reads
SW042.r1.trimmed.fastq                                     152M  1,036,699
SW042.r2.trimmed.fastq                                     153M  1,086,654
Matepair_trimmed/skewer_run2_SW042_1_trimmed-pair1.fastq   145M  735,906
Matepair_trimmed/skewer_run2_SW042_1-trimmed-pair2.fastq   140M  735,906
Matepair_dupRemoved/skewer_42_dupRemoved_R1.fastq          129M  657,883
Matepair_dupRemoved/skewer_42_dupRemoved_R2.fastq          125M  657,883

Note: Duplicates, concatemers, and linkers have already been removed from the “trimmed” files.
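The read counts in the table can be re-derived from the files themselves. The following is a minimal sketch, assuming uncompressed FASTQ files with exactly four lines per record; the function names count_reads and percent_retained are made up for this illustration and are not part of any tool used on this page.

```shell
# Count reads in an uncompressed FASTQ file.
# Assumes exactly four lines per record; a non-integer result suggests a malformed file.
count_reads() {
  awk 'END { print NR / 4 }' "$1"
}

# Percentage of reads (or pairs) kept between two steps, to one decimal place.
# $1: count before the step, $2: count after the step (plain integers, no commas).
percent_retained() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", 100 * after / before }'
}

# Example calls:
# count_reads /campusdata/BME235/Spring2015Data/SW042.r1.trimmed.fastq
# percent_retained 735906 657883
```

With the pair counts listed above, percent_retained 735906 657883 gives 89.4, i.e. roughly 89% of the adapter-trimmed pairs remain after duplicate removal.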

FastQC analysis

FastQC flags several summary statistics as potentially unusual, including the per-base sequence content and the k-mer content.

Fastqc results for SW042.r1.trimmed.fastq

Fastqc results for SW042.r2.trimmed.fastq
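To list the flagged modules without opening the HTML reports, the summary.txt file that FastQC writes alongside each report can be filtered. This is a small sketch, assuming the usual tab-separated layout of that file (status, module name, file name); the function name flagged_modules and the example path are hypothetical.

```shell
# Print every FastQC module whose status is not PASS.
# $1: path to a FastQC summary.txt (tab-separated: status, module, file name).
flagged_modules() {
  awk -F'\t' '$1 != "PASS" { print $1 "\t" $2 }' "$1"
}

# Example call (hypothetical path inside an unpacked FastQC output folder):
# flagged_modules SW042.r1.trimmed_fastqc/summary.txt
```

Each output line holds the status (WARN or FAIL) and the module name, which makes it easy to compare the two read files side by side.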

PreQC analysis

PreQC was run on SW042.r1.trimmed.fastq and SW042.r2.trimmed.fastq.

Preqc for SW042

Insert size distribution

The distribution of insert sizes for inward facing, outward facing, and same strand reads is shown below. Mate pairs should be outward facing.

To generate this distribution, the mate pairs were mapped to all of the SOAPdenovo “run 1” contigs using BWA. The orientation of each read pair was then extracted from the resulting SAM file using a script from the Green lab.
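The Green lab script is not reproduced here. As a rough illustration of the idea, the sketch below classifies each mapped pair from the SAM FLAG, POS and PNEXT fields and reports the absolute TLEN as the insert size. It is a simplification under stated assumptions: only pairs with both reads mapped to the same contig are counted, leftmost mapping positions stand in for the read orientation test, and the function name classify_pairs and the file name contigs.fa are hypothetical. The source does not say which BWA algorithm was used; the commented example assumes bwa mem.

```shell
# Classify read pairs in a SAM stream as inward, outward or same_strand.
# Reads SAM on stdin; prints "orientation<TAB>insert_size" once per pair.
classify_pairs() {
  awk -F'\t' '
    /^@/ { next }                                   # skip SAM header lines
    {
      flag = $2
      if (int(flag / 64) % 2 == 0) next             # keep first-in-pair (0x40) only
      if (int(flag / 256) % 2 || int(flag / 2048) % 2) next   # skip secondary/supplementary
      if (int(flag / 4) % 2 || int(flag / 8) % 2) next        # skip if read or mate unmapped
      if ($7 != "=") next                           # skip pairs split across contigs
      rev  = int(flag / 16) % 2                     # 0x10: this read on reverse strand
      mrev = int(flag / 32) % 2                     # 0x20: mate on reverse strand
      if (rev == mrev) {
        o = "same_strand"
      } else {
        fpos = rev ? $8 : $4                        # position of the forward-strand read
        rpos = rev ? $4 : $8                        # position of the reverse-strand read
        o = (fpos + 0 <= rpos + 0) ? "inward" : "outward"
      }
      t = $9 + 0
      if (t < 0) t = -t                             # TLEN is signed; report its magnitude
      print o "\t" t
    }'
}

# Example pipeline (assumes the contigs were indexed first with: bwa index contigs.fa):
# bwa mem contigs.fa SW042.r1.trimmed.fastq SW042.r2.trimmed.fastq | classify_pairs > pairs.tsv
# cut -f1 pairs.tsv | sort | uniq -c
```

For a good mate pair library, most pairs should fall in the outward class, with insert sizes near the expected 5-6 kb, as long as the contigs are long enough to contain both mates.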


Adapter Removal with Skewer

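Skewer was run in mate-pair mode (-m mp), with the same sequence supplied as the junction adapter (-j) and as the adapter for each read (-x and -y):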


skewer-0.1.123-linux-x86_64 -x CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG -y CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG -m mp -j CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG -t 32 -o ${OUTDIR} /campusdata/BME235/Spring2015Data/SW042.r1.trimmed.fastq /campusdata/BME235/Spring2015Data/SW042.r2.trimmed.fastq

The adapter sequences were obtained from this paper: Nextera Mate Pair Kit.
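The junction sequence in the command above reads the same as its own reverse complement, which may be why a single sequence can be passed to -x, -y and -j. The sketch below checks that property and wraps the same skewer call in a function, so the sequence is written only once. The names JUNCTION, revcomp and run_skewer_mp are invented for this illustration; the skewer options are exactly those in the original command.

```shell
# Junction adapter from the skewer command, kept in one place.
JUNCTION=CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG

# Reverse-complement a DNA string (A, C, G, T; any other character becomes N).
revcomp() {
  printf '%s\n' "$1" | awk '{
    out = ""
    for (i = length($0); i >= 1; i--) {
      b = substr($0, i, 1)
      c = (b == "A") ? "T" : (b == "T") ? "A" : (b == "C") ? "G" : (b == "G") ? "C" : "N"
      out = out c
    }
    print out
  }'
}

# Same skewer call as in the text; nothing runs until the function is called.
# $1: output prefix, $2: read 1 FASTQ, $3: read 2 FASTQ.
run_skewer_mp() {
  skewer-0.1.123-linux-x86_64 -x "$JUNCTION" -y "$JUNCTION" -m mp -j "$JUNCTION" \
    -t 32 -o "$1" "$2" "$3"
}

# Check that the junction is its own reverse complement:
# [ "$(revcomp "$JUNCTION")" = "$JUNCTION" ] && echo "junction is palindromic"
```

Holding the sequence in one variable avoids the three copies drifting apart if the adapter is ever changed.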

Files located here:



Fastqc results



Fastuniq to remove duplicates

FastUniq was run to remove any duplicate read pairs still remaining after adapter removal.
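The exact FastUniq command is not recorded here. The sketch below shows one plausible invocation, assuming the options listed in FastUniq's usage message (-i for a text file listing the two input FASTQ files, -t q for FASTQ output, -o and -p for the two output files); check them against the installed version before relying on this. It also includes an independent cross-check, count_unique_pairs, which simply counts distinct (read 1, read 2) sequence combinations and is not FastUniq's own algorithm. Both function names are hypothetical.

```shell
# Run FastUniq on one pair of FASTQ files (assumed options: -i, -t, -o, -p).
# $1, $2: paired input FASTQ files; $3, $4: deduplicated output FASTQ files.
run_fastuniq() {
  list=$(mktemp)
  printf '%s\n%s\n' "$1" "$2" > "$list"   # FastUniq reads the input names from a list file
  fastuniq -i "$list" -t q -o "$3" -p "$4"
  rm -f "$list"
}

# Cross-check: count distinct pairs by exact sequence identity of both mates.
# Assumes uncompressed, four-line FASTQ records in the same order in both files.
count_unique_pairs() {
  paste "$1" "$2" | awk -F'\t' '
    NR % 4 == 2 { seen[$1 FS $2] = 1 }    # line 2 of each record holds the sequences
    END { n = 0; for (k in seen) n++; print n }'
}

# Example calls (input names follow the table above; output names are placeholders):
# run_fastuniq in_R1.fastq in_R2.fastq out_R1.fastq out_R2.fastq
# count_unique_pairs in_R1.fastq in_R2.fastq
```

The number printed by count_unique_pairs on the adapter-trimmed files should be close to the pair count of the deduplicated files, although small differences are possible if FastUniq treats pairs of different lengths as duplicates.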

Files located here:



Fastqc results



