Library | Run | Location | Notes |
Lucigen NxSeq Long Mate Pair Library Kit | | /campusdata/BME235/Spring2015Data/ | Supposed to be 2x300bp reads with long (>1000bp) insert size |
File | Size | Reads |
R1_IJS8_mates_ICC5_SW023_S60_L001_R1_001.fastq | 43M | 217,484 |
R2_IJS8_mates_ICC5_SW023_S60_L001_R2_001.fastq | 43M | 217,484 |
/campusdata/BME235/Spring2015Data/Matepair_dupRemoved/lucigen_mp_dupRemoved_R1.fastq | 20M | 89,856 |
/campusdata/BME235/Spring2015Data/Matepair_dupRemoved/lucigen_mp_dupRemoved_R2.fastq | 21M | 89,856 |
These data were generated with the Lucigen NxSeq Long Mate Pair Library Kit. Reads were processed as described on pages 57 and 58 of the NxSeq manual.
Lucigen Mate Pair Post Sequencing Filter Steps
FastQC indicates multiple technical problems with the reads, beyond the usual drop in quality scores at the ends of reads. For example, most of these reads are only about 30bp long, when they are supposed to be 300bp long.
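The read-length problem can be checked independently of FastQC by tabulating sequence lengths directly from a FASTQ file. The following is a minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the function names `length_counts` and `fraction_shorter_than` are illustrative and not part of any tool used on this page.

```python
from collections import Counter
import gzip


def length_counts(fastq_path):
    """Count how many reads have each sequence length.

    Assumes four lines per record: header, sequence, '+', qualities.
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                counts[len(line.rstrip("\r\n"))] += 1
    return counts


def fraction_shorter_than(counts, threshold):
    """Fraction of reads whose length is strictly below `threshold`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    short = sum(n for length, n in counts.items() if length < threshold)
    return short / total
```

For example, `fraction_shorter_than(length_counts("R1_IJS8_mates_ICC5_SW023_S60_L001_R1_001.fastq"), 100)` would report the share of R1 reads shorter than 100bp; the 100bp cutoff is an arbitrary choice for illustration.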
There are also unusual sequence duplication levels and abnormal k-mer content at the ends of reads.
The distribution of insert sizes for inward-facing, outward-facing, and same-strand read pairs is shown below. Mate pairs should be outward facing.
To generate this distribution, mate pairs were mapped to all the SOAPdenovo “run 1” contigs using BWA. The orientation of each read pair was pulled from the resulting SAM file using a script from the Green lab.
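The Green lab script is not reproduced on this page. As a rough illustration of the idea, the sketch below classifies each pair from the SAM FLAG, POS, and PNEXT fields and collects the absolute TLEN as the insert size. It is an assumption-laden stand-in, not the script that was actually used: it looks only at the first read of each pair, skips pairs whose mates map to different contigs, compares leftmost mapping positions only, and treats opposite-strand pairs at the same position as inward facing.

```python
from collections import defaultdict

# Bit values defined by the SAM specification for the FLAG field.
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
REVERSE, MATE_REVERSE, FIRST_IN_PAIR = 0x10, 0x20, 0x40
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def classify_pair(flag, pos, mate_pos):
    """Return 'inward', 'outward', or 'same_strand' for one aligned pair."""
    read_reverse = bool(flag & REVERSE)
    mate_reverse = bool(flag & MATE_REVERSE)
    if read_reverse == mate_reverse:
        return "same_strand"
    # Work out which mate is on the forward strand and compare positions.
    if read_reverse:
        forward_pos, reverse_pos = mate_pos, pos
    else:
        forward_pos, reverse_pos = pos, mate_pos
    # Forward read to the left of the reverse read means the reads point at each other.
    return "inward" if forward_pos <= reverse_pos else "outward"


def insert_sizes_by_orientation(sam_lines):
    """Group absolute TLEN values by pair orientation."""
    sizes = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        rnext, pnext, tlen = fields[6], int(fields[7]), int(fields[8])
        if not (flag & PAIRED and flag & FIRST_IN_PAIR):
            continue  # count each pair once, via its first read
        if flag & (UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if rnext not in ("=", rname):
            continue  # mates on different contigs have no meaningful insert size
        sizes[classify_pair(flag, pos, pnext)].append(abs(tlen))
    return dict(sizes)
```

Passing an open SAM file handle to `insert_sizes_by_orientation` gives one list of insert sizes per orientation, which can then be plotted as three histograms. For a mate pair library, most of the signal would be expected in the `outward` list.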
Removing duplicates with FastUniq decreased the number of reads substantially, from ~217,000 to ~90,000 per file (217,484 to 89,856).
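To make the duplicate-removal step concrete, here is a minimal sketch of pair-level deduplication: a pair is dropped when both of its sequences exactly match a pair already seen. This is a simplified illustration and not FastUniq's implementation. It assumes four-line FASTQ records and that the R1 and R2 files list mates in the same order; the function names are hypothetical.

```python
from itertools import islice


def fastq_records(handle):
    """Yield FASTQ records as lists of four stripped lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield [line.rstrip("\r\n") for line in record]


def unique_pairs(r1_handle, r2_handle):
    """Yield (R1 record, R2 record) for the first occurrence of each sequence pair."""
    seen = set()
    for rec1, rec2 in zip(fastq_records(r1_handle), fastq_records(r2_handle)):
        key = (rec1[1], rec2[1])  # the two sequences identify the pair
        if key in seen:
            continue  # exact duplicate of an earlier pair
        seen.add(key)
        yield rec1, rec2
```

Because every distinct sequence pair is held in memory, this approach suits a data set of this size (a few hundred thousand short reads) better than a large sequencing run.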
FastQC