data_overview:2015:lucigen_mate-pair_data

Sequencing data

^ Library ^ Run ^ Location ^ Notes ^
| Lucigen NxSeq Long Mate Pair Library Kit |  | /campusdata/BME235/Spring2015Data/ | Supposed to be 2x300bp reads with a long (>1000bp) insert size |

Files

^ File ^ Size ^ Reads ^
| R1_IJS8_mates_ICC5_SW023_S60_L001_R1_001.fastq | 43M | 217,484 |
| R2_IJS8_mates_ICC5_SW023_S60_L001_R2_001.fastq | 43M | 217,484 |
| /campusdata/BME235/Spring2015Data/Matepair_dupRemoved/lucigen_mp_dupRemoved_R1.fastq | 20M | 89,856 |
| /campusdata/BME235/Spring2015Data/Matepair_dupRemoved/lucigen_mp_dupRemoved_R2.fastq | 21M | 89,856 |

Note: these files should be designated as paired-end when used for assembly.
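Because the R1 and R2 files are used as a pair, it can help to confirm that they contain the same reads in the same order before assembly. The following is a minimal sketch, assuming plain 4-line FASTQ records; the helper names `read_names` and `check_pairing` are hypothetical and not part of any tool used on this page.

```python
import io
from itertools import zip_longest


def read_names(handle):
    """Yield the read name of each 4-line FASTQ record, without a /1 or /2 suffix."""
    for i, line in enumerate(handle):
        if i % 4 != 0:  # only the first line of each record is a header
            continue
        fields = line.split()
        if not fields or not fields[0].startswith("@"):
            raise ValueError(f"malformed FASTQ header at line {i + 1}")
        name = fields[0][1:]  # drop the leading '@'
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        yield name


def check_pairing(r1_handle, r2_handle):
    """Return the number of pairs; raise ValueError if R1 and R2 disagree."""
    n_pairs = 0
    for name1, name2 in zip_longest(read_names(r1_handle), read_names(r2_handle)):
        if name1 != name2:  # also catches files of different length (None)
            raise ValueError(f"pair {n_pairs + 1}: {name1!r} != {name2!r}")
        n_pairs += 1
    return n_pairs


# Small in-memory demonstration with Illumina-style headers (made-up reads).
demo_r1 = io.StringIO("@M0:1:A:1:1:1:1 1:N:0:60\nACGT\n+\nIIII\n")
demo_r2 = io.StringIO("@M0:1:A:1:1:1:1 2:N:0:60\nTTGA\n+\nIIII\n")
print(check_pairing(demo_r1, demo_r2))
```

With the real data, the two handles would come from `open()` on the R1 and R2 FASTQ files listed in the table, and the returned count could be compared with the read counts given there.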

These data were generated with the Lucigen NxSeq Long Mate Pair Library Kit. Reads were processed as described on pages 57 and 58 of the NxSeq manual.

Summary of data processing

Lucigen Mate Pair Post Sequencing Filter Steps

  1. Stats: 1,099,799 reads processed: 1,007,381 true mate reads (91%), 92,416 non-mates/chimeras (8%), and 2 mates too short to keep after trimming
  2. Final usable output: R1_IJS7_mates_ICC4_SW023_S60_L001_R1_001.fastq and R2_IJS7_mates_ICC4_SW023_S60_L001_R2_001.fastq
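As a quick sanity check on the filter statistics above, the three categories should add up to the number of reads processed, and the percentages can be recomputed from the counts. This is only an arithmetic sketch using the numbers reported in step 1; the `percent` helper is hypothetical. The reported 91% and 8% appear to be the whole-number parts of the exact values.

```python
# Counts copied from the filter statistics in step 1.
total = 1_099_799
true_mates = 1_007_381
non_mates = 92_416
too_short = 2


def percent(count, whole):
    """Return count as a percentage of whole."""
    return 100.0 * count / whole


# The three categories should account for every processed read.
assert true_mates + non_mates + too_short == total

print(f"true mates: {percent(true_mates, total):.1f}%")
print(f"non-mates/chimeras: {percent(non_mates, total):.1f}%")
```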

FastQC analysis

FastQC indicates multiple technical problems with the reads, beyond the usual drop in quality scores toward the ends of reads. For example, most of these reads are only about 30bp long, although they are supposed to be 300bp long.

There are also unusual sequence duplication levels and abnormal k-mer content at the ends of reads.
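The read-length problem reported by FastQC can also be checked directly from a FASTQ file. The sketch below tallies sequence lengths and reports the fraction of reads at or below a chosen length. It assumes plain 4-line FASTQ records; `read_length_counts` and `fraction_at_most` are hypothetical helper names, and the demo reads are made up.

```python
import io
from collections import Counter


def read_length_counts(handle):
    """Count how many reads have each sequence length in a 4-line FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of each record holds the sequence
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def fraction_at_most(counts, max_len):
    """Return the fraction of reads whose length is <= max_len."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    short = sum(n for length, n in counts.items() if length <= max_len)
    return short / total


# In-memory demonstration: two 4bp reads and one 10bp read.
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nTTGA\n+\nIIII\n"
    "@r3\nACGTACGTAC\n+\nIIIIIIIIII\n"
)
counts = read_length_counts(demo)
print(sorted(counts.items()), fraction_at_most(counts, 5))
```

Run on the mate-pair files, a call such as `fraction_at_most(counts, 30)` would give a number to set against the "about 30bp" observation.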

FastQC results Lucigen mate pair R1

FastQC results Lucigen mate pair R2

Insert size distribution

The distribution of insert sizes for inward facing, outward facing, and same strand reads is shown below. Mate pairs should be outward facing.

To generate this distribution, mate pairs were mapped to all the SOAPdenovo “run 1” contigs using BWA. The orientation of each read pair was extracted from the resulting SAM file using a script from the Green lab.
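The Green lab script itself is not shown here. As an illustration of the general idea only, the sketch below classifies each mapped pair as inward facing, outward facing, or same strand from the standard SAM FLAG bits, and collects the absolute TLEN value as the insert size. The function names are hypothetical, the classification by leftmost positions is an approximation for overlapping reads, and only pairs with both mates on the same contig are counted.

```python
def classify_pair(flag, pos, pnext):
    """Classify a pair from one read's FLAG, its POS, and its mate's position."""
    read_rev = bool(flag & 0x10)  # this read maps to the reverse strand
    mate_rev = bool(flag & 0x20)  # its mate maps to the reverse strand
    if read_rev == mate_rev:
        return "same_strand"
    # Leftmost positions of the forward-strand and reverse-strand reads.
    fwd_pos, rev_pos = (pnext, pos) if read_rev else (pos, pnext)
    return "inward" if fwd_pos <= rev_pos else "outward"


def insert_sizes_by_orientation(sam_lines):
    """Group |TLEN| by orientation, counting each pair once (first-in-pair only)."""
    sizes = {"inward": [], "outward": [], "same_strand": []}
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        # Require a paired read with both mates mapped; skip secondary (0x100)
        # and supplementary (0x800) alignments.
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        if not flag & 0x40:  # keep only the first read of each pair
            continue
        if fields[6] not in ("=", fields[2]):  # mate is on another contig
            continue
        kind = classify_pair(flag, int(fields[3]), int(fields[7]))
        sizes[kind].append(abs(int(fields[8])))
    return sizes


# Made-up SAM records for demonstration.
demo = [
    "@SQ\tSN:ctg1\tLN:5000",
    "p1\t99\tctg1\t100\t60\t50M\t=\t400\t350\tACGT\tIIII",
    "p2\t97\tctg1\t2000\t60\t50M\t=\t100\t-1950\tACGT\tIIII",
    "p3\t65\tctg1\t10\t60\t50M\t=\t500\t540\tACGT\tIIII",
]
print(insert_sizes_by_orientation(demo))
```

The three lists could then be plotted as separate histograms to reproduce a distribution like the one described above.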

Duplicate Removal with FastUniq

Removing duplicates with FastUniq reduced the number of reads substantially, from ~217,000 to ~90,000 per file (217,484 to 89,856, so about 41% of reads were retained).
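To illustrate what duplicate removal on paired reads means, the sketch below keeps the first occurrence of each distinct (R1 sequence, R2 sequence) combination and drops later exact copies. It is not a reimplementation of FastUniq, whose internals are not described on this page; `parse_fastq` and `dedupe_pairs` are hypothetical helpers and the demo reads are made up.

```python
import io


def parse_fastq(handle):
    """Yield (header, sequence, quality) tuples from a 4-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline()
        handle.readline()  # '+' separator line, not needed
        qual = handle.readline()
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def dedupe_pairs(r1_records, r2_records):
    """Keep the first pair seen for each exact (R1 sequence, R2 sequence) key."""
    seen = set()
    kept = []
    for rec1, rec2 in zip(r1_records, r2_records):
        key = (rec1[1], rec2[1])
        if key in seen:
            continue
        seen.add(key)
        kept.append((rec1, rec2))
    return kept


# Demonstration: pair p2 is an exact duplicate of pair p1.
demo_r1 = io.StringIO("@p1\nAAAA\n+\nIIII\n@p2\nAAAA\n+\nIIII\n@p3\nCCCC\n+\nIIII\n")
demo_r2 = io.StringIO("@p1\nTTTT\n+\nIIII\n@p2\nTTTT\n+\nIIII\n@p3\nTTTT\n+\nIIII\n")
kept = dedupe_pairs(parse_fastq(demo_r1), parse_fastq(demo_r2))
print(len(kept))

# Fraction of reads retained in the real data, from the counts in the Files table.
print(round(89_856 / 217_484, 3))
```

Keying on both sequences together means a pair is dropped only when both of its reads match an earlier pair, so reads that share an R1 sequence but differ in R2 are kept.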

FastQC

lucigen_mp_dupremoved_r1.pdf

lucigen_mp_dupremoved_r2.pdf

data_overview/2015/lucigen_mate-pair_data.txt · Last modified: 2015/07/16 11:49 by ceisenhart