The /campusdata/BME235/data/ directory on campusrocks contains the sequencing data from different organisms.
The fastq/ directory contains the Illumina run converted to fastq format. Reads that do not pass Illumina's quality filter are excluded, as are all reads from this experiment's control lane (lane 4). To convert the data from Illumina's .txt output to fastq, I use a C script I wrote; it is installed in our bin directory, and its source is located here:
/programs/johnScripts/illuminaToFastq.c
Note that this C script preserves the quality score format of the _qseq file. Because the _qseq file uses Phred quality scores rather than Illumina/Solexa scores, take care when passing these fastq files to programs that may expect a different encoding.
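As a minimal sketch of why the encoding matters, the Python snippet below decodes a quality string under a given ASCII offset and re-encodes it under another. The function names decode_quality and shift_offset are hypothetical, and the offsets used (33 and 64) are the two common conventions, not something confirmed for these particular files; check which one your data uses before relying on it.

```python
def decode_quality(qual, offset):
    """Return numeric scores for a quality string under an ASCII offset.

    The offset (commonly 33 or 64) is an assumption the caller must supply.
    """
    scores = [ord(ch) - offset for ch in qual]
    if min(scores, default=0) < 0:
        # A negative Phred score means the chosen offset does not fit the data.
        raise ValueError("negative score: offset is probably wrong")
    return scores


def shift_offset(qual, old_offset, new_offset):
    """Re-encode a quality string from one ASCII offset to another."""
    return "".join(chr(ord(ch) - old_offset + new_offset) for ch in qual)


if __name__ == "__main__":
    print(decode_quality("II9", 33))   # [40, 40, 24]
    print(shift_offset("hhX", 64, 33)) # II9
```

Decoding the same string with the wrong offset either raises an error or silently yields scores that are off by 31, which is why a downstream program needs to be told which encoding to expect.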
The C script takes four arguments: an Illumina .txt file, its paired file, and the names of the two corresponding output .fastq files. It goes through each read and its mate and checks that both pass Illumina's quality filter. If either read fails, both are excluded. Most reads appear to pass this criterion, since the output fastq files are approximately the same size as the input files.
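The pair-filtering logic described above can be sketched in Python as follows. This is not a port of illuminaToFastq.c. It assumes the common 11-column tab-delimited qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag) with "1" in the last column meaning the read passed; the header format it writes is also an assumption, and the function names are hypothetical.

```python
from itertools import zip_longest


def qseq_to_fastq(fields):
    """Format one split qseq line as a four-line fastq record (assumed layout)."""
    machine, _run, lane, tile, x, y, index, read, seq, qual, _flag = fields
    name = f"{machine}:{lane}:{tile}:{x}:{y}#{index}/{read}"
    # The quality string is copied unchanged, so its encoding is preserved.
    return f"@{name}\n{seq}\n+\n{qual}\n"


def filter_pairs(lines1, lines2):
    """Yield (record1, record2) only when both mates pass the quality filter.

    Both inputs must list the reads in the same order.
    """
    for l1, l2 in zip_longest(lines1, lines2):
        if l1 is None or l2 is None:
            raise ValueError("paired inputs have different numbers of lines")
        f1 = l1.rstrip("\n").split("\t")
        f2 = l2.rstrip("\n").split("\t")
        # Assumption: the last column is the filter flag, "1" = passed.
        if f1[-1] == "1" and f2[-1] == "1":
            yield qseq_to_fastq(f1), qseq_to_fastq(f2)
```

To use it on files, open the two input files and the two output files, then write each yielded record to its matching output. Dropping both mates together keeps the two fastq files in step, which paired-end tools generally require.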
Here is an example of what a fastq read should look like:
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
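For checking that a file follows the four-line layout shown above, a minimal Python reader might look like this. It is a sketch that assumes every record occupies exactly four lines (no wrapped sequence lines); the name read_fastq is hypothetical.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from an iterable of fastq lines.

    Assumes strict four-line records: @name, sequence, +[name], quality.
    """
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records or at the end
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if qual is None:
            raise ValueError("truncated record: " + header)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + header)
        yield header[1:], seq, qual
```

Passing an open file handle, for example `for name, seq, qual in read_fastq(open(path)):`, iterates over the records one at a time without loading the whole file.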
Example ID line explained:
@HWUSI-EAS100R:6:73:941:1973#0/1
HWUSI-EAS100R the unique instrument name
6 flowcell lane
73 tile number within the flowcell lane
941 'x'-coordinate of the cluster within the tile
1973 'y'-coordinate of the cluster within the tile
#0 index number for a multiplexed sample (0 for no indexing)
/1 the member of a pair, /1 or /2 (paired-end or mate-pair reads only)
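The ID layout explained above can be split into its parts with a short regular expression. The Python sketch below covers only this ID style; the names ID_PATTERN and parse_read_id are hypothetical, and treating the /1 or /2 suffix as optional is an assumption made so that single-end IDs also parse.

```python
import re

# instrument:lane:tile:x:y#index, with an optional /1 or /2 pair suffix
ID_PATTERN = re.compile(
    r"^@?(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+)"
    r":(?P<x>\d+):(?P<y>\d+)#(?P<index>[^/]+)(?:/(?P<pair>[12]))?$"
)


def parse_read_id(read_id):
    """Split an ID line into a dict of its fields; raise ValueError if it does not match."""
    match = ID_PATTERN.match(read_id.strip())
    if match is None:
        raise ValueError("unrecognized read ID: " + read_id)
    fields = match.groupdict()
    for key in ("lane", "tile", "x", "y"):
        fields[key] = int(fields[key])
    return fields  # "pair" is None when the suffix is absent


if __name__ == "__main__":
    print(parse_read_id("@HWUSI-EAS100R:6:73:941:1973#0/1"))
```

The index is kept as a string because a multiplexed sample may carry something other than a plain number there.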
In the original files, lines are tab-delimited with the following fields: