454 Presentation by Teri Mueller
This talk was hosted by Roche. It was a general overview of some of the capabilities of the 454 platform and its bundled software. 454 website - Access to FLX updates and documentation. Register with UCSC account only.
Pre-talk comment: The GUI is accessible by X11 or VNC.
454 Characteristics
1 bead/1 DNA fragment: filter will try to remove beads with >1 fragment (~20%)
200 cycles per FLX Titanium run. New machine does 400.
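1 flow: a single nucleotide is flowed over the plate. The flow is the unit of measure for pyrosequencing, since bases get added flow by flow.
Amount of light is roughly proportional to the number of bases incorporated in that flow.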
The first 4 bases of each sequence are the key sequence. The first 3 are used to normalize the amount of light for 1 base.
Library sequences: TCAG standard / GATC rapid. Do not mix them! There are also control sequences (did not catch them)
1 million reads per 2-well plate. Lane masks will decrease the number of reads.
Quality statistics can be viewed in the BaseQualityMetrics and QualityFilterMetrics files.
10 hr runs. Processing can take up to 80 hrs on the default computer. This can be mitigated by processing on a cluster.
37 GB per run (~28 GB of raw images)
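The key-sequence normalization described above can be sketched roughly as follows. This is only an illustration, not the vendor's algorithm: the TACG flow order, the function names, and the simple round-to-nearest base calling are all assumptions that were not stated in the talk.

```python
from statistics import mean

# Assumed flow order (T, A, C, G repeated); not stated in the talk.
FLOW_ORDER = "TACG"


def normalize_flowgram(raw_signals, key="TCAG", flow_order=FLOW_ORDER):
    """Scale raw light signals so that one incorporated base is ~1.0.

    The single-base reference is the mean signal of the flows that
    incorporate the first three key bases.
    """
    key_flows = []  # flow indices at which each key base is incorporated
    flow = 0
    for base in key:
        # Advance until the flowed nucleotide matches the next key base.
        while flow_order[flow % len(flow_order)] != base:
            flow += 1
        key_flows.append(flow)
        flow += 1
    one_base = mean(raw_signals[i] for i in key_flows[:3])
    return [s / one_base for s in raw_signals], key_flows


def call_bases(normalized, flow_order=FLOW_ORDER):
    """Naive base calling: round each normalized signal to a base count."""
    seq = []
    for i, signal in enumerate(normalized):
        n = max(0, round(signal))
        seq.append(flow_order[i % len(flow_order)] * n)
    return "".join(seq)


# Hypothetical raw light values for the first 10 flows of one well.
raw = [210, 8, 190, 12, 5, 200, 10, 205, 395, 6]
normalized, key_flows = normalize_flowgram(raw)
read = call_bases(normalized)
keypass = read.startswith("TCAG")  # the read begins with the library key
```

With the standard TCAG key and the assumed flow order, the key bases land on flows 0, 2, 5 and 7, so the first three of those flows set the single-base light level.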
Software
Amplicon Variant Analyzer: for specific region analysis
GS Assembler: de Novo or Reference Mapper
GS Reporter
GS RunProcessor - Image/signal processing
Formats:
.cwf: raw image
.fna: fasta sequence file, header has a specific format.
.qual: quality scores (like Phred, but offset)
.sff: standard 454 format. <sfffile -s> will split plate into runs.
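As a rough illustration of working with the .qual format, the sketch below reads FASTA-style quality records and converts a Phred score to an error probability. The read names are made up, and the `offset` parameter is an assumption: the talk said the scores are offset relative to Phred but did not give the value, so it defaults to 0 here.

```python
def read_qual(lines):
    """Yield (read_id, [scores]) from FASTA-style .qual text lines."""
    name, scores = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, scores
            # Keep only the first token of the header as the read id.
            name, scores = line[1:].split()[0], []
        else:
            scores.extend(int(tok) for tok in line.split())
    if name is not None:
        yield name, scores


def phred_to_error(q, offset=0):
    """Standard Phred relation: Q = -10 * log10(p). `offset` is hypothetical."""
    return 10 ** (-(q - offset) / 10)


# Hypothetical two-read example; a record's scores may span several lines.
example = [
    ">read1 length=4",
    "40 30",
    "20 10",
    ">read2 length=2",
    "35 35",
]
records = list(read_qual(example))
```

A score of 20 corresponds to a 1% error probability and a score of 30 to 0.1%, which is the scale the quality trimming thresholds refer to.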
Quality Filtering
Shotgun vs. Amplicon pipeline defaults
Keypass (read rejecting):
Checks for key sequence
~20% rejection expected
Dot (read rejecting):
Mix (read rejecting):
Signal Intensity (read trimming):
Primer (read rejecting):
Valley (read rejecting):
Discard scaled sum scores that are too close to the valleys between base count decision points.
Amplicon default: 4/700 (0.57%)
Shotgun default: 4/320 (1.25%)
Trim back (read trimming):
Quality score trim (read trimming):
40 base window: if the error rate in the window is >1%, trim a base.
Sequences shorter than 40 bases are thrown away.
(Even unfiltered, quality scores will reflect low quality areas)
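A minimal sketch of the quality score trim, under assumptions not confirmed in the talk: trimming proceeds one base at a time from the 3' end, the window error rate is the mean per-base error probability derived from Phred-style scores, and a read is discarded once fewer than 40 bases remain. The function name is hypothetical.

```python
def quality_trim(scores, window=40, max_error=0.01):
    """Return the number of bases kept, or 0 if the read is discarded."""
    errors = [10 ** (-q / 10) for q in scores]
    end = len(errors)
    while end >= window:
        window_error = sum(errors[end - window:end]) / window
        if window_error <= max_error:
            return end  # the last `window` bases are good enough
        end -= 1  # trim one base and test the window again
    return 0  # fewer than `window` bases left: throw the read away


# Hypothetical read: 60 good bases followed by a low-quality tail.
kept = quality_trim([30] * 60 + [5] * 10)
```

Note that an averaged window can leave a single poor base at the end of the read, which fits the remark above that the quality scores still flag low-quality areas.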
SFFtools can also perform screen-trimming with a screening db for known contaminants.
A good run: expect a read length mode ~500 and mean >300. ~50% should pass filters.
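The valley filter listed above could be sketched along these lines. This is a guess at the mechanics rather than the documented behaviour: it reads "4/320" as "at most 4 valley flows within the first 320 flows", and the `band` width around each half-integer decision point is a purely hypothetical parameter.

```python
def valley_flows(normalized, n_flows, band=0.1):
    """Count flows whose normalized signal sits near a half-integer valley."""
    count = 0
    for signal in normalized[:n_flows]:
        frac = signal % 1.0
        if abs(frac - 0.5) <= band:
            count += 1
    return count


def passes_valley_filter(normalized, n_flows=320, max_valley=4, band=0.1):
    """Assumed rule: reject the read if more than `max_valley` flows are ambiguous."""
    return valley_flows(normalized, n_flows, band) <= max_valley


# Hypothetical normalized signals; 1.5, 0.45, 2.55, 1.45 and 0.5 are ambiguous.
signals = [1.02, 0.05, 1.5, 0.45, 2.55, 0.98, 1.45, 0.5]
n_valley = valley_flows(signals, len(signals))
```

Under this reading, the shotgun default would be `passes_valley_filter(flows, n_flows=320, max_valley=4)` and the amplicon default would use `n_flows=700`.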
de Novo Assembly
All assemblers use .sff files.
3, 8 & 20kb paired end libraries.
Can also accept FASTA/qual reads.
15-25X coverage best.
Memory use: about 4 bytes of RAM per read base.
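The coverage and memory figures above lend themselves to a quick back-of-the-envelope check. The sketch below is simple arithmetic only; the function name and the 15 Mb genome size are made-up inputs, and the 4 bytes per base figure is taken at face value from the talk.

```python
def assembly_estimates(n_reads, mean_read_len, genome_size, bytes_per_base=4):
    """Return (fold coverage, approximate RAM in GB) for a set of reads."""
    total_bases = n_reads * mean_read_len
    coverage = total_bases / genome_size
    ram_gb = total_bases * bytes_per_base / 1e9
    return coverage, ram_gb


# 1 million reads at a mean of 300 bases against a hypothetical 15 Mb genome.
coverage, ram_gb = assembly_estimates(1_000_000, 300, 15_000_000)
print(f"{coverage:.0f}X coverage, ~{ram_gb:.1f} GB RAM for read bases")
```

For these inputs the estimate is 20X coverage, inside the suggested 15-25X range, and roughly 1.2 GB of RAM for the read bases alone.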