====== 454 Presentation by Teri Mueller ======

This talk was hosted by Roche. It was a general overview of some of the capabilities of the 454 platform and its bundled software.

[[http://www.my454.com|454 website]] - Access to FLX updates and documentation. Register with UCSC account only.

  * Pre-talk comment: The GUI is accessible by X11 or VNC.

===== 454 Characteristics =====

  * 1 bead/1 DNA fragment: filter will try to remove beads with >1 fragment (~20%)
  * 200 cycles per FLX Titanium run. New machine does 400.
  * 1 flow: single base. Unit of measure for pyrosequencing, as bases get added by flow.
  * Amount of light ~ # of bases.
  * The first 4 bases of each sequence are the key sequence. The first 3 are used to normalize the amount of light for 1 base.
  * Library sequences: TCAG standard / GATC rapid. Do not mix them! There are also control sequences (did not catch them).
  * 1 million reads / 2-well plate. Lane masks will decrease the number of reads.
  * Quality statistics can be viewed in the BaseQualityMetrics and QualityFilterMetrics files.
  * 10 hr runs. Processing can take up to 80 hrs on the default computer. This is mitigated by processing on a cluster.
  * 37 GB per run (~28 GB of raw images)

===== Software =====

  * Amplicon Variant Analyzer: for specific region analysis
  * GS Assembler: de Novo or Reference Mapper
  * GS Reporter
  * GS RunProcessor - Image/signal processing
  * Formats:
    * .cwf: raw image
    * .fna: fasta sequence file, header has a specific format.
    * .qual: quality scores (like Phred, but offset)
    * .sff: standard 454 format. Will split plate into runs.

===== Quality Filtering =====

  * Shotgun vs. Amplicon pipeline defaults
  * Keypass (read rejecting):
    * Checks for key sequence
    * ~20% rejection expected
  * Dot (read rejecting):
    * Checks for too many negative flows.
    * 3 successive negative flows or N>5% of last positive flow.
  * Mix (read rejecting):
    * Checks for too many positive flows.
    * Indication of more than one sequence in bead.
    * >70% positive reads.
  * Signal Intensity (read trimming):
    * Reduces size of read until <3% borderline reads.
  * Primer (read rejecting):
    * Discard overamplified short sequences.
  * Valley (read rejecting):
    * Discard scaled sum scores that are too close to the valleys between base count decision points.
    * Amplicon default: 4/700 0.57%
    * Shotgun default: 4/320 1.25%
  * Trim back (read trimming):
    * Like valley, except trims instead of discarding until ratio is acceptable.
    * Amplicon default: off
    * Shotgun default: on
  * Quality score trim (read trimming):
    * 40 base window: if error rate >1%, trim a base.
    * <40 bases, throws sequence away.
    * (Even unfiltered, quality scores will reflect low quality areas)
  * SFFtools can also perform screen-trimming with a screening db for known contaminants.

A good run: expect a read length mode ~500 and mean >300. ~50% should pass filters.

===== de Novo Assembly =====

  * All assemblers use .sff files.
  * 3, 8 & 20kb paired end libraries.
  * Can accept fasta/qual reads.
  * 15-25X coverage best.
    * Low coverage: poor contig building
    * High coverage: may cause contig breaks
  * 4 bytes per __read__ base in RAM.
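
The coverage and memory figures above lend themselves to a quick back-of-envelope calculation. The sketch below is only a rough estimate, assuming that assembler memory scales as 4 bytes times the total number of read bases (the figure quoted in the talk) and ignoring any other overhead. The function names and the 5 Mb genome size are hypothetical, chosen for illustration.

```python
# Rough sizing helper for a de novo assembly, based on the notes above.
# Assumption: RAM ~= 4 bytes per read base; real usage may be higher.

BYTES_PER_READ_BASE = 4  # figure quoted in the talk


def total_read_bases(genome_size_bp: int, coverage: float) -> int:
    """Total read bases needed to reach an average coverage."""
    return round(genome_size_bp * coverage)


def assembly_ram_bytes(genome_size_bp: int, coverage: float) -> int:
    """Estimated RAM in bytes for holding the reads during assembly."""
    return total_read_bases(genome_size_bp, coverage) * BYTES_PER_READ_BASE


def coverage_from_reads(num_reads: int, mean_read_length: float,
                        genome_size_bp: int) -> float:
    """Average coverage obtained from a given number of reads."""
    return num_reads * mean_read_length / genome_size_bp


if __name__ == "__main__":
    genome = 5_000_000  # hypothetical 5 Mb genome
    for cov in (15, 25):  # recommended coverage range from the talk
        mib = assembly_ram_bytes(genome, cov) / 2**20
        print(f"{cov}X coverage: {total_read_bases(genome, cov):,} read bases, "
              f"about {mib:.0f} MiB of RAM")
```

For example, ''coverage_from_reads(1_000_000, 300, 5_000_000)'' gives the coverage a full plate of 1 million reads with a mean length of 300 would provide on the same hypothetical genome, which can be compared against the 15-25X target to decide whether to subsample.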