This shows you the differences between two versions of the page.
lecture_notes:04-28-2010 [2010/05/02 05:01] galt |
lecture_notes:04-28-2010 [2010/05/02 05:22] galt |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | John St. John's lecture on EULER-SR and Celera; Michael Cusack's lecture on MIRA | ||
+ | |||
=== Misc Notes: === | === Misc Notes: === | ||
Line 12: | Line 14: | ||
- | Kevin | + | Kevin mapped newbler to join the contigs |
- | I mapped newbler to join the contigs | + | |
found a bug in the python script to map the solid reads. | found a bug in the python script to map the solid reads. | ||
Detected because there were no joining reads | Detected because there were no joining reads | ||
Line 33: | Line 34: | ||
that can use long reads and mate-pairs. | that can use long reads and mate-pairs. | ||
- | ran well first time (it ran, at least) \\ | + | **euler-sr-assembly1/**\\ |
- | have to run it where you installed it \\ | + | Ran on 454 data with the Sanger data concatenated into one file. |
- | no makefiles \\ | + | |
+ | Have to set up env vars.\\ | ||
+ | No make install options.\\ | ||
+ | Things are mixed up.\\ | ||
+ | You have to run it where you installed it \\ | ||
+ | ${EUSRC}\\ | ||
+ | |||
+ | It ran well the first time (it ran, at least) \\ | ||
+ | |||
+ | ${EUSRC}/assembly/Assemble.pl pogreads.fasta 25\\ | ||
result: \\ | result: \\ | ||
Line 41: | Line 51: | ||
are contigs overlapping? \\ | are contigs overlapping? \\ | ||
//find out:// \\ | //find out:// \\ | ||
- | check blat_strict_match (blat alignment to reference genome) \\ | + | check contig-blat_strict_match (blat alignment to reference genome) \\ |
look for "Q name" (contigs) which match to the same "T start" positions on the reference genome \\ | look for "Q name" (contigs) which match to the same "T start" positions on the reference genome \\ | ||
//answer:// yes, appear to overlap a lot – double coverage because they totally overlap | //answer:// yes, appear to overlap a lot – double coverage because they totally overlap
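The overlap check described above can be sketched in Python. This is a minimal sketch, assuming the blat output is in tab-separated PSL format, where column 10 is "Q name", column 14 is "T name", and columns 16 and 17 are "T start" and "T end". The function names `parse_psl` and `overlapping_contigs` are hypothetical and are not part of blat or EULER-SR.

```python
from collections import defaultdict

def parse_psl(lines):
    """Yield (q_name, t_name, t_start, t_end) from PSL alignment lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Alignment rows have 21 tab-separated columns and start with a number;
        # anything else (header text, separators, blank lines) is skipped.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        yield fields[9], fields[13], int(fields[15]), int(fields[16])

def overlapping_contigs(lines):
    """Return sorted pairs of distinct contigs whose hits overlap on one target."""
    hits = defaultdict(list)
    for q_name, t_name, t_start, t_end in parse_psl(lines):
        hits[t_name].append((t_start, t_end, q_name))
    pairs = set()
    for intervals in hits.values():
        intervals.sort()
        for i, (s1, e1, q1) in enumerate(intervals):
            for s2, e2, q2 in intervals[i + 1:]:
                if s2 >= e1:
                    break  # sorted by start, so no later hit can overlap this one
                if q1 != q2:
                    pairs.add(tuple(sorted((q1, q2))))
    return sorted(pairs)
```

Passing an open file handle (for example the contig-blat_strict_match output, if it is PSL) to `overlapping_contigs` would list the contig pairs that cover the same stretch of the reference. PSL coordinates are zero-based and half-open, so two hits that merely touch are not reported.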
+ | |||
+ | There is one 91k contig.\\ | ||
Things to try to improve the run: \\ | Things to try to improve the run: \\ | ||
- | - longer k-mers \\ | + | - longer k-mers, increasing to 31 should be easy \\ |
- increase frequency threshold (help make up for read errors, maybe?) \\ | - increase frequency threshold (help make up for read errors, maybe?) \\ | ||
+ | - throw out the tiny contigs, reduce your cutoff. | ||
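Throwing out the tiny contigs can be done as a post-processing step on the contig FASTA file. The following is a minimal sketch, not something EULER-SR provides; `read_fasta`, `drop_short_contigs`, and the `min_length` cutoff are hypothetical names chosen for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) records from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def drop_short_contigs(lines, min_length):
    """Keep only records whose sequence is at least min_length bases long."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_length]
```

The cutoff itself is a judgment call; the notes do not give a value.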
+ | |||
+ | Does have an option to do some simple quality filtering on the reads\\ | ||
+ | if quality data such as fastq is used?\\ | ||
+ | -minmult look at how many things map to this area,\\ | ||
+ | if less than this many things, throw it out.\\ | ||
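The idea behind a multiplicity cutoff like -minmult can be illustrated with a simple k-mer count. This is only a sketch of the general technique, assuming the cutoff applies to how often a k-mer occurs in the reads; it is not EULER-SR's implementation, it ignores reverse complements, and `kmer_counts`, `solid_kmers`, and `min_mult` are hypothetical names.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(reads, k, min_mult):
    """Return the k-mers seen at least min_mult times; rarer ones are dropped."""
    return {kmer for kmer, n in kmer_counts(reads, k).items() if n >= min_mult}
```

A k-mer that appears only once or twice in deep coverage is more likely a read error than real sequence, which is why raising the threshold may help with noisy reads.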
+ | |||
+ | Error-correct reads, construct repeat graph,\\ | ||
+ | simplify repeat graph with mate-reads\\ | ||
+ | Error correction by threading.\\ | ||
+ | Tries to make minimal corrections to beginnings of reads,\\ | ||
+ | uses those to make the kmers. Later threads the full read length through. | ||
"Error Correction via threading" \\ | "Error Correction via threading" \\ | ||
Line 55: | Line 79: | ||
- perhaps this is where it went wrong? \\ | - perhaps this is where it went wrong? \\ | ||
+ | Mate reads.\\ | ||
+ | Multiple paths of similar length are hard to disambiguate.\\ | ||
+ | You can use multiple matepairs and bootstrap analysis.\\ | ||
+ | Use the paths with the highest probability.\\ | ||
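One possible reading of the bootstrap idea is sketched below. It assumes each mate pair yields an estimate of the distance a path must span, resamples those estimates with replacement, and counts how often each candidate path length is closest to the resampled mean. This is an illustration only, not the method EULER-SR uses; `bootstrap_path_support`, `best_path`, `path_lengths`, and `gap_estimates` are all hypothetical names.

```python
import random

def bootstrap_path_support(path_lengths, gap_estimates, n_boot=1000, seed=0):
    """Estimate how often each candidate path best fits resampled mate-pair gaps.

    path_lengths: dict mapping a path label to its length in bases.
    gap_estimates: per-mate-pair estimates of the distance the path must span.
    Returns a dict mapping each path label to its bootstrap support (0 to 1).
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in path_lengths}
    n = len(gap_estimates)
    for _ in range(n_boot):
        # Resample the mate-pair estimates with replacement.
        sample = [rng.choice(gap_estimates) for _ in range(n)]
        mean_gap = sum(sample) / n
        # The path whose length is closest to the resampled mean wins this round.
        best = min(path_lengths, key=lambda name: abs(path_lengths[name] - mean_gap))
        wins[best] += 1
    return {name: count / n_boot for name, count in wins.items()}

def best_path(path_lengths, gap_estimates, **kwargs):
    """Return the label of the path with the highest bootstrap support."""
    support = bootstrap_path_support(path_lengths, gap_estimates, **kwargs)
    return max(support, key=support.get)
```

When two paths differ by less than the spread of the insert sizes, the support values end up close together, which matches the note that paths of similar length are hard to disambiguate; more mate pairs narrow the resampled mean and sharpen the call.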
+ | |||
+ | Pog repeats aside:\\ | ||
+ | There are several large homologous regions on opposite strands\\ | ||
+ | in Pog data that are kinds of repeats. \\ | ||
+ | They are at both ends of the area that inverts.\\ | ||
+ | Inversion happens by homologous matching, then swapping of the two strands.\\ | ||
+ | Like a sloppy integrase. | ||
+ | |||
+ | |||
+ | **SOLiD data.**\\ | ||
+ | Used the regular base-space data in colorspace_input.fa (not double-encoded).\\ | ||
Tried to run on just the SOLiD data… started on Sunday, but still running (Wed) \\ | Tried to run on just the SOLiD data… started on Sunday, but still running (Wed) \\ | ||