To Do (lecture notes)

We went over some things that need updating in the wiki and plans for this weekend.

Document scripts added in bin folder.

Find out which lanes in 454 run 3 are banana slug runs. This turns out not to be necessary: run “3” is not a separate run, just run2 plus the non-banana-slug reads from the other lanes.
- Try mapping with Newbler or BWA.
- BLAST/BLAT it.
Find insert lengths for the SeqPrep + Quake corrected Illumina data.
SOAPdenovo assemblies:
- Try 1: Illumina data only.
- Try 2: all Illumina data as rank 1; 454 data as rank 2 (both for contig building and scaffolding).
- Try 3: Illumina + 454 data as rank 1.
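The tries above differ only in which libraries go into the SOAPdenovo config file and what rank each one gets. A minimal sketch of generating the try 2 config follows. The file names, the `avg_ins` value, and `max_rd_len` are placeholders, not values from our data, and the notes do not say whether the 454 reads go in as single or paired reads; a single FASTQ file is assumed here.

```python
def render_config(max_rd_len, libraries):
    """Render a SOAPdenovo-style config: a global line, then one [LIB] block per library."""
    lines = [f"max_rd_len={max_rd_len}"]
    for lib in libraries:
        lines.append("[LIB]")
        for key, value in lib.items():
            # A list value means the key is repeated, one line per file.
            values = value if isinstance(value, list) else [value]
            lines.extend(f"{key}={v}" for v in values)
    return "\n".join(lines) + "\n"


# Try 2: Illumina as rank 1, 454 as rank 2.
# asm_flags=3 means "use for both contig building and scaffolding".
# All paths and numbers below are hypothetical placeholders.
try2 = [
    {"avg_ins": 200, "reverse_seq": 0, "asm_flags": 3, "rank": 1,
     "q1": "illumina_1.fastq", "q2": "illumina_2.fastq"},
    {"asm_flags": 3, "rank": 2, "q": "454_reads.fastq"},
]

config_text = render_config(500, try2)
print(config_text)
```

For try 3 the only change would be setting `rank` to 1 in the second library; for try 1 the second library is dropped.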
If the insert length is negative, don't treat the pair as PE reads.
- This came from a wrong assumption about what SOAPdenovo wants. The number it wants is the total fragment length, which is what we are already estimating. We just need to look at the average length for the pairs (based on the histogram) after SeqPrep.
- If Quake changes the distribution of reads (by trimming and discarding uncorrectable reads), it may be important to remap the new set of pairs to the 454 reads to get an improved estimate of fragment length.
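Getting the average fragment length from the histogram is just a weighted mean. A minimal sketch, assuming the histogram is available as a mapping from fragment length to number of pairs (that format is an assumption; the counts below are made up for illustration):

```python
def histogram_stats(hist):
    """Return (mean, median) fragment length from {length: pair_count}."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")

    # Weighted mean: each length counts once per pair that has it.
    mean = sum(length * count for length, count in hist.items()) / total

    # Median: walk the lengths in order until half of the pairs are covered.
    running = 0
    for length in sorted(hist):
        running += hist[length]
        if running >= total / 2:
            median = length
            break
    return mean, median


# Made-up counts, only to show the calculation.
example = {150: 10, 200: 30, 250: 10}
mean_len, median_len = histogram_stats(example)
avg_ins = round(mean_len)  # the value that would go into the SOAPdenovo config
print(mean_len, median_len, avg_ins)
```

Comparing the mean with the median is a cheap check for a skewed histogram; if they differ a lot, the mean may not be a good value to hand to the assembler.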
Newbler assemblies: no need for a new one, as there is no new 454 data.
Take another look at Barcode of Life mapping, this time using the invertebrate mitochondrial translation table.
- It turns out that blastall does not seem to have any documented way to tell tblastx to use a different genetic code, though the NCBI web server has the option.
- The work Kevin did so far is now in assemblies/slug/barcode-of-life.
- The next step is to use BWA to find all the SeqPrep+Quake-treated Illumina reads that map to the part of the barcode found so far.
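To see why the translation table matters for the barcode, here is a minimal sketch of translating with the invertebrate mitochondrial code (NCBI translation table 5) instead of the standard code. It differs from the standard code in four codons: AGA and AGG code for Ser, ATA for Met, and TGA for Trp. The function name and the test sequence are made up for illustration; this is not the tblastx search itself.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Table 5 (invertebrate mitochondrial): override the four codons that differ.
INVERT_MITO = dict(STANDARD, AGA="S", AGG="S", ATA="M", TGA="W")


def translate(dna, table=INVERT_MITO):
    """Translate frame 1 of a DNA string; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


seq = "ATGATATGAAGA"  # made-up sequence hitting three of the changed codons
print(translate(seq, STANDARD))     # standard code
print(translate(seq, INVERT_MITO))  # invertebrate mitochondrial code
```

Under the standard code TGA is a stop, so a translated search with the wrong table would break mitochondrial reading frames at every Trp encoded by TGA.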

Ed presenting paper on Wednesday. Get paper over the weekend.