Class structure:

- Split into 5 teams of 3. Each team will be responsible for 1 of 5 de novo assemblers, and we will try to go as far as we can.

- Starting on the 20th of April, the teams will give reports to the class on their chosen assembler (how it works and what kind of results they got). There will be another round of reports on evaluating assemblies.

- Two note takers each class

- First lectures will be on banana slug biology. It is important to know the biology because it can directly impact the computational work.

- Depending on how much progress we make, we might be able to do gene predictions and annotations.

Challenges:

- The genome is AT-rich and very big, which may make this much harder. The previous class estimated it would be 2 Gb (gigabases).

- Are banana slugs polyploid (many molluscs are)? Don't even know how many chromosomes there are (no karyotype).

- May be RAM limited. Will be using campus rocks cluster. For more information see: suport.soe.ucsc.edu/craigslist

  1. Some tools claim to be able to do eukaryotic assemblies on clusters of computers, none of which is very big. We have lots of machines in the 16-20 GB range, so tools that can distribute computation across these will be most versatile. Genome assembly is not CPU-intensive but requires a large amount of local memory (RAM), because the assembler has to build a graph and that graph takes a lot of memory.

- This is the first terrestrial mollusc to be sequenced, so it's going to be very different and annotation will be hard.
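
To illustrate the memory point in the challenges above, here is a minimal Python sketch, assuming a de Bruijn graph assembler (not every assembler builds this kind of graph). The function names, the toy reads, and the figure of 16 bytes per k-mer are assumptions made for illustration, not measurements from any real tool.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph: each k-mer becomes an edge
    from its (k-1)-base prefix to its (k-1)-base suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def count_edges(graph):
    """Number of distinct k-mers (edges) stored in the graph."""
    return sum(len(successors) for successors in graph.values())


def estimate_graph_memory_gb(n_distinct_kmers, bytes_per_kmer=16):
    """Rough memory estimate in GB.

    bytes_per_kmer is an ASSUMED illustrative figure; real assemblers
    differ widely in how compactly they store the graph.
    """
    return n_distinct_kmers * bytes_per_kmer / 1e9


# Toy reads (made up for this example). Overlapping reads share k-mers,
# so the graph grows with distinct k-mers, not with total read length.
reads = ["ATGGCGTGCA", "GGCGTGCAAT", "GTGCAATTTA"]
graph = build_de_bruijn(reads, k=4)
print("distinct k-mers in toy graph:", count_edges(graph))
print("estimated GB for 2 billion distinct k-mers:",
      estimate_graph_memory_gb(2_000_000_000))
```

Under the assumed 16 bytes per k-mer, 2 billion distinct k-mers (very roughly one per base of a genome of the size estimated above, ignoring repeats and sequencing errors) would need about 32 GB, which is more than a single 16-20 GB machine holds. That is why tools that can spread the graph across machines are attractive.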

Introductions:

- Everybody introduced themselves and their backgrounds, including whether they were undergrad or grad, what year and program, comfort level with unix, and whether they have previous experience with doing genome assemblies.

Note taking:

- Keeping notes is extremely important.

- Each group will have a README file in their directory. You must record absolutely everything you do in the README file, immediately as you do it. The notes will be accessible to the entire class, and relevant notes will also be posted to the wiki.

- You can create files using “make”. Alternatively, you could keep a logging shell script running while you work. In addition to recording results when things go as planned, also record what happens when a program does not run as expected or returns an error message.

- Every experiment must be run in its own directory, with its own README. This is partly because a given program will produce many files with generic names, and it gets confusing if they all end up in the same directory. You might also accidentally overwrite other runs of the experiment if they produce output files with the same names.
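
The note-taking rules above (one directory per experiment, its own README, everything recorded including failures) could be partly automated. The following is a minimal Python sketch, not the class's required method. The helper names `new_run_dir` and `run_and_log`, the README layout, and the directory names in the usage note are all hypothetical.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def new_run_dir(base, name):
    """Create a fresh directory for one experiment with its own README.

    Raises FileExistsError if the directory already exists, so an
    earlier run is never silently overwritten.
    """
    run_dir = Path(base) / name
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "README").write_text(f"Experiment: {name}\n")
    return run_dir


def run_and_log(run_dir, command):
    """Run a command inside run_dir and append what happened to README.

    The command, exit status, stdout, and stderr are all recorded, so
    failed runs are documented as well as successful ones.
    """
    result = subprocess.run(command, cwd=run_dir,
                            capture_output=True, text=True)
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(Path(run_dir) / "README", "a") as readme:
        readme.write(f"\n[{stamp}] $ {' '.join(command)}\n")
        readme.write(f"exit status: {result.returncode}\n")
        if result.stdout:
            readme.write("stdout:\n" + result.stdout)
        if result.stderr:
            readme.write("stderr:\n" + result.stderr)
    return result.returncode
```

For example, `run_dir = new_run_dir("experiments", "run01")` followed by `run_and_log(run_dir, ["ls", "-l"])` would leave a timestamped record of the command and its output in `experiments/run01/README`. Refusing to reuse an existing directory is a deliberate choice here, since it forces each run into a new location.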

Homework:

Get a campus rocks account, check out the Sun Grid Engine documentation, and look at what is available (e.g., in terms of memory).

Read the current wiki and think about how to redesign it. Watch how to create and edit pages.

Get comfortable with Linux/Unix (you could try http://korflab.ucdavis.edu/Unix_and_Perl/unix_and_perl_v3.1.1.html).