lecture_notes:05-23-2011

Installing the UCSC Genome Browser

Patricia Chan gave a lecture on setting up a local installation of the UCSC Genome Browser. Slides: genomebrowsersetup.pdf

UCSC Genome Browser

Requirements

  • 32- or 64-bit Linux/Unix system
  • CGI
  • MySQL database
  • Apache

Written in C and JavaScript (jQuery)

To install or mirror a genome browser on a new server: http://genomewiki.ucsc.edu/index.php/Browser_Installation

Where are the data?

  • MySQL Database
    • Each genome assembly has its own database
    • Most track data are stored in MySQL
  • /gbdb/<DB name>
    • Each genome assembly has its own local directory
    • Sequences, wiggle track data and other large data files (e.g. .bam files)

centraldb

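  • Contains information about all genome assemblies available in the browser.
  • centraldb.dbDb stores this information (one row per genome database).
  • centraldb.blatServers stores the host and port of the two BLAT servers for each genome (one for DNA, one for protein sequences).
  • The name of the central database can be changed in .hg.conf.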

Track Info

  • Stored in the trackDb table in each genome's MySQL database.
  • Built from trackDb.ra files, which serve as the input source.
  • The global trackDb.ra contains tracks that apply to multiple genomes.
  • Defining a track is similar to building a custom track on the Genome Browser.
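A trackDb.ra file is plain text made of stanzas: each starts with a track line, every following line is a setting name followed by its value, and a blank line ends the stanza. The Python sketch below parses that basic layout into one dictionary per track. It is a simplified illustration, not the Kent parser: include lines and other advanced trackDb features are ignored, and the track names and labels in the example are hypothetical.

```python
# Minimal .ra (trackDb.ra) parser sketch: one dict per blank-line-separated stanza.
# Simplification: include lines, indentation rules and other advanced
# trackDb features are not handled.


def parse_ra(text):
    """Return a list of {setting: value} dicts, one per stanza."""
    records = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current stanza.
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # skip comment lines
        # The first word is the setting name; the rest of the line is its value.
        parts = line.split(None, 1)
        current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    if current:
        records.append(current)
    return records


# Hypothetical example stanzas (track names and labels are made up).
EXAMPLE_RA = """\
track myGenes
shortLabel My Genes
longLabel Example gene predictions
group genes
type genePred

track myGc
shortLabel GC Percent
longLabel Example GC percent wiggle track
type wig 0 100
"""

if __name__ == "__main__":
    for record in parse_ra(EXAMPLE_RA):
        print(record["track"], "->", record.get("type"))
```

Splitting on the first run of whitespace keeps multi-word values such as "wig 0 100" intact.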

Kent Code Base

  • Required to set up the browser.
  • The latest source code is in a Git repository.
  • A CGI sandbox is needed.
  • Build the utilities with make utils in ~/kent/src.
  • Binaries are installed in ~/bin/${MACHTYPE}

Browser Configuration File

  • .hg.conf
  • Contains MySQL user accounts and passwords, centraldb info, trackDb info.
  • Required by Kent applications to connect to MySQL.
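The configuration file is a list of key=value lines with # comments. The sketch below reads such a file into a dictionary, the way a helper script might look up the MySQL and central database settings. The key names in the example follow the common db.* and central.* naming but should be checked against your installation's documentation; the user names and passwords are placeholders, and central.db is set to centraldb to match these notes.

```python
# Sketch: read key=value settings from a .hg.conf-style file.
# Key names are modeled on common db.* / central.* settings; values are
# placeholders, not real credentials.


def parse_hg_conf(text):
    """Return a dict of settings from key=value lines, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line (e.g. an include directive)
        settings[key.strip()] = value.strip()
    return settings


EXAMPLE_CONF = """\
# MySQL connection used by hgsql and other Kent programs
db.host=localhost
db.user=browserUser
db.password=browserPassword
# Name of the central database (can be renamed here)
central.db=centraldb
central.host=localhost
central.user=browserUser
central.password=browserPassword
"""

if __name__ == "__main__":
    conf = parse_hg_conf(EXAMPLE_CONF)
    print(conf["central.db"])
```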

Install the Genome Browser

Prepare Genome Sequences

  • Create a /gbdb/newGenome directory for the new genome assembly.
  • Convert the genome sequences from FASTA to 2bit format:
    • faToTwoBit chr1.fa [chr2.fa …] /gbdb/newGenome/newGenome.2bit
  • FASTA input files must use UNIX line endings (LF).
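FASTA files prepared on Windows often have CRLF line endings. A minimal Python sketch for checking and converting files before running faToTwoBit follows; it works on raw bytes, so only the line endings change. The function names are illustrative, not part of the Kent tools.

```python
# Sketch: detect and fix non-UNIX line endings in a FASTA file before
# running faToTwoBit. Works on raw bytes so sequence content is untouched.
from pathlib import Path


def has_carriage_returns(data: bytes) -> bool:
    """True if the data contains any CR characters (Windows or old Mac endings)."""
    return b"\r" in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_fasta_file(path: str) -> bool:
    """Rewrite the file in place with LF endings; return True if it changed."""
    p = Path(path)
    data = p.read_bytes()
    if not has_carriage_returns(data):
        return False
    p.write_bytes(to_unix_line_endings(data))
    return True


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        status = "fixed" if fix_fasta_file(name) else "ok"
        print(f"{name}: {status}")
```

Usage, for example: python fix_fasta.py chr1.fa chr2.fa (the script name is hypothetical).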

Setup Genome Database

  • Create a MySQL database for the genome assembly:
    • hgsql "" -e "create database if not exists newGenome"
    • hgsql is a Kent wrapper around the MySQL client for passing SQL commands; it takes its connection settings from .hg.conf. The first argument is the database name ("" when no database is selected yet).
  • Create a group (grp) table for the new database:
    • cd ~/kent/src/hg/lib
    • hgsql newGenome < grp.sql
  • Create a chromInfo table:
    • faSize -detailed chr1.fa [chr2.fa …] > chrominfo.tab
    • hgsql newGenome < ~/kent/src/hg/lib/chromInfo.sql
    • hgsql newGenome -e 'load data local infile "chrominfo.tab" into table chromInfo;'
    • hgsql newGenome -e 'update chromInfo set fileName = "/gbdb/newGenome/newGenome.2bit";'
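The chromInfo steps above record each sequence's name and length (what faSize -detailed reports) and then point fileName at the 2bit file. The Python sketch below does both in one pass on FASTA text. It assumes the chromInfo column order is chrom, size, fileName; under that assumption, loading its three-column output would make the separate update of fileName unnecessary.

```python
# Sketch: compute per-sequence sizes from FASTA text (the name/size pairs that
# faSize -detailed reports) and write chromInfo rows with fileName filled in.
# Assumes the chromInfo column order is chrom, size, fileName.


def fasta_sizes(text):
    """Return a list of (sequence name, length) pairs in file order."""
    sizes = []
    name = None
    length = 0
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                sizes.append((name, length))
            words = line[1:].split()
            name = words[0] if words else ""
            length = 0
        elif line and name is not None:
            length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes


def chrom_info_tab(sizes, two_bit_path):
    """Format tab-separated chromInfo rows: chrom, size, fileName."""
    return "".join(f"{name}\t{size}\t{two_bit_path}\n" for name, size in sizes)


EXAMPLE_FA = ">chr1 example\nACGTACGTAC\nGGNN\n>chrM\nACG\n"

if __name__ == "__main__":
    rows = chrom_info_tab(fasta_sizes(EXAMPLE_FA), "/gbdb/newGenome/newGenome.2bit")
    print(rows, end="")
```

For EXAMPLE_FA the sizes are 14 for chr1 (10 + 4 bases, N counted) and 3 for chrM.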

Make New Genome Available

  • Add an entry to the centraldb.dbDb table.
  • Add an entry to the centraldb.defaultDb table.
    • Sets the default assembly to use for the genome.
  • Add an entry to the centraldb.genomeClade table.
    • Associates the genome with its clade.
    • If the genome belongs to a clade that is not yet in the browser, add an entry to the centraldb.clade table.
  • Add a description of the genome in an HTML file (/gbdb/newGenome/html/description.html).
    • Free-form HTML; no required format.
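The defaultDb and genomeClade entries are small enough to generate from a script. The sketch below builds the SQL insert statements; the column lists (defaultDb: genome, name; genomeClade: genome, clade, priority) are assumptions to verify against the .sql schema files in ~/kent/src/hg/lib, and the genome name, clade and priority in the example are hypothetical. dbDb is left out because it has many more columns.

```python
# Sketch: build SQL for the defaultDb and genomeClade entries of a new genome.
# Assumed columns: defaultDb(genome, name), genomeClade(genome, clade, priority).
# Verify them against the .sql schema files before use.


def sql_quote(value):
    """Quote a value as a MySQL string literal (escapes backslash and quote)."""
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def central_db_inserts(genome, db_name, clade, priority):
    """Return insert statements that register db_name under genome and clade."""
    return [
        "insert into defaultDb (genome, name) values "
        f"({sql_quote(genome)}, {sql_quote(db_name)});",
        "insert into genomeClade (genome, clade, priority) values "
        f"({sql_quote(genome)}, {sql_quote(clade)}, {int(priority)});",
    ]


if __name__ == "__main__":
    # Hypothetical genome name, clade and priority.
    for stmt in central_db_inserts("New critter", "newGenome", "other", 100):
        print(stmt)
```

The printed statements could be saved to a file and loaded with hgsql centraldb < file.sql.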

Configuration

Track Configuration

  • Each genome database needs a trackDb table.
  • The global trackDb.ra is in ~/kent/src/hg/makeDb/trackDb.
  • The genome-specific trackDb.ra is stored in ~/kent/src/hg/makeDb/trackDb/<DB name>.
    • It can also be stored in an alternate location.
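Because tracks can be defined both in the global trackDb.ra and in a genome-specific one, the two sets of stanzas have to be combined. The sketch below uses a simplified model in which a genome-specific stanza replaces the global stanza with the same track name. This is an assumption for illustration; the real trackDb loader (hgTrackDb) has further rules that are not modeled here, and the stanzas are hypothetical.

```python
# Simplified model of combining global and genome-specific trackDb stanzas:
# a genome-specific stanza replaces the global stanza with the same track name.
# Assumption: the real loader has further rules that are not modeled here.


def merge_track_stanzas(global_stanzas, genome_stanzas):
    """Return {track name: settings}, genome-specific stanzas taking precedence."""
    merged = {}
    for stanza in global_stanzas:
        merged[stanza["track"]] = dict(stanza)
    for stanza in genome_stanzas:
        merged[stanza["track"]] = dict(stanza)
    return merged


# Hypothetical stanzas, e.g. as produced by a .ra parser.
GLOBAL = [
    {"track": "myGc", "type": "wig 0 100"},
    {"track": "myGenes", "type": "genePred", "shortLabel": "My Genes"},
]
NEW_GENOME = [
    {"track": "myGenes", "type": "bed 12", "shortLabel": "My Genes (newGenome)"},
]

if __name__ == "__main__":
    for name, settings in merge_track_stanzas(GLOBAL, NEW_GENOME).items():
        print(name, settings["type"])
```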

Search Configuration

  • An hgFindSpec table is required for specifying search criteria.
  • The search criteria for each track are also loaded from trackDb.ra.
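One search criterion that can be set in trackDb.ra is a regular expression (termRegex) describing which search terms a given search applies to. The sketch below shows that idea: it picks the searches whose pattern matches a user's term. Field names are modeled on hgFindSpec settings, the spec contents are hypothetical, and requiring the whole term to match is an assumption of this sketch.

```python
# Sketch: choose which search specs apply to a user's search term by testing
# each spec's termRegex. Spec contents are hypothetical; field names are
# modeled on hgFindSpec settings (searchName, searchTable, termRegex).
import re

SEARCH_SPECS = [
    {"searchName": "myGenesAcc", "searchTable": "myGenes",
     "termRegex": r"NM_[0-9]+(\.[0-9]+)?"},
    {"searchName": "myClones", "searchTable": "myClones",
     "termRegex": r"RP11-[0-9]+[A-Z][0-9]+"},
]


def applicable_searches(term, specs=SEARCH_SPECS):
    """Return names of specs whose termRegex matches the whole term."""
    return [s["searchName"] for s in specs if re.fullmatch(s["termRegex"], term)]


if __name__ == "__main__":
    for term in ["NM_000546.5", "RP11-12A3", "TP53"]:
        print(term, applicable_searches(term))
```

Filtering by pattern first means only the tables that could plausibly hold the term need to be queried.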

Start BLAT Server

  • To run BLAT, a gfServer must be started for each genome.
  • Insert 2 records into the centraldb.blatServers table (one for the DNA server, one for the protein/translated server).
  • Make sure the two port numbers are different.
  • If the BLAT server does not run on the browser machine, copy the 2bit file to the BLAT host:
    • rsync -v /gbdb/newGenome/newGenome.2bit blat_host:/gbdb/newGenome
  • On the BLAT host, start the BLAT servers in the background.
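The two BLAT servers per genome differ only in port and in whether they serve translated (protein) searches. The sketch below plans both at once: the blatServers rows and the gfServer start command lines, refusing to reuse a port. The blatServers column order (db, host, port, isTrans, canPcr) is an assumption to check against the table schema, and the host name and port numbers are hypothetical.

```python
# Sketch: plan the two BLAT servers for one genome: the blatServers rows and
# the gfServer command lines. Assumed blatServers columns:
# (db, host, port, isTrans, canPcr). Host and ports are hypothetical.


def blat_server_plan(db, host, dna_port, trans_port, two_bit_path):
    """Return (rows, commands) for a DNA and a translated gfServer."""
    if dna_port == trans_port:
        raise ValueError("the DNA and translated gfServers need different ports")
    rows = [
        (db, host, dna_port, 0, 1),    # DNA server; also usable for in-silico PCR
        (db, host, trans_port, 1, 0),  # translated (protein) server
    ]
    commands = [
        ["gfServer", "start", host, str(dna_port), two_bit_path],
        ["gfServer", "start", host, str(trans_port), "-trans", two_bit_path],
    ]
    return rows, commands


if __name__ == "__main__":
    rows, commands = blat_server_plan(
        "newGenome", "blat_host", 17779, 17778, "/gbdb/newGenome/newGenome.2bit"
    )
    for row in rows:
        print(row)
    for cmd in commands:
        print(" ".join(cmd) + " &")  # trailing & runs it in the background
```

The printed command lines would be run on the BLAT host; the rows correspond to the two blatServers records.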

Automation

  • The previous steps are automated in the Perl script make-browser.
lecture_notes/05-23-2011.txt · Last modified: 2011/05/23 23:13 by pchan