Can learn species demographic info from a single genome
Not sequencing 1 genome but 2 (for a diploid organism), so can compare genomes to each other
Each copy is a mosaic of segments inherited from many different individuals in previous generations
Looking further and further back you are sampling 1000s of individuals
Amount of heterozygosity is directly proportional …
Wright-Fisher model of reproduction
Finite and constant population (N)
Random mating with respect to the locus you are looking at
Non-overlapping / discrete generations
Genetic drift
Allele frequency (p) changes over generations through random sampling of alleles, even under random mating
Changes until the allele reaches fixation (p = 1, site no longer segregating) or extinction (p = 0)
More generations reduce genetic variation in a population
Rate of loss is inversely proportional to population size (N)
lose variation faster with small N
lose variation slower with large N
Markov chain with absorbing boundary (math model)
pi(p) = p : the probability an allele with frequency p will go to fixation
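A minimal simulation sketch of the drift model above, assuming binomial resampling of 2N allele copies each generation. The function names, the seed, and the parameter values are illustrative choices, not part of the notes. Across many replicate populations, the fraction in which the allele fixes should land close to its starting frequency p, in line with pi(p) = p.

```python
import random


def simulate_fixation(two_n, p0, rng):
    """Run one Wright-Fisher population until the allele is fixed or lost.

    Returns True if the allele reached fixation, False if it was lost.
    """
    count = round(p0 * two_n)  # starting number of copies of the allele
    while 0 < count < two_n:
        p = count / two_n
        # Next generation: 2N independent draws from the current allele pool.
        count = sum(1 for _ in range(two_n) if rng.random() < p)
    return count == two_n


def fixation_fraction(two_n, p0, replicates, seed=1):
    """Fraction of replicate populations in which the allele fixes."""
    rng = random.Random(seed)
    fixed = sum(simulate_fixation(two_n, p0, rng) for _ in range(replicates))
    return fixed / replicates


# Should come out near the starting frequency 0.3 (sampling noise aside).
print(fixation_fraction(two_n=20, p0=0.3, replicates=2000))
```

Both boundaries are absorbing: once count hits 0 or 2N the loop ends, which is the "Markov chain with absorbing boundary" in code form.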
Heterozygosity, H
rate of differences per base pair between the two copies of the genome
can be measured extremely precisely
Ht = H0*(1 - (1/(2N)))^t : heterozygosity over time
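A small sketch of the decay formula above, to make the dependence on N concrete. The function names and example values are illustrative. The half-life helper just solves (1 - 1/(2N))^t = 1/2 for t.

```python
import math


def heterozygosity(h0, n, t):
    """Expected heterozygosity after t generations of drift alone."""
    return h0 * (1 - 1 / (2 * n)) ** t


def half_life(n):
    """Generations needed for drift to halve heterozygosity."""
    return math.log(0.5) / math.log(1 - 1 / (2 * n))


# Same starting point and elapsed time, different population sizes.
for n in (10, 100, 1000):
    print(n, heterozygosity(0.5, n, 50), half_life(n))
```

Small N loses variation faster and large N slower, as stated under genetic drift.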
Mutation
Adds genetic variation to a population
Works to counter allele fixation through genetic drift
Enters population at rate mu, per generation
deltaHmu = 2mu*(1 - H)
Independent of population size
Mutation - drift equilibrium
deltaH = -(1/(2N))*H + 2mu*(1 - H)
to determine stable heterozygosity, set deltaH = 0 (mutation - drift equilibrium) and solve for H
H = (4N*mu) / (1 + (4N*mu))
4N*mu is typically much smaller than 1, so the denominator is close to 1
becomes H ~= 4Ne*mu where Ne is the effective population size
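A quick numerical check of the equilibrium, sketched by iterating the deltaH recursion until it settles and comparing with the closed form. Function names, the starting value, and the N and mu values are assumptions chosen for illustration.

```python
def delta_h(h, n, mu):
    """Per-generation change in H: loss to drift plus gain from mutation."""
    return -(1 / (2 * n)) * h + 2 * mu * (1 - h)


def equilibrium_by_iteration(n, mu, h0=0.0, generations=50_000):
    """Apply the recursion repeatedly, starting from h0."""
    h = h0
    for _ in range(generations):
        h += delta_h(h, n, mu)
    return h


def equilibrium_closed_form(n, mu):
    """H = 4N*mu / (1 + 4N*mu), obtained by setting deltaH = 0."""
    theta = 4 * n * mu
    return theta / (1 + theta)


print(equilibrium_by_iteration(1000, 1e-4))
print(equilibrium_closed_form(1000, 1e-4))
```

With a much smaller mutation rate (say mu = 1e-7 at N = 1000), the closed form and the approximation 4N*mu agree to within about 4N*mu in relative terms, which is why the simpler expression is usually used.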
Molecular evolution
what is rate of fixation of new mutations over evolutionary time?
2N*mu new alleles per generation, each of which starts life at frequency 1/(2N)
chance of fixation is the allele frequency
rate of fixation per generation = number of new alleles * chance that each goes to fixation = 2N*mu * 1/(2N) = mu
molecular clock
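The cancellation of N in the fixation rate can be checked directly. This is only an arithmetic sketch; the function name and the mutation rate of 1e-8 are illustrative assumptions.

```python
def substitution_rate(n, mu):
    """Rate at which new mutations fix, per generation."""
    new_alleles = 2 * n * mu  # new mutant copies entering each generation
    p_fix = 1 / (2 * n)       # each starts at frequency 1/(2N), and pi(p) = p
    return new_alleles * p_fix


# Population size changes by four orders of magnitude; the rate stays at mu.
for n in (100, 10_000, 1_000_000):
    print(n, substitution_rate(n, mu=1e-8))
```

Larger populations produce more new mutations, but each one is proportionally less likely to fix, so the two effects offset exactly.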
PSMC
pairwise sequentially Markovian coalescent model
used to infer local time to the most recent common ancestor (TMRCA) based on local density of heterozygous sites
hidden Markov model where observations are the diploid sequence, hidden states are discretized TMRCA, and transitions represent ancestral recombination events
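To illustrate only the intuition behind "local density of heterozygous sites", here is a deliberately crude sketch. It is not the PSMC algorithm: there is no HMM, no recombination, and no inference. It assumes that two copies separated by a TMRCA of T generations differ at roughly 2*mu*T sites per base pair (mutations accumulate on both lineages), and inverts that per window. The function names, window size, positions, and mu are all hypothetical.

```python
def window_het_density(het_positions, seq_length, window):
    """Heterozygous sites per base pair in consecutive fixed-size windows.

    het_positions are 0-based coordinates, each smaller than seq_length.
    """
    n_windows = (seq_length + window - 1) // window
    counts = [0] * n_windows
    for pos in het_positions:
        counts[pos // window] += 1
    densities = []
    for i, count in enumerate(counts):
        # The last window may be shorter than the others.
        size = min(window, seq_length - i * window)
        densities.append(count / size)
    return densities


def crude_tmrca(density, mu):
    """Naive local TMRCA in generations, assuming density ~= 2 * mu * T."""
    return density / (2 * mu)


# Hypothetical heterozygous positions on a 300 bp stretch.
densities = window_het_density([5, 17, 250, 251, 252, 299], 300, 100)
print([crude_tmrca(d, mu=1e-8) for d in densities])
```

Windows dense in heterozygous sites map to older local ancestry and sparse windows to more recent ancestry; the HMM described above replaces this per-window guess with a probabilistic assignment to discretized TMRCA states.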