Hey Lab Distributed Software
I distribute several software programs for the
analysis of DNA sequence-based population genetic data. These programs have been
developed over the years to suit my needs and those of people working in my lab.
All were written in C, and the source code is
available. The programs should compile with different compilers. A
Win32 executable version (.exe file) is also available for each program.
The programs are partly interoperable: SITES will generate input lines
for the HKA and WH programs, and the FPG program, in addition to its primary
function, generates simulated data sets that can be read by SITES.
The programs can be freely distributed so long as no fee is charged for them.
IM and IMa - UPDATED 5/6/2009
(fixed an IMa bug in loading HapSTR loci with multiple STRs)

IM is a program, written
with Rasmus Nielsen, for fitting an isolation with migration model to
haplotype data drawn from two closely related species or populations. IM is based on a method
originally developed by Rasmus Nielsen and John
Wakeley (Nielsen and Wakeley 2001 Genetics 158:885). Large numbers of loci
can be studied simultaneously, and different mutation models can be used.
IMa implements the same Isolation with Migration
model, but uses a new method that provides estimates of the joint
posterior probability density of the model parameters. IMa also allows log
likelihood ratio tests of nested demographic models. IMa is based on a
method described in
Hey and Nielsen (2007 PNAS 104:2785-2790). IMa is faster than IM, and
better in that it provides access to the joint posterior
density function. It can be used for most (but not all) of the situations
and options that IM can be used for.
View the Introduction to IM and IMa Documentation
View the Using IM Documentation
View the Using IMa Documentation
Get the IM Distribution package
(updated with significant bug fixes to IMa on 5/6/2009)
Questions? Try using the
Isolation with Migration Discussion Group. This way, common questions can
be addressed by searching and discussion, and I can more easily manage my own
communications about these topics.
http://groups.google.com/group/Isolation-with-Migration
References:
Hey, J., and R. Nielsen. 2007. Integration within the Felsenstein equation for
improved Markov chain Monte Carlo methods in population genetics. PNAS
104:2785-2790.
Hey, J. 2005. On the Number of New World Founders: A Population Genetic Portrait
of the Peopling of the Americas. PLoS Biol 3:e193.
Won, Y.-J., and J. Hey. 2005. Divergence population genetics of
chimpanzees. Mol Biol Evol 22:297-307.
Hey, J., and R. Nielsen. 2004. Multilocus methods for estimating
population sizes, migration rates and divergence time, with applications to the
divergence of Drosophila pseudoobscura and D. persimilis. Genetics 167:747-760.
Hey, J., Y.-J. Won, A. Sivasundar, R. Nielsen, and J. A. Markert. 2004. Using nuclear
haplotypes with microsatellites to study gene flow between recently separated
populations. Mol Ecol 13:909-919.
Nielsen, R., and J. Wakeley. 2001. Distinguishing migration from isolation: a
Markov chain Monte Carlo approach. Genetics 158:885-896.
IM is subject to updates.
Email me if you want to be notified of updates or news.
SIMDIV is the program used in the Hey lab for
generating data sets under an isolation with migration model. It can be used for
multiple populations and a variety of mutation models, pretty much for any kind
of data set for which IM and related programs can be used. SIMDIV is not the
only program that can do this. There are many other coalescent simulators,
and one that I know can be used for isolation with migration models is
SIMCOAL, but there may be others. SIMDIV will simulate data with recombination
if the user so specifies, even though the IM programs assume no recombination. The
program can handle up to 10 populations, with up to 350 gene copies per locus, for any
number of loci. If desired, a user can load all of the specifics associated with
a real data set (number of loci, sample sizes, mutation models, and IM parameter
estimates) so that IM/SIMDIV results can be compared with the real data that were
used to generate the parameter estimates. For IM users, SIMDIV should be fairly
easy to use because parameters mean the same things and are scaled the same way
in SIMDIV and the IM programs.
View SIMDIV documentation
Download SIMDIV package
SITES
is a computer program for the analysis of comparative DNA sequence data.
Basic analyses include: data summaries by polymorphism class; polymorphism
estimates within and between groups (species); estimates of migration, neutral
model, and recombination parameters; and linkage disequilibrium analyses.
SITES is primarily intended for data sets with multiple closely related sequences.
It is especially useful when multiple sequences have been obtained from
each of one or several closely related populations or species.
View SITES documentation
SITES Downloads
HKA
is a computer program that carries out the widely used statistical test for
natural selection that was developed by R. R. Hudson, M. Kreitman and M.
Aguadé (1987. A test of neutral molecular evolution based on nucleotide data.
Genetics 116:153-159). This program can handle very large numbers
of loci and sample sizes, and conducts tests via coalescent simulation as well
as by the conventional chi-square approximation. The simulations
can also be used to conduct other tests of natural selection, including tests
of Tajima's D statistic (1989) and the D statistic of Fu and Li (1993).
View HKA documentation
HKA Downloads
WH
is a computer program that fits a speciation model and
tests how well that model fits the data. The speciation model
is called the Isolation Model and assumes no gene flow. With
comparative DNA sequence data from each of two closely related species, the
method allows estimation of the time since speciation and of the size of the
ancestral species. The methods are described in Wakeley and Hey (1997)
and Wang, Wakeley and Hey (1997).
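Why polymorphism patterns matter for separating the two parameters can be seen from a standard coalescent result (not a description of the WH estimator itself). Assume a split t generations ago, an ancestral diploid population of size N_A, and a mutation rate μ per sequence per generation. Two genes, one sampled from each species, cannot coalesce until they are traced back past the split, after which they coalesce in the ancestral population after an expected 2N_A generations, so the expected number of differences between them is:

```latex
\begin{aligned}
E[K_{\mathrm{between}}] &= 2\mu\,\bigl(t + E[T_A]\bigr) = 2\mu\,(t + 2N_A) \\
                        &= 2\mu t + \theta_A, \qquad \theta_A = 4N_A\mu .
\end{aligned}
```

Because t and N_A enter this expectation only through the sum 2μt + θ_A, between-species divergence alone cannot separate them; methods for the Isolation Model therefore also draw on how polymorphisms are distributed within and between the two species.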
View WH documentation
WH Downloads
FPG (for Forward Population Genetic simulation) simulates a
population of constant size that is undergoing various evolutionary processes,
including mutation, recombination, natural selection, and
migration. The meaning of "forward" in this
context is simply that time, within the simulation, moves forward just as it
does in the real world. This is in contrast to coalescent population
genetic simulation in which time, as represented within the simulation,
proceeds back into the past. Coalescent simulations have many
advantages, but they are unwieldy if they incorporate natural selection on
multiple sites.
FPG is useful for assessing the impact of natural selection
on patterns of genetic variation. It is designed to approximate
real-world situations with fairly large population sizes and
high mutation rates over long stretches of DNA. The mutation model is an
infinite sites model, meaning that no site that is segregating in the
population can receive another mutation. The simulation accommodates
neutral, beneficial and deleterious mutations under several different fitness
models, including additive, multiplicative and epistatic fitness
models. The program generates a wide variety of analyses, including polymorphism
levels, heterozygosity (observed and expected), fixation rates, and linkage
disequilibrium - all conducted for each of several categories of mutation.
When migration is invoked, several analyses of population
structure are also carried out.
View FPG documentation
FPG Downloads
Web page last updated June 16, 2009.