Jody Hey                  Evolutionary Genetics

  Professor    -     Department of Genetics     -   Rutgers University



 

For Hey Lab Data Sets - Go HERE

Hey Lab Distributed Software

 

 

Software

 

I distribute several software programs for the analysis of DNA sequence based population genetic data. These programs have been developed over the years to suit my needs and those of people working in my lab.

All were written in C and the source code is available.  The programs should compile under different compilers.  A Win32 executable version (.exe file) is also available for each program.

The programs interoperate to a limited degree. SITES will generate input lines for the HKA and WH programs. The FPG program, in addition to its primary function, generates simulated data sets that can be read by SITES.

The programs can be freely distributed so long as no fee is charged for them.

IM and IMa  - UPDATED 5/6/2009  (IMa had a bug dealing with loading Hapstrs with multiple STRs)

 

IM is a program, written with Rasmus Nielsen, for the fitting of an isolation model with migration to haplotype data drawn from two closely related species or populations.  IM is based on a method originally developed by Rasmus Nielsen and John Wakeley (Nielsen and Wakeley 2001 Genetics 158:885).  Large numbers of loci can be studied simultaneously, and different mutation models can be used. 

IMa implements the same Isolation with Migration model, but does so using a new method that provides estimates of the joint posterior probability density of the model parameters. IMa also allows log-likelihood ratio tests of nested demographic models. IMa is based on a method described in Hey and Nielsen (2007 PNAS 104:2785–2790). IMa is faster than IM and more informative, because it provides access to the joint posterior density function, and it can be used for most (but not all) of the situations and options that IM can be used for.

View the Introduction to IM and IMa Documentation

View the Using IM Documentation

View the Using IMa Documentation

Get the IM Distribution package - updated w/ significant bug fixes to IMa on 5/6/2009

Questions? Try the Isolation with Migration Discussion Group. This way common questions can be addressed by searching and discussion, and I can more easily manage my own communications about these topics.


http://groups.google.com/group/Isolation-with-Migration

References:

Hey, J., and R. Nielsen. 2007. Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. PNAS 104:2785–2790.

Hey, J. 2005. On the Number of New World Founders: A Population Genetic Portrait of the Peopling of the Americas. PLoS Biol 3:e193.

Won, Y. J., and J. Hey. 2005. Divergence population genetics of chimpanzees. Mol Biol Evol 22:297-307.

Hey, J., and R. Nielsen. 2004. Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics 167:747-760.

Hey, J., Y.-J. Won, A. Sivasundar, R. Nielsen, and J. A. Markert. 2004. Using nuclear haplotypes with microsatellites to study gene flow between recently separated populations. Molecular Ecology 13:909-919.

Nielsen, R., and J. Wakeley. 2001. Distinguishing migration from isolation. A Markov chain Monte Carlo approach. Genetics 158:885-896.

 

IM is subject to updates -- Click to email if you want to be notified of updates or news.


 

SIMDIV

SIMDIV is the program used in the Hey lab for generating data sets under an isolation with migration model. It can be used for multiple populations and a variety of mutation models, covering pretty much any kind of data set for which IM and related programs can be used. SIMDIV is not the only program that can do this. There are many other coalescent simulators; one that I know can be used for isolation with migration models is SIMCOAL, and there may be others. SIMDIV will simulate data with recombination if the user so specifies, even though the IM programs assume no recombination. The program can handle up to 10 populations, with 350 gene copies per locus, for any number of loci. If desired, a user can load all of the specifics associated with a real data set (number of loci, sample sizes, mutation models, and IM parameter estimates) so that IM/SIMDIV results can be compared with the real data that were used to generate the parameter estimates. For IM users SIMDIV should be fairly easy to use, because parameters mean the same things and are scaled the same way in SIMDIV and the IM programs.

View SIMDIV documentation

Download SIMDIV package


 

SITES

SITES is a computer program for the analysis of comparative DNA sequence data.  Basic analyses include: data summaries by polymorphism class;  polymorphism estimates within and between groups (species); estimates of migration, neutral model, and recombination parameters; and linkage disequilibrium analyses.  SITES is primarily intended for data sets with multiple closely related sequences. It is especially useful when multiple sequences have been obtained from each of one or several closely related populations or species.  
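As an illustration of the kind of polymorphism summaries described above, here is a minimal Python sketch. It is not SITES code and does not use the SITES input format. The function names and the toy alignment are made up for this example, and it assumes aligned sequences of equal length with no gaps or missing data.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Count alignment columns that contain more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)

def watterson_theta(seqs):
    """Watterson's estimator: S divided by the harmonic sum 1 + 1/2 + ... + 1/(n-1)."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a1

# Toy alignment (hypothetical data, four sequences of eight bases).
sample = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]

print(segregating_sites(sample))          # 2
print(mean_pairwise_differences(sample))  # 1.0
print(watterson_theta(sample))
```

Both estimators are per-locus values; dividing by the sequence length would give per-site values.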

View SITES documentation

SITES Downloads


HKA

HKA is a computer program that carries out the widely used statistical test for natural selection that was developed by Hudson, R. R., M. Kreitman and M. Aguadé (1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116: 153-159).   This program can handle very large numbers of loci and sample sizes, and conducts tests via coalescent simulation as well as by the conventional chi square approximation.   The simulations can also be used to conduct other tests of natural selection, including tests of Tajima's D statistic (1989) and the D statistic of Fu and Li (1993).
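For readers unfamiliar with Tajima's D, the statistic itself can be sketched in a few lines of Python. This is not the HKA program's code, and it computes only the statistic, not the coalescent-simulation test that HKA performs. The function name `tajimas_d` and the example values are assumptions made for illustration; the constants follow the standard formula from Tajima (1989).

```python
import math

def tajimas_d(n, num_segregating, mean_pairwise_diff):
    """Tajima's D from sample size n, segregating sites S, and mean pairwise differences."""
    if n < 2 or num_segregating < 1:
        raise ValueError("need n >= 2 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    s = num_segregating
    # Numerator: difference between the two estimators of theta.
    numerator = mean_pairwise_diff - s / a1
    # Denominator: estimated standard deviation of that difference.
    return numerator / math.sqrt(e1 * s + e2 * s * (s - 1))

# Illustrative values only: 10 sequences, 16 segregating sites.
d = tajimas_d(10, 16, 3.888889)
print(round(d, 3))  # roughly -1.446
```

A negative value, as here, indicates an excess of low-frequency variants relative to the neutral expectation.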

View HKA documentation

HKA Downloads


WH

WH is a computer program that carries out the fitting of a speciation model, and conducts tests of the quality of fit of that model.  The speciation model is called the Isolation Model, and is one without gene flow.  With comparative DNA sequence data from each of two closely related species, the method allows an estimation of the time since speciation and the size of the ancestral species.  The methods are described in Wakeley and Hey (1997) and Wang, Wakeley and Hey (1997).

View WH documentation

WH Downloads


FPG

FPG (for Forward Population Genetic simulation) simulates a population of constant size that is undergoing various evolutionary processes, including:  mutation, recombination,  natural selection, and migration.   The meaning of "forward" in this context is simply that time, within the simulation, moves forward just as it does in the real world.  This is in contrast to coalescent population genetic simulation in which time, as represented within the simulation, proceeds back into the past.  Coalescent simulations have many advantages, but they are unwieldy if they incorporate natural selection on multiple sites.

FPG is useful for assessing the impact of natural selection on patterns of genetic variation. It is designed to approximate real world situations with fairly large population sizes and high mutation rates over long stretches of DNA. The mutation model is an infinite sites model, meaning that no site that is segregating in the population can receive another mutation. The simulation accommodates neutral, beneficial and deleterious mutations under several different fitness models, including additive, multiplicative and epistatic fitness models. The program generates a wide variety of analyses, including polymorphism levels, heterozygosity (observed and expected), fixation rates, and linkage disequilibrium, all conducted for each of several categories of mutation. When migration is invoked, several analyses regarding population structure are carried out.
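The idea of a forward simulation under an infinite sites model can be sketched as follows. This is a toy Python illustration, not FPG itself. It assumes a haploid population, no recombination or migration, at most one new mutation per genome per generation, and a single multiplicative selection coefficient shared by all mutations. The names `simulate_forward` and `summarize` are made up for this example.

```python
import random

def simulate_forward(pop_size, generations, mut_rate, sel_coeff=0.0, seed=0):
    """Forward Wright-Fisher sketch with constant population size.

    Each individual is a frozenset of mutation ids. Every new mutation gets
    an id never used before, which is the infinite sites assumption.
    Fitness is multiplicative: (1 + sel_coeff) ** number_of_mutations,
    so sel_coeff must be greater than -1.
    """
    rng = random.Random(seed)
    population = [frozenset() for _ in range(pop_size)]
    next_id = 0
    for _ in range(generations):
        # Time moves forward: parents are sampled in proportion to fitness.
        weights = [(1.0 + sel_coeff) ** len(ind) for ind in population]
        parents = rng.choices(population, weights=weights, k=pop_size)
        offspring = []
        for genome in parents:
            if rng.random() < mut_rate:
                genome = genome | {next_id}  # mutation at a brand-new site
                next_id += 1
            offspring.append(genome)
        population = offspring
    return population

def summarize(population):
    """Return (number of fixed mutations, number of segregating mutations)."""
    counts = {}
    for individual in population:
        for mutation in individual:
            counts[mutation] = counts.get(mutation, 0) + 1
    n = len(population)
    fixed = sum(1 for c in counts.values() if c == n)
    segregating = sum(1 for c in counts.values() if c < n)
    return fixed, segregating

final_population = simulate_forward(pop_size=50, generations=100, mut_rate=0.05, seed=1)
print(summarize(final_population))
```

Setting `sel_coeff` above or below zero makes every mutation beneficial or deleterious, which shows why selection is straightforward to add in a forward simulation.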

View FPG documentation

FPG Downloads


Web page last updated June 16, 2009.
