Hey Lab Distributed Software
We distribute several software programs for population genetic analysis. These programs have been developed over the years to meet the needs of research in the Hey lab, and for others to use.
All were written in C and the source code is available. The programs should compile under different compilers. A Win32 executable version (.exe file) is also available for most versions.
The programs interoperate to a limited extent. SITES will generate input lines for the HKA and WH programs. The FPG program, in addition to its primary function, generates simulated data sets that can be read by SITES.
The programs can be freely distributed so long as no fee is charged for them.
Isolation with Migration: visit this group for questions and discussion regarding IM, IMa, and IMa2.
Jared Knoblauch and Arun Sethuraman have developed a browser-based graphical user interface for the latest IM programs. The IMGui download site is here.
Arun Sethuraman has developed a parallel version of the IMa2 program, which is available from GitHub. The paper describing the parallel program has been published in Molecular Ecology Resources.
IMa2 is a program (Sang Chul Choi and Rasmus Nielsen) that extends the method of Hey and Nielsen (2007) to two or more populations. IMa2 has many additions and improvements over
Latest IMa2 distribution package - includes Windows executable, documentation, and example files.
Latest Linux/Unix/Mac archive, with source files and installer.
To install the Linux/Unix/Mac archive:
- save the archive to a suitable directory
- to decompress, type at the command prompt: tar zxf ima2-8.26.11.tar.gz
- move to the ima2-8.26.11 directory
- type at the command prompt: ./configure
- then: make
- the executable (called 'IMa2') will be in the src directory
8/27/2012 Fixed a bug in testing nested models.
8/27/2012 Fixed a bug in the calculation of the geometric mean of mutation scalars when a subset of loci have mutation rates in the input file. Miscellaneous other fixes as well.
8/26/2011 Fixed a significant bug in the calculation of joint likelihoods and likelihood ratio statistics. Removed the two-step heating model (not useful). Fixed other bugs that caused some crashes. The .ti file format has changed; files generated under previous versions cannot be run with the latest program.
4/12/2011 Fixed some bugs causing crashes. Also, I have removed the estimation of migration times (but not migration counts) when -p7 is invoked. It has become clear that there is an identifiability problem and that this just does not work. See also the new paper by Strasburg and Rieseberg in Mol Ecol.
In the 10/13/2010 version of the program. There still seems to be a bug - see note from 7/27/2010 at
The 6/3/2010 update fixes a bug that turned off one of the splitting time updates.
The 5/27/2010 update fixes some bugs in testing nested models for more than 2 populations. Also a new feature has been added - users can include just one single migration parameter in a model to cover migration between all pairs of populations. This can be useful for exploring multi-population models with smaller data sets.
The 5/10/2010 update fixes a problem with testing nested models with 2 populations.
The 4/23/2010 update fixes problems with estimating the probability that one parameter is greater than another.
The 4/5/2010 update fixes a bug that occurred when estimating the probability that one parameter is greater than another, and another bug that caused a nested model to be skipped in LLR tests.
The 2/22/2010 update fixes a bug in printing TMRCA distributions, and a few other smaller things. The proposal distributions for splitting times have also been tweaked, so the splitting time update rates will change slightly, but overall mixing should be somewhat better. The genealogy update that did multiple branches at once has been removed - it just was not helping. The manual has not changed.
IMfig - updated 2/24/2012
IMfig is a program (written in Python) that generates a figure (in an encapsulated PostScript - eps - file) of an Isolation with Migration model that has been estimated from a data set. IMfig reads an output file generated with the IMa2 program.
IM and IMa - UPDATED 12/17/2009 (In this most recent update, the times of all migration events are used, and not just the mean migration time, when this is recorded).
IM is a program, written with Rasmus Nielsen, for the fitting of an isolation model with migration to haplotype data drawn from two closely related species or populations. IM is based on a method originally developed by Rasmus Nielsen and John Wakeley (Nielsen and Wakeley 2001 Genetics 158:885). Large numbers of loci can be studied simultaneously, and different mutation models can be used.
IMa implements the same Isolation with Migration model, but does so using a new method that provides estimates of the joint posterior probability density of the model parameters. IMa also allows log likelihood ratio tests of nested demographic models. IMa is based on a method described in Hey and Nielsen (2007 PNAS 104:2785–2790). IMa is faster and better than IM (i.e. by virtue of providing access to the joint posterior density function), and it can be used for most (but not all) of the situations and options that IM can be used for.
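To illustrate what a log likelihood ratio test between nested models involves, here is a minimal generic sketch in Python. It is not IMa code: the function name `llr_test` and the log-likelihood values are hypothetical, and the sketch assumes the statistic is compared against a chi-square distribution with one degree of freedom (the nested model fixes exactly one parameter). The appropriate null distribution for a particular IMa test may differ, for example when a parameter is fixed at a boundary of its range, so the IMa documentation should be consulted for real analyses.

```python
import math

def llr_test(loglik_full, loglik_nested):
    """Return (statistic, p_value) for a nested-model comparison.

    Assumption: chi-square null distribution with 1 degree of freedom.
    """
    stat = 2.0 * (loglik_full - loglik_nested)
    if stat < 0:
        raise ValueError("nested model cannot fit better than the full model")
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods, for illustration only.
stat, p = llr_test(-1204.31, -1207.12)
print(f"2*LLR = {stat:.2f}, p = {p:.4f}")
```

A small p-value suggests that the nested (simpler) model fits significantly worse than the full model.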
Get the IM Distribution package - updated 12/17/2009
Questions? Want to stay apprised of updates? Try using the Isolation with Migration Discussion Group. This way common questions can be addressed by searching and discussion, and I can more easily manage my own communications about these topics.
SITES is a computer program for the analysis of comparative DNA sequence data. Basic analyses include: data summaries by polymorphism class; polymorphism estimates within and between groups (species); estimates of migration, neutral model, and recombination parameters; and linkage disequilibrium analyses. SITES is primarily intended for data sets with multiple closely related sequences. It is especially useful when multiple sequences have been obtained from each of one or several closely related populations or species.
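As a rough illustration of the kind of polymorphism estimates mentioned above, the following Python sketch computes three standard summaries for one group of aligned sequences: the number of segregating sites, the mean number of pairwise differences, and Watterson's estimator of theta. This is not SITES code. The function names and the toy alignment are hypothetical, and the sketch assumes equal-length sequences with no gaps or ambiguous bases. Values are per sequence, not per site.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Count alignment columns with more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

def watterson_theta(seqs):
    """Watterson's estimator: S divided by a1 = sum of 1/i for i < n."""
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1

# Toy alignment of four sequences, for illustration only.
sample = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "ACGTACGT"]
print(segregating_sites(sample))
print(mean_pairwise_differences(sample))
print(watterson_theta(sample))
```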
Linux-Ready Source code available 7/9/2010 - Thanks much to Jessica W. Leigh, Dept. Maths, Univ. Auckland.
(2/16/2010) Source code updated so that it compiles more easily.
HKA is a computer program that carries out the widely used statistical test for natural selection that was developed by Hudson, R. R., M. Kreitman and M. Aguadé (1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116: 153-159). This program can handle very large numbers of loci and sample sizes, and conducts tests via coalescent simulation as well as by the conventional chi square approximation. The simulations can also be used to conduct other tests of natural selection, including tests of Tajima's D statistic (1989) and the D statistic of Fu and Li (1993).
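For readers unfamiliar with Tajima's D, the sketch below computes the statistic from a sample size, a count of segregating sites, and the mean number of pairwise differences, using the constants given in Tajima (1989). It is not code from the HKA program and it does not perform the coalescent simulations that HKA uses to assess significance. The function name `tajimas_d` and the input values are hypothetical.

```python
import math

def tajimas_d(n, num_segregating, mean_pairwise_diff):
    """Tajima's D for n sequences, S segregating sites, and pi."""
    if n < 4 or num_segregating < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = num_segregating
    # Difference between two estimators of theta, scaled by its
    # estimated standard deviation.
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_diff - s / a1) / math.sqrt(variance)

# Hypothetical inputs: 10 sequences, 12 segregating sites, pi = 3.1
print(tajimas_d(10, 12, 3.1))
```

Negative values indicate an excess of low-frequency variants relative to the neutral expectation, and positive values indicate an excess of intermediate-frequency variants.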
WH is a computer program that carries out the fitting of a speciation model, and conducts tests of the quality of fit of that model. The speciation model is called the Isolation Model, and is one without gene flow. With comparative DNA sequence data from each of two closely related species, the method allows an estimation of the time since speciation and the size of the ancestral species. The methods are described in Wakeley and Hey (1997) and Wang, Wakeley and Hey (1997).
FPG (for Forward Population Genetic simulation) simulates a population of constant size that is undergoing various evolutionary processes, including: mutation, recombination, natural selection, and migration. The meaning of "forward" in this context is simply that time, within the simulation, moves forward just as it does in the real world. This is in contrast to coalescent population genetic simulation in which time, as represented within the simulation, proceeds back into the past. Coalescent simulations have many advantages, but they are unwieldy if they incorporate natural selection on multiple sites.
FPG is useful for assessing the impact of natural selection on patterns of genetic variation. It is designed so as to be able to approximate real world situations with fairly large population sizes and high mutation rates over long stretches of DNA. The mutation model is an infinite sites model, meaning that no site that is segregating in the population can receive another mutation. The simulation accommodates neutral, beneficial and deleterious mutations under several different fitness models, including additive, multiplicative and epistatic fitness models. The program generates a wide variety of analyses, including polymorphism levels, heterozygosity (observed and expected), fixation rates, and linkage disequilibrium - all conducted for each of several categories of mutation. When migration is invoked, several analyses regarding population structure are carried out.
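The idea of a forward simulation under an infinite sites model can be sketched in a few lines of Python. This is a deliberately simplified illustration and not FPG itself: it assumes a haploid, neutral Wright-Fisher population with no recombination, selection, or migration, and it allows at most one new mutation per offspring per generation. The function and parameter names (`simulate_forward`, `mutation_prob`, `count_segregating`) are hypothetical.

```python
import random

def simulate_forward(pop_size, generations, mutation_prob, seed=0):
    """Neutral haploid Wright-Fisher simulation with infinite sites.

    Each genome is a frozenset of mutation identifiers. Every new
    mutation gets a fresh identifier, so no site is ever hit twice.
    """
    rng = random.Random(seed)
    population = [frozenset()] * pop_size
    next_site = 0
    for _ in range(generations):  # time moves forward one generation
        offspring = []
        for _ in range(pop_size):
            genome = rng.choice(population)  # sample a parent at random
            if rng.random() < mutation_prob:
                genome = genome | {next_site}  # mutation at a new site
                next_site += 1
            offspring.append(genome)
        population = offspring
    return population

def count_segregating(population):
    """Sites carried by some, but not all, genomes in the population."""
    all_sites = set().union(*population)
    fixed = set.intersection(*(set(g) for g in population))
    return len(all_sites - fixed)

pop = simulate_forward(pop_size=50, generations=200, mutation_prob=0.1, seed=1)
print(count_segregating(pop))
```

Because every generation is simulated explicitly, run time grows with both population size and the number of generations, which is the main cost of the forward approach compared with coalescent simulation.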