Software

We distribute several software programs for population genetic analysis.  These programs have been developed over the years to suit the needs of research in the Hey lab, as well as for others to use.


Programs were written in C, C++, and/or Python, and the source code is available.  The programs should compile under different compilers.  A Win32 executable version (.exe file) is also available for most versions.


Some of the programs work together. SITES generates input lines for the HKA and WH programs. The FPG program, in addition to its primary function, generates simulated data sets that can be read by SITES.


The programs can be freely distributed so long as no fee is charged for them.


We developed an estimator of first coalescent time, which is the time when a copy of a gene most recently shared ancestry with other copies of that gene. This is also an estimate of the upper bound on the time at which a mutation arose, and it can be used to study the ages of alleles that are found in a population. The method can be applied to the very rarest alleles, those found only once in a sample, even in studies of many thousands of genomes. The program is written in Python and C++.


IMa3 is the newest in the IM sequence of programs. It can be used to solve a fundamental problem in evolutionary genetics, which is to jointly consider phylogenetic history and population genetic history, including gene exchange. IMa3 can be used to estimate the rooted phylogenetic tree for multiple populations, and does so while integrating over all possible Isolation-with-Migration models. For a given phylogenetic tree, IMa3 addresses the same model as IMa2. Like IMa2-p, IMa3 can run on multiple processors.





IMa2 is a program (written with Sang Chul Choi and Rasmus Nielsen) that extends the method of Hey and Nielsen (2007) to two or more populations. IMa2 has many improvements and additions over previous IM programs.


For Linux/Mac installation, run the following commands in a new directory:


                
    tar -xzf ima2-8.26.12.tar.gz
    cd ima2-8.26.12/
    ./configure
    make
                
            



Arun Sethuraman has developed a parallel version of the IMa2 program. It is available for Linux and Mac.


Jared Knoblauch and Arun Sethuraman have developed an Electron-based desktop graphical user interface for the latest IMa program.


IMfig is a program (written in Python) that generates a figure, as an Encapsulated PostScript (EPS) file, of an Isolation-with-Migration model that has been estimated from a data set. IMfig reads an output file generated with the IMa3 or IMa2 program.


IM and IMa

IM is a program, written with Rasmus Nielsen, for the fitting of an isolation model with migration to haplotype data drawn from two closely related species or populations.  IM is based on a method originally developed by Rasmus Nielsen and John Wakeley (Nielsen and Wakeley 2001 Genetics 158:885).  Large numbers of loci can be studied simultaneously, and different mutation models can be used. 


IMa implements the same Isolation with Migration model, but does so using a newer method that provides estimates of the joint posterior probability density of the model parameters. IMa also allows log-likelihood ratio tests of nested demographic models. IMa is based on a method described in Hey and Nielsen (2007 PNAS 104:2785–2790). IMa is faster than IM, and it improves on IM by providing access to the joint posterior density function. It can be used for most (but not all) of the situations and options that IM can be used for.


SITES is a computer program for the analysis of comparative DNA sequence data. Basic analyses include: data summaries by polymorphism class; polymorphism estimates within and between groups (species); estimates of migration, neutral model, and recombination parameters; and linkage disequilibrium analyses. SITES is primarily intended for data sets with multiple closely related sequences. It is especially useful when multiple sequences have been obtained from each of one or several closely related populations or species.


(2/16/2010) Source code updated so that it compiles more easily.
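
Two of the basic polymorphism summaries mentioned above for SITES can be illustrated with a short Python sketch. This is not code from SITES; the function names and the toy alignment are assumptions made for illustration, and the sketch ignores gaps, ambiguous bases, and unequal sequence lengths, which a real program has to handle.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns in which more than one base is observed."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs of sequences (per locus)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


# Hypothetical aligned sample of four sequences from one population.
sample = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]

print(segregating_sites(sample))
print(mean_pairwise_differences(sample))
```

Dividing either quantity by the alignment length would give a per-site value.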


HKA is a computer program that carries out the widely used statistical test for natural selection that was developed by Hudson, R. R., M. Kreitman and M. Aguadé (1987 A test of neutral molecular evolution based on nucleotide data. Genetics 116: 153-159). This program can handle very large numbers of loci and sample sizes, and conducts tests via coalescent simulation as well as by the conventional chi-square approximation. The simulations can also be used to conduct other tests of natural selection, including tests of Tajima's D statistic (1989) and the D statistic of Fu and Li (1993).
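
As an illustration of one of the statistics named above, the following Python sketch computes Tajima's D from a sample size, a count of segregating sites, and the mean number of pairwise differences, using the standard constants from Tajima (1989). It is not taken from the HKA program, which assesses significance by coalescent simulation; the function name and argument names are assumptions made here, and the sketch returns only the statistic, not a p-value.

```python
import math


def tajimas_d(n, num_seg_sites, mean_pairwise_diff):
    """Tajima's D for n sequences (standard 1989 formulation)."""
    if n < 4:
        # For n = 2 or 3 the variance term is zero, so D is undefined.
        raise ValueError("sample size must be at least 4")
    if num_seg_sites == 0:
        return float("nan")  # no variation: D is undefined

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    s = num_seg_sites
    # Difference between the two estimators of theta, scaled by its
    # estimated standard deviation.
    return (mean_pairwise_diff - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical sample: 4 sequences, 2 segregating sites, mean pairwise difference 1.0.
print(tajimas_d(4, 2, 1.0))
```

A negative value indicates that the mean pairwise difference is smaller than expected from the number of segregating sites, and a positive value indicates the reverse.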


WH is a computer program that carries out the fitting of a speciation model, and conducts tests of the quality of fit of that model.  The speciation model is called the Isolation Model, and is one without gene flow.  With comparative DNA sequence data from each of two closely related species, the method allows an estimation of the time since speciation and the size of the ancestral species.  The methods are described in Wakeley and Hey (1997) and Wang, Wakeley and Hey (1997).


FPG (for Forward Population Genetic simulation) simulates a population of constant size that is undergoing various evolutionary processes, including:  mutation, recombination,  natural selection, and migration.   The meaning of "forward" in this context is simply that time, within the simulation, moves forward just as it does in the real world.  This is in contrast to coalescent population genetic simulation in which time, as represented within the simulation, proceeds back into the past.  Coalescent simulations have many advantages, but they are unwieldy if they incorporate natural selection on multiple sites.
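
The idea of a forward simulation can be sketched in a few lines of Python. The example below is a minimal Wright-Fisher model for a single biallelic locus in a haploid population of constant size, with selection and drift only; it is an assumption-laden toy and not FPG's implementation, which handles many sites, recombination, mutation, and migration. All names and parameters are hypothetical.

```python
import random


def wright_fisher_forward(pop_size, init_count, s, generations, seed=None):
    """Track the count of a mutant allele forward in time.

    The mutant has relative fitness 1 + s (s > -1); the other allele has fitness 1.
    Returns a list of mutant counts, one per generation, starting with init_count.
    """
    rng = random.Random(seed)
    count = init_count
    trajectory = [count]
    for _ in range(generations):
        p = count / pop_size
        # Selection: expected mutant frequency among parents of the next generation.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: each of pop_size offspring independently draws a parent allele.
        count = sum(rng.random() < p_sel for _ in range(pop_size))
        trajectory.append(count)
    return trajectory


# Hypothetical run: 200 individuals, 20 initial mutant copies, s = 0.05.
print(wright_fisher_forward(200, 20, 0.05, 30, seed=1))
```

Each pass through the loop is one generation moving forward in time, in contrast to a coalescent simulation, which would start from a sample and trace lineages back into the past.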


FPG is useful for assessing the impact of natural selection on patterns of genetic variation. It is designed so as to be able to approximate real world situations with fairly large population sizes and high mutation rates over long stretches of DNA. The mutation model is an infinite sites model, meaning that no site that is segregating in the population can receive another mutation. The simulation accommodates neutral, beneficial and deleterious mutations under several different fitness models, including additive, multiplicative and epistatic fitness models. The program generates a wide variety of analyses, including polymorphism levels, heterozygosity (observed and expected), fixation rates, and linkage disequilibrium, all conducted for each of several categories of mutation. When migration is invoked, several analyses regarding population structure are carried out.
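
The difference between additive and multiplicative fitness models can be shown with a short Python sketch. The parameterization below (fitness 1 plus the sum of selection coefficients, versus the product of 1 plus each coefficient) is a common textbook convention and is an assumption here; the exact definitions used by FPG, including its epistatic model, are not given in the text above and may differ.

```python
import math


def additive_fitness(effects):
    """Fitness as 1 plus the sum of per-mutation selection coefficients, floored at 0."""
    return max(0.0, 1.0 + sum(effects))


def multiplicative_fitness(effects):
    """Fitness as the product of (1 + s) over the mutations carried."""
    return math.prod(1.0 + s for s in effects)


# Hypothetical individual carrying one beneficial and one deleterious mutation.
effects = [0.10, -0.05]
print(additive_fitness(effects))
print(multiplicative_fitness(effects))
```

With small selection coefficients the two models give nearly the same fitness, and they diverge as the number or size of the effects grows.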