**WH** -- A COMPUTER PROGRAM FOR ISOLATION MODEL FITTING--*

 

 DOCUMENTATION

 

Jody Hey

Department of Genetics

Rutgers University

Nelson Biological Labs

604 Allison Rd.

Piscataway, NJ 08854-8082

732-445-5272

fax 732-445-5870

hey@biology.rutgers.edu

http://lifesci.rutgers.edu/~heylab

 

*Some key internal parts of this program derive from the first program for the method that was written by John Wakeley. 

 

This computer program and documentation may be freely copied and used by anyone, provided no fee is charged for it. 

 

______________________

Overview

______________________

 

 

WH is a program that fits a simple speciation model, called the Isolation Model, to multilocus DNA sequence data sets.  The isolation model assumes the following:

 

WH implements the methods described in

 

Wakeley, J., and J. Hey, 1997  Estimating ancestral population parameters. Genetics 145: 847-855.

 

Wang, R. L., J. Wakeley and J. Hey, 1997  Gene flow and natural selection in the origin of Drosophila pseudoobscura and close relatives. Genetics 147: 1091-1106.

 

______________________

 

Downloadable Files

______________________

 

 _______________________

 

Input File Format

_______________________

 

-          If only a basic fitting of the isolation model is to be done, with no tests, then each line must have 9 items - see DATA LINES below.

-          If simulations and statistical tests are to be done, then each line also requires 2 additional items (the population recombination rate estimates for each species), for a total of 11.

-          If linkage disequilibrium tests are also to be done, then each line requires an additional 6 items, for a total of 17.
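
The item counts above can be checked before running the program. The following is a minimal sketch in Python, not part of WH itself. It assumes that the items on a data line are separated by whitespace, which this documentation does not state, and the function name and the returned labels are hypothetical.

```python
def analyses_for_line(line: str) -> str:
    """Report which analysis a data line has enough items for.

    Assumption: items are whitespace-separated (not stated in the
    documentation). The labels returned here are illustrative only.
    """
    n_items = len(line.split())
    if n_items == 9:
        return "basic fit"
    if n_items == 11:
        return "basic fit + simulations"
    if n_items == 17:
        return "basic fit + simulations + LD tests"
    raise ValueError(f"expected 9, 11 or 17 items, found {n_items}")


if __name__ == "__main__":
    # A made-up line with 11 placeholder items.
    print(analyses_for_line("a b c d e f g h i j k"))
```

A check of this kind catches lines with missing or extra items, but it does not verify that the items are in the required order.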

 

DATA LINES

FOR EACH LOCUS, ONE LINE PER LOCUS, IN ORDER:

 

If simulations are to be done, then each line should also have:

 

If tests of Linkage disequilibrium are to be done then each line should also have:

 

Note: if there is no LD measurement for a locus, the value -10 is used in its place.

 

Note on simulations.

 

The simulation results are sensitive to the amount of recombination. In the published descriptions of these simulations (Wang, Wakeley and Hey, 1997; Kliman et al., 2000) we used the gamma estimator of recombination (Hey and Wakeley, 1997). This estimator tends to be biased downward, so that the estimates are lower than the expected value of the parameter. The effect of lower recombination is to raise the variance of the observations (of exclusive, shared and fixed variants) and thus to broaden the distribution of test statistics for the fit of the model to the data. In this sense, the tests should be conservative. However, this is not guaranteed, and users may want to examine the quality of the fit between the model and their data by considering a range of recombination rates.

 

Recombination rate estimates are sometimes not available for both species, and they are never available for the common ancestral species. Population recombination rates are assigned as follows:

- The program takes 4Nc1i as input (4Nc for species 1, locus i) and then sets

        4Nc2i = 4Nc1i theta2/theta1

        4NcAi = 4Nc1i thetaA/theta1

- Obtaining 4Nc1i depends on whether one has estimates for species 1, for species 2, or for both:

        If only the species 1 estimate is available, it is used as is.

        If only the species 2 estimate is available, then 4Nc1i = 4Nc2i theta1/theta2.

        If both are available, then the value used for 4Nc1i is the average (4Nc1i + 4Nc2i theta1/theta2)/2.

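
The assignment rules above can be written out as a short calculation. The following is a minimal sketch in Python, not the program's own code. The function and argument names are hypothetical, and a missing estimate is represented by None purely for illustration (this documentation does not say how a missing recombination estimate is marked in the input file).

```python
from typing import Optional, Tuple


def assign_recombination_rates(
    theta1: float,
    theta2: float,
    theta_a: float,
    rho1: Optional[float] = None,
    rho2: Optional[float] = None,
) -> Tuple[float, float, float]:
    """Return (4Nc1i, 4Nc2i, 4NcAi) for one locus.

    rho1 and rho2 are the input estimates of 4Nc for species 1 and 2;
    None marks an unavailable estimate (an assumption of this sketch).
    """
    if rho1 is None and rho2 is None:
        raise ValueError("at least one recombination estimate is needed")

    if rho1 is not None and rho2 is not None:
        # Both available: average rho1 with rho2 rescaled to species 1.
        base = (rho1 + rho2 * theta1 / theta2) / 2
    elif rho1 is not None:
        base = rho1
    else:
        # Only species 2 available: rescale it to species 1.
        base = rho2 * theta1 / theta2

    # Species 2 and the ancestor are scaled from the species 1 value.
    return base, base * theta2 / theta1, base * theta_a / theta1


if __name__ == "__main__":
    # Made-up values: theta1 = 2, theta2 = 4, thetaA = 6.
    print(assign_recombination_rates(2, 4, 6, rho1=10, rho2=40))
```

With these made-up values, the species 2 estimate rescales to 40 x 2/4 = 20, the average with 10 is 15, and the rates for species 2 and the ancestor become 30 and 45.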

_______________________

 

Running the Program������������� Return to Contents

_______________________

 

The program file should reside either in the same folder as the data file or in a folder automatically searched by the operating system.  The program can be run using command line parameters, or by simply typing the name of the program ('wh').  If command line parameters are not used, the program asks for the values of runtime parameters.

 

The user starts the program simply by going to the folder where the data file exists and typing the name of the program ('wh') followed by the enter key. The program asks several questions about the data file and the desired analysis. Nearly all commands and options can also be entered using command line parameters.

 

The program can be started with or without the use of instructions at the command line.

 

Without command line instructions - simply type 'wh' at the prompt.

The program will ask for basic information.

 

On a PowerPC, clicking on the program icon opens a small window in which command line parameters can be entered.  The user can also just hit return at this point and the program will request runtime parameters.

 

Command Line Parameters:

Type and enter 

wh -d'datafilename' -r'resultsfilename' -N'numsims' -L'ldtype' -A'ranseed'
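
For scripted runs, the command line can be assembled programmatically. The following is a minimal sketch in Python, with several assumptions that this documentation does not confirm: that the quotes in the template above mark placeholders rather than literal characters, that each value follows its flag with no space, and that the -N, -L and -A parameters may be left out. The function name and the file names in the example are hypothetical.

```python
from typing import List, Optional


def build_wh_command(
    datafile: str,
    resultsfile: str,
    numsims: Optional[int] = None,
    ldtype: Optional[str] = None,
    ranseed: Optional[int] = None,
    program: str = "wh",
) -> List[str]:
    """Assemble an argument list following the template
    wh -d... -r... -N... -L... -A...

    Assumption: values are attached directly to their flags, and the
    last three parameters are optional.
    """
    args = [program, f"-d{datafile}", f"-r{resultsfile}"]
    if numsims is not None:
        args.append(f"-N{numsims}")
    if ldtype is not None:
        args.append(f"-L{ldtype}")
    if ranseed is not None:
        args.append(f"-A{ranseed}")
    return args


if __name__ == "__main__":
    # Hypothetical file names and values, printed rather than executed.
    print(" ".join(build_wh_command("loci.txt", "out.txt", numsims=1000, ranseed=7)))
```

The returned list can be passed to subprocess.run if the program is on the search path. The sketch only prints the command so that it can be inspected first.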

 

Where:

    

_______________________

 

Output

_______________________

 

Output is contained in the results file.  There are three main sections:  INPUT;  MODEL FITTING RESULTS; and SIMULATION RESULTS.

 

INPUT simply lists in tabular form the data in the input file.

MODEL FITTING RESULTS lists the following:

SIMULATION RESULTS lists the following:

 

  ______________________

 

Program Limitations

_______________________

 

For simulations, the program can handle a total sample size of at most 32 per locus. If the program is compiled under Microsoft Visual C++ (as the distributed Win32 version is), it can make use of a compiler extension and handle total per-locus sample sizes of up to 64.

 

For basic model fitting, without simulations, larger sample sizes can be used.

 

During simulations, recombination within a locus can occur only between sequence segments. The program has been compiled with 50 segments per sequence, which should be more than sufficient for most data sets. However, it may not be sufficient for loci with long sequences and high amounts of recombination.

 

_______________________

 

Literature Cited

_______________________

 

Hey, J., and J. Wakeley, 1997 A coalescent estimator of the population recombination rate. Genetics 145: 833-846.

 

Kliman, R. M., P. Andolfatto, J. A. Coyne, F. Depaulis, M. Kreitman et al., 2000 The population genetics of the origin and divergence of the Drosophila simulans complex species. Genetics 156: 1913-1931.

 

Wakeley, J., and J. Hey, 1997 Estimating ancestral population parameters. Genetics 145: 847-855.

 

Wang, R. L., J. Wakeley and J. Hey, 1997 Gene flow and natural selection in the origin of Drosophila pseudoobscura and close relatives. Genetics 147: 1091-1106.

 

This page was last changed September 04, 2013