
Bayesian Inference of Recent Migration Rates Using Multilocus Genotypes

http://www.100md.com Genetics, 2003, No. 3

     ^a Department of Medical Genetics, University of Alberta, Edmonton, Alberta T6G 2H7, Canada

    ABSTRACT

    A new Bayesian method that uses individual multilocus genotypes to estimate rates of recent immigration (over the last several generations) among populations is presented. The method also estimates the posterior probability distributions of individual immigrant ancestries, population allele frequencies, population inbreeding coefficients, and other parameters of potential interest. The method is implemented in a computer program that relies on Markov chain Monte Carlo techniques to carry out the estimation of posterior probabilities. The program can be used with allozyme, microsatellite, RFLP, SNP, and other kinds of genotype data. We relax several assumptions of early methods for detecting recent immigrants using genotype data; most significantly, we allow genotype frequencies to deviate from Hardy-Weinberg equilibrium proportions within populations. The program is demonstrated by applying it to two recently published microsatellite data sets for populations of the plant species Centaurea corymbosa and the gray wolf species Canis lupus. A computer simulation study suggests that the program can provide highly accurate estimates of migration rates and individual migrant ancestries, given sufficient genetic differentiation among populations and sufficient numbers of marker loci.

    In recent decades, indirect estimates of gene flow (reviewed in SLATKIN and BARTON 1989) have been widely used by biologists, first with allozyme data and more recently with restriction fragment length polymorphisms (RFLPs), DNA sequence data, microsatellite markers, and single-nucleotide polymorphisms (SNPs). Direct estimates of migration rates based on mark-recapture or other methods can be impractical for large populations that exchange small numbers of migrants because the expected number of recaptures is too low; indirect estimates of gene flow using genetic markers are often the only recourse. Commonly used indirect estimators of gene flow, such as 4N_e m = 1/F_ST - 1, are derived on the basis of simplified models of population structure that assume constant population sizes, symmetrical migration (at constant rates), and population persistence for periods sufficient to achieve genetic equilibrium (WRIGHT 1931, WRIGHT 1969).
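    The equilibrium estimator quoted above is simple arithmetic, and a minimal sketch of it follows. The function name `four_ne_m_from_fst` is hypothetical and introduced only for illustration; the three F_ST values are the ones used later in the simulation study.

```python
def four_ne_m_from_fst(fst: float) -> float:
    """Return the island-model equilibrium estimate 4*N_e*m = 1/F_ST - 1."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in (0, 1]")
    return 1.0 / fst - 1.0


# F_ST values taken from the simulation study later in the article.
for fst in (0.01, 0.10, 0.25):
    print(f"F_ST = {fst:.2f} -> 4N_e m = {four_ne_m_from_fst(fst):.1f}")
```

    The estimate is only as good as the equilibrium assumptions listed in the paragraph above, which is the motivation for the nonequilibrium method developed in this article.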

    The development of coalescent theory (KINGMAN 1982; reviewed by TAVARE 1984), which traces the ancestral genealogy of a sample rather than modeling changes of gene frequencies in the population as a whole, has allowed less restrictive models to be used in developing indirect estimators of gene flow. The new methods accommodate recent population expansions, nonsymmetrical migration, and other complexities that are typical of real biological populations (BEERLI and FELSENSTEIN 1999, BEERLI and FELSENSTEIN 2001; VITALIS and COUVET 2001). However, even coalescent-based methods currently assume that population demography has followed a relatively simple model of either constant size or deterministic expansion (with constant migration rates) for roughly the last 4N_e generations, which is the average time until the sampled chromosomes coalesce to a most recent common ancestor (KINGMAN 1982). For populations with large N_e or species in highly disturbed habitats, this assumption may be unreasonable.

    Recently, nonequilibrium approaches have been proposed for identifying migrants (RANNALA and MOUNTAIN 1997; PRITCHARD et al. 2000) or hybrids between species (ANDERSON and THOMPSON 2002) and assigning individuals of unknown population affinity to potential source populations using multilocus genotypes (PAETKAU et al. 1995; RANNALA and MOUNTAIN 1997; CORNUET et al. 1999; DAWSON and BELKHIR 2001; GAGGIOTTI et al. 2002). These methods extract information about recent migration (within the last few generations) from transient disequilibrium observed at individual multilocus genotypes of migrants or individuals recently descended from migrants. In comparison with indirect estimators of long-term gene flow, these methods make relatively few assumptions, but are informative only about recent patterns of migration. The two approaches (long-term gene flow and recent migration estimation) are complementary, providing information about migration on different timescales. Previous methods for inferring recent migration have focused on identifying individual migrants and their source populations (PAETKAU et al. 1995; RANNALA and MOUNTAIN 1997) or jointly identifying migrants and populations (PRITCHARD et al. 2000). Existing methods do not explicitly estimate migration rates among populations.

    In this article, we develop a new Bayesian multilocus genotyping method for estimating rates of recent migration among populations. The method requires fewer assumptions than estimators of long-term gene flow and can be legitimately applied to nonstationary populations that are far from genetic equilibrium. Moreover, the newly proposed method relaxes a key assumption of previous nonequilibrium methods for assigning individuals to populations and identifying migrants, namely that genotypes are in Hardy-Weinberg equilibrium within populations. We allow arbitrary genotype frequency distributions within populations by incorporating a separate inbreeding coefficient for each population. The joint probability distribution of inbreeding coefficients is estimated from the data. Our method also allows for missing genotype data by using data augmentation techniques to integrate over possible genotypes for individuals.

    THEORY

    Data and model parameters:

    Consider a collection of I populations of a diploid species, with discrete nonoverlapping generations, and let m = {m_lq} be the migration rates between populations, where m_lq is the fraction of individuals in population q that are migrants from population l (m can also be treated as time dependent). Assume that some proportion of an individual's alleles originate via a single migrant ancestor that arrived at the current (or a past) generation (this is justified for low migration rates, see Appendix A). The individual itself may also be a migrant, in which case 100% of its genome is of migrant origin. Define M = {M_h}, where M_h is the source of migrant ancestry for individual h, and t = {t_h}, where t_h is the generation at which a migrant ancestor of individual h arrived (e.g., if t_h = 0 the individual has no migrant ancestry, if t_h = 1 the individual is itself a migrant, etc.). M and t are then unobserved variables describing the ancestry of each individual. To allow population genotype frequencies to deviate from Hardy-Weinberg equilibrium we define F = {F_l}, where F_l is the inbreeding coefficient for population l and -1 ≤ F_l ≤ 1. Let p = {p_ijl} be the population frequencies of marker alleles, where p_ijl is the frequency of allele i at locus j in population l.

    Let X = {X_hj} be the multilocus genotypes observed at J marker loci in a random sample of n diploid individuals, where X_hj is the genotype of individual h at locus j, and let S = {S_h} identify the population source for each sampled individual, where S_h is the population that individual h was sampled from. The number of individuals sampled from the lth population is n_l. The data (observations) are X and S. The joint (and marginal) posterior probability distributions of the remaining parameters M, t, p, m, and F are estimated numerically using Markov chain Monte Carlo (MCMC) methods (GAMERMAN 1997). The estimated posterior probabilities are used to make inferences about these parameters (including point estimates). The elements of m are of primary interest, but other parameters, such as M and t, may also be of interest (as in RANNALA and MOUNTAIN 1997) and can be estimated similarly.

    Likelihood:

    The likelihood of the data is the probability of the observed genotypes given the model parameters. This is

    eq.omitted (Equation 1 and the three expressions that define its terms)

    where X_hj(1) denotes the allele present on the maternal chromosome, and X_hj(2) denotes the allele present on the paternal chromosome. Note that we define t_h = 0 if M_h = S_h (i.e., if the individual has no immigrant ancestry). The likelihood presented in Equation 1 involves a product of individual genotype probabilities across marker loci and individuals because it is assumed that individuals are randomly sampled and the markers are unlinked.
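    Because the expressions for Equation 1 are not reproduced in this copy, the following is only a sketch of what a single-locus genotype probability could look like. It assumes the standard inbreeding-coefficient parameterization (homozygote probability (1 - F)p² + Fp, heterozygote probability 2(1 - F)p_i p_k) for individuals whose two alleles come from one population, and independent draws from two populations for second-generation migrants. These forms, the function names, and the example frequencies are assumptions, not the authors' exact expressions, and the sketch treats genotypes as unordered rather than tracking maternal and paternal alleles.

```python
from itertools import combinations_with_replacement


def same_pop_genotype_prob(a1, a2, p, F):
    """Genotype probability when both alleles come from one population
    (nonmigrant or first-generation migrant), with inbreeding coefficient F."""
    if a1 == a2:
        return (1.0 - F) * p[a1] ** 2 + F * p[a1]
    return 2.0 * (1.0 - F) * p[a1] * p[a2]


def mixed_genotype_prob(a1, a2, p_resident, p_source):
    """Genotype probability for a second-generation migrant: one allele is
    drawn from the resident population and one from the source population."""
    if a1 == a2:
        return p_resident[a1] * p_source[a1]
    return p_resident[a1] * p_source[a2] + p_resident[a2] * p_source[a1]


# Hypothetical allele frequencies at one locus in two populations.
p_resident = {"A": 0.7, "B": 0.2, "C": 0.1}
p_source = {"A": 0.1, "B": 0.3, "C": 0.6}

genotypes = list(combinations_with_replacement(sorted(p_resident), 2))
total_same = sum(same_pop_genotype_prob(a, b, p_resident, 0.1) for a, b in genotypes)
total_mixed = sum(mixed_genotype_prob(a, b, p_resident, p_source) for a, b in genotypes)
print(total_same, total_mixed)  # both sums should be 1 up to rounding
```

    Under the assumption of unlinked markers and randomly sampled individuals, the full likelihood would then be the product of such terms over loci and individuals.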

    Prior distributions of parameters:

    To calculate the probability of observing M and t, given m, we assume that the populations are large enough that there is negligible genetic drift over two or three generations (for a justification, see Appendix A). The expected proportion of migrants from population l that arrive in the present generation (the generation at which sampling is carried out) is then m_lq and the expected proportion of individuals with one migrant ancestor from the previous generation of migration is 2m_lq (see Appendix A). We use only first- and second-generation migrants to estimate m_lq in this article, but more distant migrant ancestries could also be used. The probability distribution of M and t, given m, follows a multinomial distribution,

    eq.omitted (Equation 2 and the three expressions that define its terms)
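    Equation 2 is not reproduced in this copy. As a rough illustration of the multinomial structure described in the text, the sketch below considers one recipient population with a single source and assumes class probabilities 1 - 3m (nonmigrant), m (first generation), and 2m (second generation). The single-source simplification and the function names are assumptions made for illustration.

```python
import math


def expected_proportions(m):
    """Expected proportions of nonmigrants, first- and second-generation
    migrants for one source population with migration rate m."""
    return 1.0 - 3.0 * m, m, 2.0 * m


def log_ancestry_prior(n0, n1, n2, m):
    """Log multinomial probability of observing n0 nonmigrants, n1 first-
    generation and n2 second-generation migrants, given m in (0, 1/3)."""
    if not 0.0 < m < 1.0 / 3.0:
        return float("-inf")
    n = n0 + n1 + n2
    log_coef = (math.lgamma(n + 1) - math.lgamma(n0 + 1)
                - math.lgamma(n1 + 1) - math.lgamma(n2 + 1))
    return (log_coef + n0 * math.log(1.0 - 3.0 * m)
            + n1 * math.log(m) + n2 * math.log(2.0 * m))


print(expected_proportions(0.25))          # (0.25, 0.25, 0.5)
print(log_ancestry_prior(10, 10, 20, 0.25))
```

    The requirement that 1 - 3m stay positive is what bounds m below 1/3 in this simplified setting.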

    We use uninformative (uniform) Dirichlet prior densities for m and p subject to the constraints

    eq.omitted

    where k_lj is the total number of alleles at locus j in population l and

    eq.omitted

    We assume a uniform prior on the interval (-1, 1) for the population inbreeding coefficient of population l, F_l.

    Posterior distributions of parameters:

    The joint posterior probability density of the model parameters, applying Bayes' theorem, is

    eq.omitted (Equation 3)

    The denominator of Equation 3 above involves high-dimensional sums and integrals and it is not practical to evaluate it explicitly for samples of hundreds of individuals. Here, we use MCMC methods to estimate the joint posterior probability density of Equation 3. This requires only that it be possible for the numerator to be evaluated; this can be done using Equations 1 and 2 given above. MCMC can be carried out efficiently, even for large samples. Details of the MCMC algorithm are given in Appendix B.
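    The point that MCMC needs only the numerator can be illustrated with a toy Metropolis sampler. This is not the algorithm of Appendix B. It is a sketch under strong simplifying assumptions: the ancestry counts are treated as known, there is one source population, the class probabilities are 1 - 3m, m, and 2m, and the prior on m is uniform on (0, 1/3). All names and settings are hypothetical.

```python
import math
import random


def log_numerator(m, n0, n1, n2):
    """Unnormalized log posterior of m: multinomial kernel times a uniform
    prior on (0, 1/3). The normalizing constant is never computed."""
    if not 0.0 < m < 1.0 / 3.0:
        return float("-inf")
    return (n0 * math.log(1.0 - 3.0 * m)
            + n1 * math.log(m)
            + n2 * math.log(2.0 * m))


def metropolis(n0, n1, n2, iterations=20000, burn_in=5000, step=0.05, seed=1):
    """Random-walk Metropolis sampler for m with a symmetric uniform proposal."""
    rng = random.Random(seed)
    m = 0.1
    current = log_numerator(m, n0, n1, n2)
    samples = []
    for i in range(iterations):
        proposal = m + rng.uniform(-step, step)
        candidate = log_numerator(proposal, n0, n1, n2)
        # Acceptance depends only on the ratio of numerators.
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            m, current = proposal, candidate
        if i >= burn_in:
            samples.append(m)
    return samples


samples = metropolis(n0=10, n1=10, n2=20)
posterior_mean = sum(samples) / len(samples)
print(round(posterior_mean, 3))
```

    In this toy case 3m has a Beta(31, 11) posterior, so the analytic posterior mean of m is 31/126 ≈ 0.246, which gives a check on the sampler. The actual method updates M, t, p, m, and F jointly.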

    EXAMPLES

    Application to data from the plant Centaurea corymbosa:

    The plant species Centaurea corymbosa is currently found in only six populations in southern France. In a study by FREVILLE et al. 2001, 228 individuals (minimum population sample size of 20) from these six populations were genotyped at six microsatellite loci. This data set provides a useful test for our method, as the genetic differentiation between most populations is large, likely as a result of limited seed and pollen dispersal (FREVILLE et al. 2001). While the geographical distances between the populations vary, all occur within a 3-km² area. Observed pairwise F_ST values between populations ranged between 0.03 and 0.39 (mean F_ST = 0.23). An assignment test performed as described in RANNALA and MOUNTAIN 1997 assigned 91.7% of the individuals to their source population and 7.4% to a neighboring population (FREVILLE et al. 2001).

    To estimate the posterior probability distributions of parameters the MCMC was run for a total of 3 x 10⁶ iterations, discarding the first 10⁶ iterations as burn-in (intended to allow the chain to reach stationarity). Samples were collected every 2000 iterations to infer posterior probability distributions of parameters of interest, including the population allele frequencies, migrant proportions, and individual immigrant ancestries. Figure 1 shows the log posterior probability plotted against the iteration number for the C. corymbosa data for the first 600,000 iterations. The increase in log probability appears to plateau after only ~500 iterations.
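    The run settings above determine how many states are retained for inference. A small sketch of burn-in removal and thinning follows; `thin_chain` is a hypothetical helper, and a `range` of iteration indices stands in for a stored chain.

```python
def thin_chain(chain, burn_in, interval):
    """Drop the first `burn_in` states, then keep every `interval`-th state."""
    return chain[burn_in::interval]


# Iteration indices stand in for the sampled parameter states.
iterations = range(3_000_000)
kept = thin_chain(iterations, burn_in=1_000_000, interval=2000)
print(len(kept))  # 1000
```

    With 3 x 10⁶ iterations, a burn-in of 10⁶, and a sampling period of 2000, each posterior summary rests on 1000 retained states.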

    fig.omitted

    Figure A1. Several possible patterns of immigrant ancestry that would each result in either all of an individual's genes arising from an immigrant source (top of figure, = 1) or one-half of an individual's genes arising from an immigrant source (bottom of figure, = ¹/₂). Immigrants are denoted by solid circles and nonimmigrants by open circles. The probability of each pattern, given a migration rate m and assuming random mating, is given below each part.

    To further examine the convergence of the MCMC algorithm, the posterior probability density of each allele frequency at each locus in each population (grouped in intervals of 0.05) was compared for two independent runs with random initial parameter values, using either 2500 or 3 x 10⁶ iterations. The results are shown in Figure 2, A and B. If the two chains have converged, the relationship between their posterior probabilities should be linear. The high degree of scatter in the plot of 2500 iterations illustrates that the chains have not yet converged (Figure 2A). With 3 x 10⁶ iterations, the relationship is much more linear (Figure 2B). A similar plot of the posterior densities of the inbreeding coefficients in two runs of 3 x 10⁶ iterations also indicates a strong correlation between posterior probabilities estimated from the two independent runs (Figure 3).
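    The comparison of two independent runs can be sketched as follows: bin each run's samples in intervals of 0.05, then measure how linearly the binned densities relate. Using a Pearson correlation as the summary is an assumption made here for illustration, and the two "runs" are stand-in draws from a beta distribution rather than real MCMC output.

```python
import random


def binned_density(samples, width=0.05):
    """Proportion of samples in each interval of the given width on [0, 1]."""
    n_bins = round(1.0 / width)
    counts = [0] * n_bins
    for x in samples:
        counts[min(int(x / width), n_bins - 1)] += 1
    return [c / len(samples) for c in counts]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Stand-ins for the retained samples of one allele frequency in two runs.
rng_a, rng_b = random.Random(11), random.Random(22)
run_a = [rng_a.betavariate(14, 6) for _ in range(1000)]
run_b = [rng_b.betavariate(14, 6) for _ in range(1000)]

r = pearson(binned_density(run_a), binned_density(run_b))
print(round(r, 3))
```

    A value near 1 is what a near-linear scatter of the two runs' binned densities would produce; it suggests, but does not prove, that both chains are sampling the same distribution.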

    fig.omitted

    Figure 2. Posterior probability densities of the allele frequencies generated from two separate runs of the program. The runs differed in initial random seed and initial values of m and F. (A and C) The relationship between these runs over the first 2500 iterations, before equilibrium has been reached. (B and D) The relationship between these runs after equilibrium has been reached. The latter runs consist of 3 x 10⁶ iterations, a burn-in of 10⁶, and a sampling period of 2000. Allele frequencies are grouped in 0.05 intervals.

    fig.omitted

    Figure 3. Posterior probability densities of inbreeding coefficients generated from two different runs of the program. Settings are as in Figure 2.

    The mean posterior probabilities of the immigration rates among populations for the C. corymbosa data are shown in Table 1. Most populations have low migrant proportions (when averaged over the posterior probabilities) with the exception of population E1, which appears to have a large expected proportion of migrants (m = 0.25) from population E2. There appears to be a source-sink relationship between the two populations because the expected proportion of migrants into population E2 from E1 is much smaller (m = 0.00). Figure 4, A and B, presents the posterior densities of the frequencies of two alleles in a population with either a low (Figure 4A) or a high migration rate (Figure 4B). Population sample sizes are nearly identical (38 and 40 individuals, respectively). Both the migration rate and the sample size affect the variance of the posterior probability distribution; higher migration rates and smaller sample sizes both increase the variance. In Figure 4A, the estimated 95% credible set of values for the allele frequency is (0.50, 0.80) while in Figure 4B it is (0.55, 0.95). Migration can also cause the mode of the posterior density of allele frequency to differ from the maximum-likelihood estimate of allele frequency that would be obtained by using the population sample directly and ignoring immigration as is done in many population assignment tests (e.g., PAETKAU et al. 1995).

    fig.omitted

    Figure 4. Posterior probability density of a particular allele over all sampled iterations. (A) Allele 174 from locus 13D10 in population Pe. (B) Allele 163 at locus 13B7 in population E1. (C) The frequency distribution of allele 128, locus cxx140, Fort St. John population. (D) The distribution of allele 200, locus cxx204, Great Bear Lake population. The gray line represents the maximum-likelihood estimate for this allele when calculated from individuals sampled from this population. Settings for the MCMC chain are as in Figure 2.

    fig.omitted

    Table 1. Migration rates among C. corymbosa populations

    Another property of the populations that can be studied is the posterior probability distribution of the total numbers of nonimmigrants, first-generation immigrants, and second-generation immigrants. Figure 5 shows these posterior distributions for C. corymbosa population E1. The expected proportions of nonimmigrants and first-generation immigrants overlap, although the variance of the posterior distribution of the proportion of first-generation migrants is lower. The expected proportion of second-generation immigrants is about twice as high and the variance is also larger (this is likely due in part to the fact that assignments of second-generation immigrants are less certain than those of first generation). The 95% credible set for the proportion of first-generation migrants is (0.10, 0.45) vs. (0.30, 0.75) for second-generation migrants and (0.00, 0.55) for nonmigrants. The reason that the probability of the proportion of nonimmigrants being above 0.55 is negligible, while the migration rate into this population is ~0.25 (Table 1), is outlined in Appendix A. The prior predicts that the expected proportion of first-generation migrants should be m and the proportion of second-generation migrants should be 2m. As no higher orders of migrants are currently considered in our method, the average proportion of nonmigrants should be ~1 - m - 2m under our model, or in this case, 0.25, which falls near the center of our 95% credible set.

    fig.omitted

    Figure 5. Posterior probability distribution of the proportion of the individuals in a population assigned as nonimmigrants (0), first-generation migrants (1), and second-generation migrants (2) at each sampling iteration. E1 and the Southern Richardsons are populations of C. corymbosa and wolf, respectively.

    Our method can also be used to study the migrant ancestry assignments of individuals, taking account of overall population migration rates and uncertain population allele frequencies. Figure 6 shows the posterior probabilities of nonimmigrant, first-, or second-generation immigrant ancestry for five individuals from population E1 and one individual from population E2. Individual 4-E1 is most likely to be a first-generation immigrant, individuals 11-E1 and 37-E1 are most likely to be second-generation immigrants, and individuals 22-E1 and 31-E1 are roughly equally likely to be either nonimmigrants or second-generation immigrants. Our method is able to identify second-generation immigrants with a high level of certainty due to the linkage disequilibrium observed in the multilocus genotypes of individuals whose parents have originated in different populations. Individual 1-E2 is most likely to be a nonimmigrant. Excluding population E1, in only 3 of 190 cases did an individual assign with probability >0.05 to a population other than the one it was sampled from, indicating very low levels of migration.

    fig.omitted

    Figure 6. Posterior distribution for the assignment of individuals to ancestry states 0 (■), 1 (), and 2 (□) for C. corymbosa and wolf. All individuals are from the populations examined in Figure 5, except the last C. corymbosa individual, which is from E2.

    The posterior probability density of the population inbreeding coefficient, F, was concentrated near 0 for most populations, although the standard deviation was large in population E1, which had the greatest amount of immigration; in that case, the estimated mean of the posterior density was F = 0.027 but the standard deviation was 0.39. This is likely a result of the lack of information available to the method for estimating F, as most individuals in this population have high posterior probabilities of being first- or second-generation migrants. The remaining populations had much lower standard deviations (<0.08). The population Pe had significant posterior probability associated with relatively large positive values of F (mean of posterior density was F = 0.123 with a standard deviation of 0.05), suggesting potential local inbreeding effects.

    Application to gray wolf data:

    In a study of population genetic structure of gray wolves, Canis lupus, in the Canadian Northwest, CARMICHAEL et al. 2001 genotyped nine microsatellite loci in 491 individuals (minimum sample size of 9 individuals) from nine separate regions. This data set is a valuable test of our method, as the amount of differentiation between populations has a fairly wide range. Some populations are situated fairly close to one another, with no obvious physical barriers to gene flow between them (for example, the Tuktoyaktuk/Inuvik and Paulatuk populations, F_ST = 0.009), while others are separated by mountain ranges (Kluane National Park), the Arctic Ocean (Banks Island), or large geographic distances (Fort St. John). As such, these samples allow us to determine the effect of differences in genetic differentiation on our method's ability to obtain reliable estimates of migration rates and individual immigrant ancestries.

    To estimate the posterior probability distributions of the parameters the MCMC was run for a total of 3 x 10⁶ iterations, discarding the first 10⁶ iterations as burn-in. Samples were collected every 2000 iterations to infer posterior probability distributions of parameters. Figure 1 shows the log-posterior probability plotted against iteration number for the gray wolf data. The increase in log-probability appears to plateau after ~10,000 iterations. Figure 2, C and D, shows the correlations (between two independent MCMC runs) of the posterior probability densities of each allele frequency, at each locus, in each population (grouped in intervals of 0.05). The high degree of scatter in the plot of 2500 iterations vs. the plot of 3 x 10⁶ iterations (which is highly linear) once again illustrates that the chains have not yet converged at 2500 iterations but have the appearance of convergence after 3 x 10⁶ iterations. A similar plot (Figure 3) of the posterior densities of the inbreeding coefficients in two runs, each with 3 x 10⁶ iterations, also indicates a strong correlation between posterior probabilities (suggesting the chains have converged).

    The means (averaged over posterior probabilities) of the immigration rates between populations for the gray wolf data are shown in Table 2. Four of the populations appear quite isolated (Banks Island, Fort St. John, Kluane National Park, and Northern Richardson Mountains). The remaining five populations all have at least one major source of immigrants. There were some notably large mean migration rates between wolf populations. The mean migration rate from the Northern Richardson Mountains to the Southern Richardson Mountains was 0.22; from Tuk/Inuvik to Great Bear Lake, 0.14; from Tuk/Inuvik to Paulatuk, 0.21; and from the Southern Richardson Mountains to Tuk/Inuvik, 0.23. All of these populations are relatively close to one another, occurring on the mainland of the northern Yukon or the Northwest Territories. However, it is worth noting that most of these populations do not have symmetrical migration rates, suggesting that movement of animals between these regions is predominantly unidirectional. For example, while the mean migration rate from the Northern to the Southern Richardson Mountains populations was 0.22, the mean migration rate in the opposite direction was only 0.04. The mean migration rate from Banks Island to Victoria Island was also fairly large at 0.19 while the reverse rate was near zero (see Table 2). These islands are quite close to one another and are joined by ice during the winter months.

    fig.omitted

    Table 2. Migration rates among gray wolf populations

    Figure 4, C and D, presents the posterior densities of the frequencies of two alleles in populations with either a low immigration rate and a larger sample size or a high immigration rate and a smaller sample size. In these examples, the sample sizes are quite different between the populations (e.g., 41 individuals for Fort St. John and 22 individuals for Great Bear Lake). Immigration causes the mode of the distribution to exceed the maximum-likelihood estimate by a considerable amount (Figure 4D) and the variance of the estimated posterior density of allele frequency is also much larger in the example with a smaller sample size and higher migration rate. In Figure 4C the estimated 95% credible set for the allele frequency is (0.35, 0.60) while in Figure 4D it is (0.10, 0.70).

    Figure 5 shows the posterior probability distributions of the total proportions of nonimmigrants and first- and second-generation immigrants (from any population) for the Southern Richardson Mountains gray wolf population. The mode of the posterior proportion of nonmigrants is much lower than that for the posterior distribution of the proportion of either first- or second-generation migrants. Also, the mode of the posterior distribution of second-generation migrants is roughly twice that of first-generation migrants. The variance of the posterior distributions of first- and second-generation migrant proportions is much greater than that of the nonmigrant proportion. The 95% credible sets for the former are (0.20, 0.50) and (0.40, 0.70), respectively, vs. (0.00, 0.20) for the latter.

    Figure 6 shows the posterior probabilities of nonimmigrant, first-, or second-generation immigrant ancestry for four individuals from the Southern Richardson Mountains population. Individual MP9205 is most likely to be a nonimmigrant. Individual MP9224 is most likely to be a first-generation immigrant, individual MP9219 a second-generation immigrant, and individual MP9220 is fairly evenly split between being a first- and second-generation immigrant. The posterior probability density of the population inbreeding coefficient, F, was concentrated near 0 for most populations, with the exception of two populations, Great Bear Lake and Northern Richardson Mountains, which had significant posterior probability associated with negative values of F. F was also approximately uniformly distributed between -1 and +1 in the Victoria Island population, likely because most of the individuals in this population were assigned as migrants.

    SIMULATION STUDY

    Simulation methods:

    To evaluate the statistical properties of the new method we simulated samples from populations exchanging migrants according to the WRIGHT 1931 island model (at stationarity). The allele frequencies (assuming biallelic loci) in pairs of populations receiving migrants from a common source, with allele frequency q_i at locus i, were simulated from the stationary probability density function (pdf) under the Wright island model. The simulated markers could be SNPs, for example, which are typically biallelic. The pdf of the allele frequency at locus i in population j is

    eq.omitted (Equation 4)

    The pdf of the allele frequencies at J unlinked loci in population j is f(p_j) = Π_i f(p_ij), where the product is over the J loci and p_j = {p_ij} is the vector of allele frequencies in population j. The alleles at each locus were therefore simulated as independent and identically distributed with common pdf given by Equation 4. A sample of n individuals was generated from each simulated population according to the multinomial sampling distribution of Equation 2. It was assumed that (recent) migration occurs between the two populations with rates m₁₂ and m₂₁. To reduce the number of parameters to be considered in our simulations, we assumed that m = m₁₂ = m₂₁ and q_ij = q for all i, j.
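    Equation 4 is not reproduced in this copy. The sketch below assumes it is the familiar beta form of the island-model stationary density, with parameters 4Nm·q and 4Nm·(1 - q); that form, the function name, and the parameter values are assumptions for illustration only.

```python
import random


def simulate_allele_frequencies(n_loci, q, four_nm, seed=None):
    """Draw one allele frequency per locus from Beta(4Nm*q, 4Nm*(1 - q)),
    treating loci as independent and identically distributed."""
    rng = random.Random(seed)
    alpha = four_nm * q
    beta = four_nm * (1.0 - q)
    return [rng.betavariate(alpha, beta) for _ in range(n_loci)]


# 4Nm = 3 corresponds to F_ST = 1/(4Nm + 1) = 0.25.
freqs = simulate_allele_frequencies(n_loci=5000, q=0.5, four_nm=3.0, seed=7)
mean = sum(freqs) / len(freqs)
variance = sum((p - mean) ** 2 for p in freqs) / len(freqs)
print(round(mean, 3), round(variance / (0.5 * 0.5), 3))
```

    Under this assumed density the variance of the allele frequency divided by q(1 - q) equals 1/(4Nm + 1), so the second printed value should sit near the target F_ST of 0.25.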

    If an individual is a nonmigrant, the genotype is generated by assigning alleles according to the Hardy-Weinberg proportions, conditional on the simulated allele frequencies in the population from which the individual was sampled. A first-generation migrant similarly has its genotype assigned according to Hardy-Weinberg proportions, but conditional on the allele frequencies in the alternative population. A second-generation migrant has its genotype assigned by drawing an allele from each population, respectively, at each locus. To simplify the comparisons, we define the population allele frequencies in terms of F_ST by using the standard result for the expected F_ST at stationarity under the Wright model, F_ST = 1/(4Nm + 1), and solving for 4Nm in terms of F_ST to obtain 4Nm = 1/F_ST - 1. The right-hand side of this equation was substituted for 4Nm in Equation 4. The simulation results are therefore presented in terms of F_ST, m, q, and n. To evaluate the statistical performance of the estimator of migration rates under the simulations we focused on two statistics, the mean square error (MSE) and the bias (see CASELLA and BERGER 1990). MSE is a function of both the bias and the variance of the estimator (MSE = bias² + variance). A decrease in MSE therefore indicates an improvement in the estimator. To evaluate the statistical accuracy of migrant ancestry assignments we examined the proportion of migrants from each ancestral class (e.g., nonmigrants, first-generation migrants, and second-generation migrants) that were assigned to a given class with maximum posterior probability.
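    The genotype-generation rules in this paragraph can be sketched for biallelic loci as follows. Alleles are coded 1 and 0, and the function names are hypothetical; this is an illustration of the three ancestry classes, not the simulation code used in the study.

```python
import random


def draw_allele(p, rng):
    """Return allele 1 with probability p, otherwise allele 0."""
    return 1 if rng.random() < p else 0


def simulate_genotype(ancestry, freqs_home, freqs_other, rng):
    """Simulate a multilocus genotype for ancestry class 0 (nonmigrant),
    1 (first-generation migrant) or 2 (second-generation migrant).
    freqs_home and freqs_other hold the frequency of allele 1 at each locus
    in the sampled population and in the alternative population."""
    genotype = []
    for p_home, p_other in zip(freqs_home, freqs_other):
        if ancestry == 0:    # both alleles from the sampled population
            pair = (draw_allele(p_home, rng), draw_allele(p_home, rng))
        elif ancestry == 1:  # both alleles from the alternative population
            pair = (draw_allele(p_other, rng), draw_allele(p_other, rng))
        elif ancestry == 2:  # one allele from each population
            pair = (draw_allele(p_home, rng), draw_allele(p_other, rng))
        else:
            raise ValueError("ancestry must be 0, 1 or 2")
        genotype.append(pair)
    return genotype


rng = random.Random(3)
home = [0.9, 0.8, 0.7, 0.6, 0.5]
other = [0.1, 0.2, 0.3, 0.4, 0.5]
for ancestry in (0, 1, 2):
    print(ancestry, simulate_genotype(ancestry, home, other, rng))
```

    Drawing the two alleles independently from one population's frequencies is what yields Hardy-Weinberg proportions for classes 0 and 1.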

    To examine the performance of the model under various conditions, different values were assigned to a number of parameters. The most common allele in a population (q) was assigned a value of either 0.5 or 0.9. The number of individuals sampled from each population (n) was either 20 or 100. Populations were separated by F_ST values of 0.01, 0.10, or 0.25. Migration rates between populations (m) were 0.01, 0.05, 0.10, or 0.20. Three different numbers of loci were simulated: 5, 10, and 20. The parameters listed above were used for simulations in all possible combinations, for a total of 144 parameter combinations. Each of these combinations was replicated 10 times. As each simulated data set contained two populations, data were generated for 20 simulated populations for each combination of parameter settings. The MCMC was run with the same settings (number of iterations, etc.) as in each of the examples. As the results with q = 0.5 were very similar to those obtained with q = 0.9, only the former are examined here.
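    The factorial design can be enumerated directly, which also confirms the count of 144 combinations. The variable names below are illustrative.

```python
from itertools import product

q_values = (0.5, 0.9)
n_values = (20, 100)
fst_values = (0.01, 0.10, 0.25)
m_values = (0.01, 0.05, 0.10, 0.20)
loci_values = (5, 10, 20)

# Every combination of q, n, F_ST, m and number of loci.
grid = list(product(q_values, n_values, fst_values, m_values, loci_values))
replicates = 10
populations_per_data_set = 2

print(len(grid))                              # 144
print(len(grid) * replicates)                 # 1440 simulated data sets
print(replicates * populations_per_data_set)  # 20 populations per combination
```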

    Simulation results:

    The results of the simulation study are summarized in Figures 7 to 10. Figure 7 shows the influence of the number of loci and the migration rate used for the simulations on MSE and bias of the estimated migration rate for a fixed degree of genetic differentiation (F_ST = 0.25). In the case of 5 loci (Figure 7, C and D), the data have little influence on the estimates, by comparison with the influence of the prior. The prior specifies that m is uniform on the interval (0, 0.33) with mean 0.167. When the actual value of m exceeds the mean of the prior (e.g., when m = 0.2), the estimator has a negative bias. When the actual value of m is less than the mean of the prior (e.g., m ≤ 0.1) the estimator has a positive bias, as expected if the posterior is essentially similar to the prior. With 20 loci, the data have a greater influence than the prior and we see a smaller positive bias for all values of m considered (Figure 7, A and B). In general, MSE decreases with an increase in the number of loci sampled (Figures 7 and 8) and with increasing sample size, although sample size appears less important in this case.
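    The sign pattern of the bias described here follows from simple arithmetic when the estimate collapses to the prior mean. The sketch below computes bias, variance, and MSE from replicate estimates and applies them to the limiting case in which every replicate returns the prior mean 1/6. The helper name and the ten-replicate setup are illustrative assumptions.

```python
def bias_variance_mse(estimates, true_value):
    """Bias, variance and mean square error of replicate estimates.
    Uses divisor n so that MSE = bias**2 + variance holds exactly."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    variance = sum((e - mean) ** 2 for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, variance, mse


prior_mean = (0.0 + 1.0 / 3.0) / 2.0  # mean of a uniform prior on (0, 1/3)

# Limiting case: uninformative data, so each replicate returns the prior mean.
for true_m in (0.20, 0.10, 0.05, 0.01):
    bias, variance, mse = bias_variance_mse([prior_mean] * 10, true_m)
    print(f"m = {true_m:.2f}: bias = {bias:+.3f}, MSE = {mse:.4f}")
```

    The bias is negative only for m = 0.2 and grows more positive as the true m falls below the prior mean, matching the pattern reported for 5 loci.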

    fig.omitted

    Figure 7. MSE and bias for the migration rate estimate from simulated data. The following parameters were used for data simulation: 5 (C and D) or 20 (A and B) loci, 20 () or 100 (■) individuals per population, and migration rates of 0.2, 0.1, 0.05, or 0.01, when F_ST = 0.25.

    fig.omitted

    Figure 8. MSE and bias for the migration rate estimate from simulated data. The following parameters were used for data simulation: F_ST = 0.01 and 5 loci (A and B) or F_ST = 0.10 and 20 loci (C and D). Simulations were performed with either 20 () or 100 (■) individuals per population and migration rates of 0.2, 0.1, 0.05, or 0.01. MSE and bias for the prior (□) are also given.

    Figure 9 (figure omitted). Mean and variance of the maximum posterior probability for each individual migrant ancestry from simulated data. The following parameters were used for data simulation: F_ST = 0.25 and 20 loci (A and B) or F_ST = 0.01 and 5 loci (C and D). Simulations were performed with either 20 or 100 (■) individuals per population and migration rates of 0.2, 0.1, 0.05, or 0.01.

    Figure 10 (figure omitted). The proportion of individuals with a migrant ancestry of 0 (■), 1, and 2 (□) (size of the vertical bar) who have their maximum posterior probability in each state (proportion of the bar shaded). Data were simulated from a population F_ST of 0.25. Simulations were performed with a migration rate of 0.2 (A–C) or 0.05 (D), 100 (A and B) or 20 (C and D) individuals, and 20 (A) or 5 (B–D) loci.

    It is apparent from our simulation analyses that the effects of sampling either more individuals or more loci are correlated. With a small number of loci, increasing the sample size (from 20 to 100) has little effect on the bias or MSE of the estimated migration rate, but with a larger number of loci (20 loci), increased sample size dramatically reduces bias and MSE (compare the 5-locus and 20-locus panels of Figures 7 and 8).

    The migration rate and the level of genetic differentiation between populations also influence the mean (and variance) of the maximum posterior probabilities (i.e., the highest posterior probability assignment) of individual migrant ancestries. In the case of a high degree of genetic differentiation between populations (F_ST = 0.25) and 20 loci, the mean of the maximum posterior probability assignment (across sampled individuals) increases with decreasing migration rate and the variance of the maximum posterior probability (across individuals) decreases (Figure 9, A and B). In the case of low genetic differentiation between populations (F_ST = 0.01) and 5 loci, the migration rate has little influence on the mean or variance of the maximum posterior probability assignments (Figure 9, C and D).

    Figure 10 examines the accuracy of the individual migrant ancestry assignments as a function of migration rate, sample size, and number of loci when populations with a high degree of genetic divergence (F_ST = 0.25) are considered. For each of the categories 0 (nonmigrant), 1 (first-generation migrant), or 2 (second-generation migrant), the total population of individuals actually belonging to that category is represented by the height of the histogram bar. Each histogram bar is then divided into three different shades, representing the proportion of individuals actually belonging to that category that are assigned to each of the three categories. If the assignments were perfectly accurate, each histogram bar would be filled with a single shade (corresponding to the migrant ancestry class represented by that histogram bar).
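    The quantity plotted in each histogram bar is, in effect, one row of a confusion table. A minimal sketch of that tabulation follows; the function name and the toy inputs are hypothetical and are not results from the simulation study.

```python
from collections import Counter

# 0 = nonmigrant, 1 = first-generation migrant, 2 = second-generation migrant
CLASSES = (0, 1, 2)


def assignment_proportions(true_classes, assigned_classes):
    """For each true class, return the proportion assigned to each class."""
    counts = {c: Counter() for c in CLASSES}
    for truth, assigned in zip(true_classes, assigned_classes):
        counts[truth][assigned] += 1
    table = {}
    for c in CLASSES:
        total = sum(counts[c].values())
        # An empty class gets all-zero proportions rather than dividing by 0.
        table[c] = {a: (counts[c][a] / total if total else 0.0) for a in CLASSES}
    return table


# Hypothetical toy data: true ancestry and maximum-posterior assignment.
truth = [0, 0, 0, 0, 1, 1, 2, 2]
assigned = [0, 0, 0, 1, 1, 1, 2, 0]
table = assignment_proportions(truth, assigned)
print(table)
```

    A perfectly accurate method would give a proportion of 1 on the diagonal (table[c][c]) for every class c that is present.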

    Of the four cases shown in Figure 10, the cases with either high migration rate (m = 0.2) and large samples of individuals (100) and loci (20) or low migration rate and small samples of individuals (20) and loci (5) provide the most accurate assignments (Figure 10, A and D). Decreasing the number of loci sampled from 20 to 5 greatly decreases the accuracy of assignments (Figure 10, A and B), but increasing the number of individuals sampled has only a modest effect on accuracy (Figure 10, B and C). Finally, decreasing the migration rate also has a large effect, improving the accuracy of the method even when only 5 loci and 20 individuals are sampled (Figure 10, C and D). At least part of the explanation for this trend is that with lower migration rates population allele frequencies are more accurately estimated (due to the larger proportion of nonmigrants in the sample).

    In conclusion, although it is impossible to generalize because of the enormous number of possible parameter combinations that can occur, our simulations suggest that with five or fewer loci and low migration rates very little information is available for inferring migration rates; increasing the number of individuals sampled has a modest effect in improving estimation except in certain cases, such as with low migration and a high degree of genetic differentiation among populations. A higher level of genetic differentiation among populations results in improved accuracy of estimated migration rates and migrant ancestry assignments. Migrant ancestries are most accurate when either a large number of loci and individuals are sampled or migration rates are low.

    DISCUSSION

    In this article, a new Bayesian method is presented for use with allozyme, microsatellite, RFLP, or SNP multilocus genotype data, which allows one to simultaneously infer recent migration rates, population allele frequencies, population inbreeding coefficients, individual migrant ancestries, and other parameters of potential interest. Our method should be of interest to ecologists assessing the relative importance of specific patterns of population dynamics in nature, the prevalence of male- (or female-) biased dispersal, the importance of geographic barriers to dispersal, and so on.

    We have applied our method to two previously published microsatellite data sets for plants (C. corymbosa) and mammals (gray wolves) to illustrate its use. We have shown that for each of these data sets reasonably precise information about recent migration patterns can be extracted. In the case of the C. corymbosa data, a highly asymmetrical pattern of immigration in one pair of populations (E1 and E2) supports the existence of a source-sink population structure.

    Another pattern observed in both example analyses is that a greater proportion of individuals in populations with ongoing migration have more distant migrant ancestry (e.g., second-generation vs. first-generation migrant ancestry). This is as expected under the low migration rate approximation presented in Appendix A. If migrants beyond the first generation are ignored in an assignment test, the result may be biased so that individuals with second-generation migrant ancestry are incorrectly assigned as first-generation migrants. It was also observed that estimated population allele frequencies could deviate considerably from maximum-likelihood estimates (observed proportions of alleles) in populations experiencing high rates of immigration; it is therefore important to simultaneously estimate individual migrant ancestries and population allele frequencies, as we have done in this article. Failing to do so may increase the likelihood that an immigrant individual is incorrectly assigned as a nonimmigrant due to incorrect estimation of the allele frequencies within populations. This study therefore suggests that it may be preferable to estimate migration rates, migrant ancestries, and allele frequencies simultaneously in population assignment tests.

    The results of our limited simulation study indicate that very accurate estimates of migration rates and individual migrant ancestries can be obtained when levels of genetic differentiation among populations are large, migration rates are low, and 20 or more loci are examined. If 5 or fewer loci are examined, little information may be available, even if a large number of individuals are sampled. To explore the robustness of migration rate estimates and assignments for particular data sets, it may be advisable to carry out preliminary simulations to determine the expected accuracy of the method, given the observed level of genetic differentiation among populations. In our simulation study, we considered only diallelic loci and it is likely that accuracy may increase with increasing numbers of alleles (e.g., with microsatellite loci vs. SNPs).

    There are a number of ways in which the approach presented here could be extended in the future. First, we have ignored preexisting patterns of genetic differentiation among populations; our population-specific inbreeding coefficients consider only identity by descent (IBD) of alleles (making up genotypes) within populations. One could take direct account of population structure by introducing additional F-statistics that describe the probabilities of IBD of alleles sampled from different populations (in the case of individuals with mixed migrant ancestry). This could improve performance because the allele frequencies in populations with low levels of differentiation are not independent and genotype sample information can therefore be effectively "combined" across populations (through the use of F-statistics) to provide improved estimates of allele frequencies (in the extreme case, imagine two populations with no differentiation; a sample from the first population can be used to estimate allele frequencies in the second).

    Another extension of our approach could be to allow immigration rates to vary over time. Posterior probabilities under models with constant or variable immigration rates could then be compared, using predictive posterior probabilities (see BERNARDO and SMITH 2000), to test the hypothesis of constant immigration rates during the last few generations. This might allow one to directly address the relationship between immigration and gene flow. Strictly speaking, gene flow involves both immigration and local reproduction. If the rates of migration in the current and previous generations are similar, this suggests that there is no difference in breeding success between residents and migrants (gene flow equals immigration rate), etc.

    A disadvantage of our method, as currently formulated, is that it allows only the proportions of immigrants in a population to be estimated; it does not allow one to estimate directly the total proportion of individuals that emigrate from a population or the proportion that emigrate from one particular population to another. For example, a small population may have a large proportion of the total individuals in the population migrating to a particular large population, but the fraction of migrants detected in the large population will be low (because of the relative difference in the population sizes) and will provide no indication of the large proportion of actual emigration from the source population. One way to deal with this would be to estimate emigration rates that are corrected for the relative population sizes (if known). Alternatively, if temporal samples were collected, with replacement, information from unique individual genotypes (or mark-recapture tagging) could be combined with a method such as ours to jointly estimate population sizes and relative migration rates. Another assumption of the method is that all populations exchanging migrants have been sampled. The effect of this assumption may be important and a goal of future research should be to devise models that allow some degree of migration from "unobserved" populations for which no reference allele frequencies are available. This could likely be done using the flexible MCMC framework presented here.
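    The size correction mentioned above can be sketched as simple bookkeeping. This is an illustration under stated assumptions, not part of the published method: it supposes both population sizes are known, that the estimated immigrant proportion applies to the whole recipient population, and that every emigrant from the source is counted in the recipient. The function name and the numbers are hypothetical.

```python
def emigration_rate(immigrant_fraction, recipient_size, source_size):
    """Fraction of the source population that moved to the recipient.

    immigrant_fraction is the proportion of the recipient population made up
    of immigrants from the source (the quantity the method estimates).
    """
    if recipient_size <= 0 or source_size <= 0:
        raise ValueError("population sizes must be positive")
    migrants = immigrant_fraction * recipient_size  # expected number of movers
    return migrants / source_size


# Hypothetical example: 1% of a population of 10,000 came from a population of 200.
rate = emigration_rate(0.01, 10_000, 200)
print(rate)
```

    In this toy case a small immigrant proportion in the large population corresponds to half of the small source population emigrating, which is the situation described in the paragraph above.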

    To derive the prior probability distribution for individual immigrant ancestries, we have considered the distribution of migrant ancestries in a population in the limit of low migration rates and random mating between migrants and residents. Simulation studies are needed to determine the robustness of this approximation in the face of high migration rates and local inbreeding among migrant founders. Despite some outstanding issues of interpretation and reliability, methods for estimating recent migration rates using multilocus genotypes, such as the one presented here, should provide a useful (and complementary) alternative to existing methods, based on diffusion approximations or coalescent theory, that aim to estimate historical migration rates under particular demographic scenarios. On balance, we are optimistic that new methods for inferring contemporary migration rates and gene flow will ultimately require fewer assumptions and will yield information that is highly relevant to conservation biologists, ecologists, human geneticists, and others dealing with practical problems involving recent (or ongoing) migration and admixture among study populations.

    The program BayesAss, written in C, is available from our website.

    ACKNOWLEDGMENTS

    We thank Helene Freville and Isabelle Olivieri for providing us with their plant microsatellite data and Lindsey Carmichael and Curtis Strobeck for providing us with their wolf microsatellite data. We are grateful to the two anonymous reviewers. This research was supported by the National Institutes of Health grant HG01988 and Canadian Institutes of Health Research grant MOP 44064 to B.R.

    Manuscript received May 22, 2002; accepted for publication December 10, 2002.

    APPENDIX A

    Expected migrant proportions:

    Define the probability (per generation) that an individual is a migrant as m, and consider the expected fraction of alleles at a locus that is derived from migrants. Enumerating all possible patterns of ancestry (individual is a migrant, individual is a nonmigrant but both parents are migrants, etc.) that result in a given value of this fraction yields a series in m for each value, in which O(m²) denotes terms of order m² and higher. The first term in each series is the probability of a single migrant ancestor: when the fraction is 1 the individual is a migrant (at generation 1); when it is ¹/₂ the individual has a migrant parent (at generation 2); and when it is ¹/₄ the individual has a migrant grandparent (at generation 3). Several possible ancestries leading to fractions of 1 and ¹/₂ are shown in the accompanying figure. The other terms allow possibilities such as two immigrant parents at generation 2 (in the case of a fraction of 1), etc.

    The first term in each of the three possibilities listed above (migrant, migrant parent, and migrant grandparent) is a linear function of m, and the remaining terms are of order m² and higher. If m is small the higher-order terms can be neglected and we need consider only possibilities involving a single migrant ancestor at some generation (this approximation is implicit in the method of RANNALA and MOUNTAIN 1997). In the limit of small m, we expect a fraction m of individuals in the population to be first-generation migrants, a fraction 2m to have one migrant parent, a fraction 4m to have one migrant grandparent, and so on. Individuals with migrant ancestry beyond parents will have only one-quarter of their genome derived from migrant ancestors, on average, and for smaller numbers of loci such individuals will be statistically indistinguishable from nonmigrants; we have therefore chosen to use only the previous two generations of migrant ancestry to estimate m, although more distant generations could also be included with sufficient numbers of loci.
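    The leading-order statements in this paragraph can be written compactly. In the sketch below, $\rho$ is a symbol assumed here for the expected fraction of alleles derived from migrants; only the linear terms quoted in the text are shown, and the higher-order terms are left as $O(m^2)$.

```latex
\begin{align}
\Pr(\rho = 1)           &= m  + O(m^2) && \text{(individual is a migrant)} \\
\Pr(\rho = \tfrac{1}{2}) &= 2m + O(m^2) && \text{(one migrant parent)} \\
\Pr(\rho = \tfrac{1}{4}) &= 4m + O(m^2) && \text{(one migrant grandparent)}
\end{align}
```

    Restricting attention to the previous two generations and dropping the $O(m^2)$ terms, the fraction of individuals with no detectable migrant ancestry is then approximately $1 - m - 2m = 1 - 3m$, which stays nonnegative only for $m \le 1/3$.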

    Constant allele frequencies:

    Assume that a Fisher-Wright population of constant size N_e receives migrants at rate m. The deterministic change in allele frequency in the population due to migration in each generation is

    $$\Delta p = -m \, \Delta p_0,$$

    where $\Delta p = p_1 - p_0$ is the change in the population allele frequency in a single generation and $\Delta p_0 = p_0 - p_m$ is the difference in allele frequency between the population that is the migrant source and the population from which individuals are sampled. For a single diallelic locus, the measure of population differentiation, F_ST, is defined as

    $$F_{ST} = \frac{\sigma_p^2}{\bar{p}\,(1 - \bar{p})}$$

    (see WRIGHT 1969), where $\bar{p}$ is the average allele frequency across populations and $\sigma_p^2$ is the variance of allele frequencies across populations. We can write $\sigma_p^2$ for our pair of populations as

    $$\sigma_p^2 = \frac{(p_0 - \bar{p})^2 + (p_m - \bar{p})^2}{2} = \frac{\Delta p_0^2}{4}.$$

    For a given value of F_ST, the difference $\Delta p$ takes on its most extreme values when $\bar{p} = 1/2$ and the value of F_ST is then $\Delta p_0^2$. In that case, we can rewrite the magnitude of $\Delta p$ as

    $$|\Delta p| = m \sqrt{F_{ST}}.$$

    We now have an expression for the magnitude of the change of allele frequency (per generation) under migration pressure in a population that receives migrants from another population with a specified level of differentiation between the populations. If m < 0.05 and F_ST < 0.05 then $|\Delta p| < 0.05\sqrt{0.05} \approx 0.011$ and the change of allele frequency over a few generations will be negligibly small. Similarly, twice the standard deviation of the allele frequency change due to drift will be

    $$2\sigma_p = 2\sqrt{\frac{p_0 (1 - p_0)}{2 N_e}},$$

    where $\sigma_p$ now denotes the standard deviation of the per-generation change due to drift. The change in allele frequency under drift will be greatest when $p_0 = 1/2$ and in that case $2\sigma_p = 1/\sqrt{2 N_e}$. If N_e > 5000 then $2\sigma_p < 0.01$ and the change of allele frequency over a few generations will be negligibly small. These values define boundaries beyond which the approximations underlying the proposed method will be well satisfied. In such cases, the resulting estimates should be accurate. The method may provide reasonable estimates for larger values of m and F_ST (or smaller N_e) as well, but the specific range of applicability remains to be shown. Simulation studies are needed to evaluate the performance of the method under a range of conditions.
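    The two bounds quoted above can be checked numerically. The sketch below assumes the forms |Δp| = m·sqrt(F_ST) for the migration-driven change (valid when the mean allele frequency is 1/2) and 2·sqrt(p₀(1 − p₀)/(2N_e)) for twice the drift standard deviation; the function names are illustrative.

```python
import math


def migration_shift(m, fst):
    """Largest per-generation change due to migration: m * sqrt(F_ST)."""
    return m * math.sqrt(fst)


def drift_two_sd(n_e, p0=0.5):
    """Twice the SD of the per-generation change due to drift."""
    return 2.0 * math.sqrt(p0 * (1.0 - p0) / (2.0 * n_e))


shift_at_threshold = migration_shift(0.05, 0.05)
drift_at_threshold = drift_two_sd(5000)
print(round(shift_at_threshold, 4), round(drift_at_threshold, 4))
```

    At the stated thresholds the migration term is about 0.011 and the drift term is 0.01, so both changes are on the order of one percent per generation.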

    APPENDIX B

    MCMC algorithm:

    The Metropolis-Hastings (MH) algorithm (METROPOLIS et al. 1953; HASTINGS 1970) was used to numerically calculate the posterior probability density of the parameters in our analyses. The basic idea is to construct a Markov chain with a stationary distribution that is the joint posterior distribution of the parameters to be estimated. This chain is simulated and samples from the chain are used to make inferences about joint or marginal posterior probabilities of parameters. The implementation of the MH algorithm used in our program has four steps at each iteration of the chain. At each step (outlined below) a particular set of parameters is potentially modified.
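    The basic idea can be illustrated with a generic sampler. The following is a minimal sketch of a Metropolis-Hastings chain with a symmetric proposal, run on a toy standard normal target; it is not the BayesAss implementation (which is written in C), and every name in it is hypothetical.

```python
import math
import random


def metropolis_hastings(log_target, propose, start, iterations, rng):
    """Run an MH chain; `propose` is assumed to be a symmetric proposal.

    log_target returns the log of an unnormalized density.
    propose(state, rng) returns a candidate state.
    """
    state = start
    current_log = log_target(state)
    samples = []
    for _ in range(iterations):
        candidate = propose(state, rng)
        candidate_log = log_target(candidate)
        # Accept with probability min(1, target(candidate) / target(state)).
        # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate_log - current_log:
            state, current_log = candidate, candidate_log
        samples.append(state)
    return samples


rng = random.Random(0)
samples = metropolis_hastings(
    log_target=lambda x: -0.5 * x * x,              # toy target: standard normal
    propose=lambda x, r: x + r.uniform(-1.0, 1.0),  # symmetric random-walk step
    start=0.0,
    iterations=20000,
    rng=rng,
)
mean = sum(samples) / len(samples)
print(len(samples), round(mean, 2))
```

    Because the proposal is symmetric, the proposal densities cancel and only the ratio of target densities enters the acceptance step.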

    Modifying population migration rates:

    The matrix of population migration rates at iteration a, denoted as m[a], is modified to be m[a + 1] = m* with a probability given by the MH ratio. The nominating function g(m*|m[a]) is as follows: Choose one of the I² elements of the migration matrix to be modified with uniform probability 1/I². The migration rates are constrained by our model such that each row of the matrix sums to 1 and the off-diagonal (immigrant) proportions in a row sum to at most ¹/₃. It follows that each diagonal element m_ll lies between ²/₃ and 1.

    To maintain these constraints, we used the following proposal scheme. If element l, q is chosen (l ≠ q), the proposed value is m_lq* = m_lq[a] + z, where z is chosen on a uniform interval (-δ_m, +δ_m) with reflecting boundaries, where δ_m = max{0.10, 1 - m_ll}. If m_lq* > ¹/₃ or m_lq* < 0 then m_lq* is reflected back onto the interval (0, ¹/₃) by an amount m_lq[a] + z - ¹/₃ or -z - m_lq[a]. The remaining elements j ≠ q of row l are then adjusted so that the row sums to 1.

    If element m_ll is chosen, the proposed value is m_ll* = m_ll[a] + z, where z is chosen on a uniform interval (-δ_m, +δ_m) with reflecting boundaries, where δ_m ≤ ¹/₃. If m_ll* > 1 or m_ll* < ²/₃ then m_ll* is reflected back onto the interval (²/₃, 1) by an amount m_ll[a] + z - 1 or ²/₃ - z - m_ll[a]. The remaining elements of the row are likewise adjusted to sum to 1.

    We assumed a uniform Dirichlet prior for m and a uniform prior (on the integers 0, 1, 2) for t_i so that the terms in the MH ratio involving the priors for m and t cancel. The nominating function g(m*|m[a]) described above is symmetrical so that these terms also cancel from the MH ratio.
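    A uniform proposal with reflecting boundaries, as used for the migration rates, can be sketched as follows. The helper names are hypothetical, the sketch covers only the single chosen element, and the subsequent adjustment of the other elements in the row is deliberately not shown because its exact form is not given here.

```python
import random


def reflect(value, lower, upper):
    """Fold a value back into [lower, upper] by reflecting at the boundaries."""
    width = upper - lower
    if width <= 0:
        raise ValueError("upper must exceed lower")
    # Repeated reflection is periodic with period 2 * width.
    offset = (value - lower) % (2.0 * width)
    if offset > width:
        offset = 2.0 * width - offset
    return lower + offset


def propose_offdiagonal(m_lq, delta, rng):
    """Propose a new off-diagonal migration rate on (0, 1/3)."""
    z = rng.uniform(-delta, delta)
    return reflect(m_lq + z, 0.0, 1.0 / 3.0)


rng = random.Random(1)
proposals = [propose_offdiagonal(0.30, 0.10, rng) for _ in range(1000)]
print(min(proposals) >= 0.0, max(proposals) <= 1.0 / 3.0)
```

    The same `reflect` helper would serve for the diagonal elements on (2/3, 1) and for an inbreeding coefficient on (-1, +1), since only the interval end points change.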

    Modifying individual migrant ancestries:

    The individual migrant ancestries at iteration a, denoted by the composite parameters M[a] and t[a], are modified to be M[a + 1] = M* and t[a + 1] = t* with a probability given by the MH ratio.

    The nominating function g(M*, t*|M[a], t[a]) is as follows: Choose one of the n sampled individuals to have its migrant ancestry modified with uniform probability 1/n. There are 2I - 1 possible states for the migrant ancestry of the chosen individual; it can be a nonmigrant or a first- or second-generation migrant from one of the remaining I - 1 populations. The proposed change for an individual must be to one of the 2I - 2 states other than its present state and each possibility is assigned a uniform probability 1/(2I - 2).
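    The state space for one individual's ancestry, and the uniform choice among the other states, can be sketched as below. The tuple representation (source population, generations) is an assumption made for illustration, and the sketch requires at least two populations.

```python
import random


def ancestry_states(num_populations, home):
    """List the 2I - 1 ancestry states for an individual sampled in `home`.

    (home, 0) is a nonmigrant; (q, 1) and (q, 2) are first- and
    second-generation migrants from another population q.
    """
    states = [(home, 0)]
    for q in range(num_populations):
        if q != home:
            states.append((q, 1))
            states.append((q, 2))
    return states


def propose_ancestry(current, num_populations, home, rng):
    """Pick one of the 2I - 2 states other than the current one, uniformly."""
    others = [s for s in ancestry_states(num_populations, home) if s != current]
    return rng.choice(others)


rng = random.Random(2)
states = ancestry_states(4, home=0)
proposal = propose_ancestry((0, 0), 4, home=0, rng=rng)
print(len(states), proposal)
```

    With I = 4 populations there are 7 states, so each proposal is drawn from the 6 states that differ from the current one.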

    Modifying population allele frequencies:

    The matrix of population allele frequencies at iteration a, denoted as p[a], is modified to be p[a + 1] = p* with a probability given by the MH ratio.

    The nominating function g(p*|p[a]) is as follows: Choose one of the I populations with uniform probability 1/I, choose one of the J loci with uniform probability 1/J, and choose one of the k_lj alleles at locus j in population l with uniform probability 1/k_lj. If allele i at locus j in population l is chosen, the proposed value is p_ijl* = p_ijl[a] + z, where z is chosen on a uniform interval (-δ_p, +δ_p) with reflecting boundaries and the remaining allele frequencies are adjusted so that the proposed allele frequencies sum to 1.
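    One way to carry out such an update is sketched below. The text does not say how the remaining frequencies are adjusted; proportional rescaling is assumed here purely for illustration, and the function names are hypothetical.

```python
def reflect_unit(value):
    """Reflect a value back into [0, 1]."""
    offset = value % 2.0
    return 2.0 - offset if offset > 1.0 else offset


def propose_allele_frequencies(freqs, index, z):
    """Shift one allele frequency by z and rescale the others to sum to 1.

    Proportional rescaling of the other alleles is an assumption of this
    sketch, not a rule stated in the text.
    """
    proposed = reflect_unit(freqs[index] + z)
    remaining_old = sum(f for j, f in enumerate(freqs) if j != index)
    remaining_new = 1.0 - proposed
    result = []
    for j, f in enumerate(freqs):
        if j == index:
            result.append(proposed)
        elif remaining_old > 0.0:
            result.append(f * remaining_new / remaining_old)
        else:
            # All other alleles were at zero: share the remainder equally.
            result.append(remaining_new / (len(freqs) - 1))
    return result


new_freqs = propose_allele_frequencies([0.5, 0.3, 0.2], index=0, z=0.1)
print([round(f, 2) for f in new_freqs])
```

    In the example the chosen allele moves from 0.5 to 0.6 and the other two shrink in proportion, so the vector still sums to 1.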

    Modifying population inbreeding coefficients:

    The vector of population inbreeding coefficients at iteration a, denoted as F[a], is modified to be F[a + 1] = F* with a probability given by the MH ratio.

    The nominating function g(F*|F[a]) is as follows: Choose one of the I populations with uniform probability 1/I. The proposed value is F_l* = F_l[a] + z, where z is chosen on a uniform interval (-δ_F, +δ_F) with reflecting boundaries such that F_l* remains on the interval (-1, +1).

    Modifying genotypes with missing data:

    If X_- is a submatrix of X = {X_-, X₊} containing the missing genotypes for each individual, the proposed genotypes at these loci at iteration a, denoted as X_-[a], are modified to be X_-[a + 1] = X_-* with a probability given by the MH ratio.

    The nominating function g(X_-*|X_-[a]) is as follows: Choose any one of the L_T = Σ_i L_i loci with missing data with uniform probability 1/L_T, where L_i is the number of loci with missing data for individual i. Modify the locus to become genotype u, v with uniform probabilities 2/[k_l(k_l - 1)] if u ≠ v and 1/k_l² if u = v, where k_l is the number of alleles (in all sampled populations) at locus l.

    LITERATURE CITED

    ANDERSON, E. C. and E. A. THOMPSON, 2002 A model-based approach for identifying species hybrids using multilocus genetic data. Genetics 160:1217-1229.

    BEERLI, P. and J. FELSENSTEIN, 1999 Maximum likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics 152:763-773.

    BEERLI, P. and J. FELSENSTEIN, 2001 Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc. Natl. Acad. Sci. USA 98:4563-4568.

    BERNARDO, J. M., and A. F. M. SMITH, 2000 Bayesian Theory. Wiley, New York.

    CARMICHAEL, L. E., J. A. NAGY, N. C. LARTER, and C. STROBECK, 2001 Prey specialization may influence patterns of gene flow in wolves of the Canadian Northwest. Mol. Ecol. 10:2787-2798.

    CASELLA, G., and R. L. BERGER, 1990 Statistical Inference. Duxbury Press, Belmont, MA.

    CORNUET, J. M., S. PIRY, G. LUIKART, A. ESTOUP, and M. SOLIGNAC, 1999 New methods employing multilocus genotypes to select or exclude populations as origins of individuals. Genetics 153:1989-2000.

    DAWSON, K. J. and K. BELKHIR, 2001 A Bayesian approach to the identification of panmictic populations and the assignment of individuals. Genet. Res. 78:59-77.

    FREVILLE, H., F. JUSTY, and I. OLIVIERI, 2001 Comparative allozyme and microsatellite population structure in a narrow endemic plant species, Centaurea corymbosa Pourret (Asteraceae). Mol. Ecol. 10:879-889.

    GAGGIOTTI, O. E., F. JONES, W. M. LEE, W. AMOS, and J. HARWOOD et al., 2002 Patterns of colonization in a metapopulation of grey seals. Nature 416:424-427.

    GAMERMAN, D., 1997 Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman & Hall, New York.

    HASTINGS, W. K., 1970 Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97-109.

    KINGMAN, J. F. C., 1982 On the genealogy of large populations. J. Appl. Prob. 19A(Suppl.):27-43.

    METROPOLIS, N., A. W. ROSENBLUTH, M. N. ROSENBLUTH, A. H. TELLER, and E. TELLER, 1953 Equations of state calculations by fast computing machine. J. Chem. Phys. 21:1087-1091.

    PAETKAU, D., W. CALVERT, I. STIRLING, and C. STROBECK, 1995 Microsatellite analysis of population structure in Canadian polar bears. Mol. Ecol. 4:347-354.

    PRITCHARD, J. K., M. STEPHENS, and P. DONNELLY, 2000 Inference of population structure using multilocus genotype data. Genetics 155:945-959.

    RANNALA, B. and J. L. MOUNTAIN, 1997 Detecting immigration by using multilocus genotypes. Proc. Natl. Acad. Sci. USA 94:9197-9201.

    SLATKIN, M. and N. H. BARTON, 1989 A comparison of three indirect methods for estimating average levels of gene flow. Evolution 43:1349-1368.

    TAVARE, S., 1984 Line of descent and genealogical processes, and their applications in population genetics models. Theor. Popul. Biol. 26:119-164.

    VITALIS, R. and D. COUVET, 2001 Estimation of effective population size and migration rate from one- and two-locus identity measures. Genetics 157:911-925.

    WRIGHT, S., 1931 Evolution in Mendelian populations. Genetics 16:97-159.

    WRIGHT, S., 1969 Evolution and Genetics of Populations: The Theory of Gene Frequencies, Vol. 2. University of Chicago Press, Chicago.

    (Gregory A. Wilson and Bruce Rannala)
