Estimating Polygenic Effects Using Markers of the Entire Genome
Genetics (《基因杂志》), 2003, Issue 2
    Department of Botany and Plant Sciences, University of California, Riverside, California 92521

    ABSTRACT

    Molecular markers have been used to map quantitative trait loci. However, they are rarely used to evaluate effects of chromosome segments of the entire genome. The original interval-mapping approach and its various modified versions may have limited use in evaluating the genetic effects of the entire genome because they require evaluation of multiple models and model selection. Here we present a Bayesian regression method to simultaneously estimate genetic effects associated with markers of the entire genome. With the Bayesian method, we were able to handle situations in which the number of effects is even larger than the number of observations. The key to this success is that we allow each marker effect to have its own variance parameter, which in turn has its own prior distribution so that the variance can be estimated from the data. Under this hierarchical model, we were able to handle a large number of markers, most of which may have negligible effects. As a result, it is possible to evaluate the distribution of the marker effects. Using data from the North American Barley Genome Mapping Project in double-haploid barley, we found that the distribution of gene effects closely follows an L-shaped Gamma distribution, in contrast to the bell-shaped Gamma distribution obtained when the gene effects were estimated from interval mapping. In addition, we show that the Bayesian method serves as an alternative, or even better, QTL mapping method because it produces clearer signals for QTL. Similar results were found from simulated data sets of F2 and backcross (BC) families.

    THE genetic variation of a quantitative trait is controlled by the segregation of multiple genes. In classical quantitative genetics, the overall genetic variance is described by the infinitesimal model, which assumes that the number of loci is infinitely large, each with an infinitely small effect. The genetic variances of individual loci are so small that they cannot be investigated separately, but only collectively via phenotypic resemblance between relatives (LYNCH and WALSH 1998). It has been hypothesized that the genetic variance of most quantitative traits is actually controlled by a few loci with large effects and a large number of loci with small effects. Under this hypothesis, the distribution of the gene effect across loci may be described by a negative exponential distribution (OTTO and JONES 2000). The effects of the major genes can be studied via segregation analysis. The numerous genes with small effects, however, still cannot be investigated individually. As a result, evaluating the hypothesis of a negative exponential distribution of gene effects appears to be impossible.

    With the advent of new molecular technology, saturated markers are being generated along the genome. Investigators are now able to investigate not only the effects of the major genes but also their locations in the genome. This is called quantitative trait loci (QTL) mapping (LANDER and BOTSTEIN 1989). However, QTL mapping involves multiple tests for individual loci, and only significant loci are reported. As GORING et al. (2001, and references therein) stated, the reported QTL effects are almost always biased upward, so they are of little use for evaluating the distribution of the gene effect across loci. OTTO and JONES (2000) recently incorporated statistical test information into the study of QTL distribution, using a truncated negative exponential distribution. Their method, however, depends on the results of interval mapping of QTL.

    Interval mapping requires multiple tests under multiple models. The test statistic becomes a function of the genome location and forms a test statistic profile after the entire genome has been searched. Permutation tests (CHURCHILL and DOERGE 1994) or other means of multiple test adjustment (PIEPHO 2001) are required to control the genome-wise type I error rate at a desired level. Upon completion of the genome search, the QTL effects need to be compiled and the total variance explained by the detected QTL needs to be calculated. However, the QTL effects are estimated from different models. As a result, inconsistencies often occur, such as the total variance explained by the QTL being too high. In addition, multiple estimates of the residual variance are generated, and choosing the proper one for calculating the total phenotypic variance becomes a problem. Models that include multiple QTL have been developed (SILLANPAA and ARJAS 1998; KAO et al. 1999). These models eliminate the problems of multiple tests and variance evaluation, but they introduce a new problem of model selection. A few issues need close attention in model selection. The criteria for deleting and inserting a QTL may be arbitrary. The sampling space of possible models (with different combinations of presence and absence of each putative QTL) may be too large to be fully explored. In addition, model selection will cause biased estimates of gene effects if a single model is selected as the final model, although the biases can be reduced using the Bayesian method, where several models are considered (SATAGOPAN et al. 1996). These problems have not been fully resolved. Any problem occurring in interval mapping will diminish the significance of the work by OTTO and JONES (2000).
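    The permutation approach mentioned above for controlling the genome-wise type I error rate can be sketched as follows. This is only a minimal illustration: it assumes a single-marker scan with the squared correlation as the test statistic rather than interval mapping proper, and the function names, the +1/-1 genotype coding, and the default parameter values are all hypothetical choices made for this example.

```python
import random


def max_scan_statistic(marker_cols, y):
    """Largest squared correlation between y and any single marker column."""
    n = len(y)
    y_mean = sum(y) / n
    syy = sum((v - y_mean) ** 2 for v in y)
    best = 0.0
    for col in marker_cols:
        x_mean = sum(col) / n
        sxx = sum((v - x_mean) ** 2 for v in col)
        sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(col, y))
        if sxx > 0.0 and syy > 0.0:
            best = max(best, sxy * sxy / (sxx * syy))
    return best


def permutation_threshold(marker_cols, y, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the genome-wide maximum under the null.

    Shuffling y breaks any marker-trait association while leaving the
    marker data, and hence the dependence among markers, untouched.
    """
    rng = random.Random(seed)
    shuffled = list(y)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max_scan_statistic(marker_cols, shuffled))
    maxima.sort()
    index = min(n_perm - 1, int((1.0 - alpha) * n_perm))
    return maxima[index]
```

    A marker would be declared significant only if its statistic on the unshuffled data exceeds the returned threshold. Because the threshold is built from the maximum over all markers in each permutation, it accounts for the multiple tests performed across the genome.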

    In this study, we propose a method for simultaneously evaluating marker effects of the entire genome. By marker effect, we mean the QTL effects associated with markers. If the marker density is relatively high, most of the QTL effects will be picked up by the markers and the results may be used to evaluate the distribution of gene effect across the genome. Hereafter, we use the terms QTL effect and marker effect interchangeably. Two problems are associated with simultaneous evaluation. One is how to handle the large number of markers in a single model. The other is how to deal with the markers with close-to-zero effects. We handle these problems by using a Bayesian method under the random regression coefficient model. In the Bayesian framework, each gene effect is assigned a normal prior with mean zero and a unique variance. The effect-specific prior variance is further assigned a vague prior so that the variance can be estimated from the data. This approach is analogous to the Bayesian method of MEUWISSEN et al. (2001) for BLUP prediction of gene effects in outbred populations.
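    The two-level prior described above can be written compactly. In this sketch, $b_j$ denotes the effect associated with marker $j$ and $p$ the number of markers; the symbols $\sigma_j^2$ and $\pi(\cdot)$, and the example form of the vague prior, are illustrative assumptions, since the text states only that the prior on the variance is vague.

```latex
b_j \mid \sigma_j^2 \sim N\!\left(0,\ \sigma_j^2\right), \qquad
\sigma_j^2 \sim \pi\!\left(\sigma_j^2\right), \quad
\text{e.g. } \pi\!\left(\sigma_j^2\right) \propto \frac{1}{\sigma_j^2}, \qquad
j = 1, \dots, p
```

    Because each $\sigma_j^2$ is estimated separately, the variance of a marker with a negligible effect can shrink toward zero and pull that marker's estimate toward zero, without imposing the same degree of shrinkage on markers with large effects. A single variance shared by all effects could not do this.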

    METHODS

    Linear model:

    Let yi for i = 1, ... , n be the phenotypic value of the ith individual in a mapping population with only two segregating genotypes, e.g., a backcross (BC) or a double-haploid (DH) population. The linear model for yi is

    $$y_i = b_0 + \sum_{j=1}^{p} x_{ij} b_j + e_i,$$

    where b0 is the population mean, p is the total number of markers in the entire genome, xij is a dummy variable indicating the genotype of the jth marker for individual i, bj is the QTL effect associated with marker j, and ei is the residual error with a N(0, ...

    (Shizhong Xu)
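    To make the linear model concrete, the following sketch simulates a two-genotype population and estimates all marker effects jointly, with each effect given its own prior variance. Several choices here are assumptions and are not taken from the text: genotypes are coded +1/-1, markers are simulated as unlinked, the variance parameters (named `var_b` and `var_e` in the code) receive priors proportional to 1/variance, and a Gibbs sampler is used for fitting. All function and variable names are hypothetical.

```python
import math
import random


def simulate_two_genotype_population(n, effects, b0=10.0, resid_sd=1.0, seed=1):
    """Simulate y_i = b0 + sum_j x_ij * b_j + e_i for unlinked markers.

    Assumption: x_ij is coded +1 / -1 for the two genotypes.
    """
    rng = random.Random(seed)
    p = len(effects)
    x = [[rng.choice((-1.0, 1.0)) for _ in range(p)] for _ in range(n)]
    y = [b0 + sum(row[j] * effects[j] for j in range(p)) + rng.gauss(0.0, resid_sd)
         for row in x]
    return x, y


def gibbs_marker_effects(x, y, n_iter=400, burn_in=100, seed=2):
    """Posterior means of all marker effects from a simple Gibbs sampler."""
    rng = random.Random(seed)
    n, p = len(y), len(x[0])
    cols = [[row[j] for row in x] for j in range(p)]
    sxx = [sum(v * v for v in col) for col in cols]
    b0 = sum(y) / n
    b = [0.0] * p
    var_b = [1.0] * p               # effect-specific prior variances (start values)
    var_e = 1.0                     # residual variance (start value)
    resid = [yi - b0 for yi in y]   # kept equal to y - b0 - X b throughout
    post_sum = [0.0] * p
    for it in range(n_iter):
        # Population mean: normal around the mean of (y - X b).
        new_b0 = rng.gauss(b0 + sum(resid) / n, math.sqrt(var_e / n))
        resid = [r - (new_b0 - b0) for r in resid]
        b0 = new_b0
        for j in range(p):
            col = cols[j]
            # Cross-product of marker j with the residual that excludes marker j.
            sxr = sum(c * r for c, r in zip(col, resid)) + sxx[j] * b[j]
            denom = sxx[j] + var_e / var_b[j]
            new_bj = rng.gauss(sxr / denom, math.sqrt(var_e / denom))
            delta = new_bj - b[j]
            resid = [r - c * delta for c, r in zip(col, resid)]
            b[j] = new_bj
            # Effect-specific variance: scaled inverse chi-square with 1 df,
            # which follows from the assumed prior proportional to 1/variance.
            chi2_1 = max(rng.gauss(0.0, 1.0) ** 2, 1e-300)
            var_b[j] = max(new_bj * new_bj / chi2_1, 1e-12)
        # Residual variance: scaled inverse chi-square with n df.
        sse = sum(r * r for r in resid)
        var_e = sse / rng.gammavariate(n / 2.0, 2.0)
        if it >= burn_in:
            for j in range(p):
                post_sum[j] += b[j]
    return [s / (n_iter - burn_in) for s in post_sum]
```

    For instance, calling `simulate_two_genotype_population(100, effects)` with a list of 20 effects in which only one is nonzero, and passing the result to `gibbs_marker_effects`, should yield a posterior mean that stands out for that marker while the remaining estimates are shrunk toward zero. All markers enter one model, so no model selection step is involved.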