Genetics and Molecular Biology, 2005, No. 2
Prediction of hybrid means from a partial circulant diallel table using the ordinary least square and the mixed model methods
     Escola de Agronomia e Engenharia de Alimentos, Universidade Federal de Goias, Goiania, GO, Brazil

    ABSTRACT

    By definition, the genetic effects obtained from a circulant diallel table are random. However, because of the methods of analysis, those effects have been considered as fixed. Two different statistical approaches were applied. One assumed the model to be fixed and obtained solutions through the ordinary least squares (OLS) method. The other assumed a mixed model and obtained the best linear unbiased estimator (BLUE) of the fixed effects by generalized least squares (GLS) and the best linear unbiased predictor (BLUP) of the random effects. The goal of this study was to evaluate the consequences of considering these effects as fixed or random, using the coefficient of correlation between the responses of observed and non-observed hybrids. Crossings were made between S1 inbred lines from two maize populations developed at Universidade Federal de Goias, the UFG-Samambaia "Dent" and UFG-Samambaia "Flint". A circulant inter-group design was applied, with five (s = 5) crossings for each parent. The predictions were made using a reduced model. Diallels with different sizes of s (from 2 to 5) were simulated, and the coefficients of correlation were obtained using two different approaches for each size of s. In the first approach, the observed hybrids were included in both the estimation of the genetic parameters and the coefficient of correlation, while in the second a cross-validation process was employed. In this process, the set of hybrids was divided into two groups: one group, comprising 75% of the original set, to estimate the genetic parameters, and a second one, consisting of the remaining 25%, to validate the predictions. In all cases, a bootstrap process with 200 resamplings was used to generate the empirical distribution of the correlation coefficient. This coefficient decreased as the value of s decreased. The cross-validation method made it possible to estimate the magnitude of the bias that arises when the same hybrids are used both to estimate the genetic parameters and to evaluate the correlation coefficient. The bias was shown to be greater when the OLS method was used. When the correlation coefficients of the observed and estimated hybrid means were obtained through the mixed instead of the fixed model, this decrease was less marked. The selection of hybrids superior to the checks, in terms of grain weight, also differed between the two approaches. Nineteen percent of the hybrids were shown to be superior to the checks in the fixed model, while only 1.8% of them were superior in the mixed model.

    Key words: diallel analysis, BLUP, prediction, cross-validation.

    Introduction

    Nowadays, one of the major obstacles to corn breeding programs which aim to develop hybrids is the high cost of field evaluation. The strategy originally adopted in breeding programs was to perform all possible crossings in a group of inbred lines and then make an evaluation of the single hybrids obtained, followed by the selection of the most promising ones. However, as the breeding programs became larger, thousands of inbred lines became available. This made the development and evaluation of all possible hybrids extremely difficult, mainly because of the high cost of the assessment phase. So, there was an urgent need to develop procedures to allow the evaluation of a large number of inbred lines from a small sample of hybrids. The prediction of non-observed hybrid performance became possible through the use of genetic parameter estimation. Consequently, estimators or predictors have been sought, in order to maximize the correlation between estimated or predicted genetic values and parametric genetic values. Diallel tables have been one of the main tools for estimating genetic parameters, not only because they provide great amounts of information, but also because of the flexibility in constructing them. For predictive analysis, the scheme proposed by Kempthorne and Curnow (1961), based on a sample of all possible crossings between a group of parents, referred to as a circulant diallel cross, is noteworthy. Miranda Filho and Vencovsky (1999), using Griffing's model (1956), and Reis (2000), using the model proposed by Gardner and Eberhart (1966), adjusted the circulant design to an interpopulation level. In order to achieve this, it was necessary to obtain the hybrid combinations ps/2, where p is the number of parents and s is the number of combining hybrids in each participating parent. In the second case, evaluation of the parents is also required. 
Compared with complete diallel tables, the reduction in the number of crossings is striking, especially as the value of p increases.

    One way of evaluating the predictive capacity of a model that uses the estimates from a circulant diallel table is to apply the Pearson correlation coefficient between the responses of the predicted and observed hybrids. Andrade (1995), using s = 3, found correlation coefficients varying from 0.82 to 0.96. On the other hand, Araujo (2000), using s = 4, found a correlation of 0.86, and Fuzatto (2003) observed correlations between 0.685 and 0.925, using values of s from 6 to 2. The correlation increased as the value of s decreased. All these authors evaluated ear weight. Gonçalves (1987), using s = 3, observed correlations from 0.92 to 0.86 for grain weight. In all these experiments, the genetic parameters (general and specific combining ability) were estimated by the ordinary least squares (OLS) method.

    On the other hand, in a circulant diallel table, there is an interest in extrapolating the information obtained about the observed hybrids to a reference population of non-observed hybrids, $(p/2)^2 - ps/2$ in number. As emphasized by Searle et al. (1992), the main issue is to quantify the performance of a non-realized random variable (non-observed hybrids), given an observation vector (realized observation). Therefore, in this context, according to Henderson (1986), the use of BLUP (Best Linear Unbiased Predictor) would be the most appropriate method to predict the genetic parameters. The use of BLUP in plant breeding has also been advocated by Bernardo (1994, 1995, 1996a, 1996b). In this particular case, the error variance and the other variance components influence the genetic parameter estimation, making it possible to obtain the BLUE (Best Linear Unbiased Estimator) for fixed effects and the BLUP for random effects, which is the appropriate approach for mixed linear models (Henderson, 1984). In this method, the known covariances are considered not only in the statistical tests, but also in the estimation and prediction of effects which directly influence the selection of the inbred lines. In general, the corresponding estimators have lower variances than the ones obtained through OLS, thus resulting in more reliable estimation (Duarte and Vencovsky, 2001). Andre (1999) concluded that the BLUP provides better accuracy than the OLS estimators in predicting the general combining ability effects under different conditions of heritability. Moreover, when information about co-ancestry between the inbred lines is available, it is also possible to account for the additive effects, the dominance effects and the epistatic interactions. The main restriction on the use of this approach used to be its heavy computational requirement, which is no longer an obstacle.

    The purpose of this paper was to evaluate the efficiency of the mixed linear models methodology in analyzing a partial circulant table, with varying sizes of s. This evaluation was performed mainly by correlating the predicted and the observed values of the hybrids.

    Material and Methods

    Two groups of parents, 34 flint maize inbred lines S1 and 34 dent maize inbred lines S1, randomly sampled from two populations, the UFG-Samambaia flint and the UFG-Samambaia dent, were used as the experimental material. These populations were developed at Universidade Federal de Goias (EA-UFG). The crossings were performed according to a partial circulant diallel design, with five crosses for each parent (s = 5) (Table 1), where 165 out of 170 originally predicted hybrids were obtained, representing the reference population for the 1156 possible hybrids between these two inbred line groups. These hybrids were evaluated through a randomized complete block design, with four replications. The experimental plots were represented by single rows 5 m long spaced 0.9 m apart, with 25 plants per plot after thinning. The triple hybrid BR-3123 was used as a check, and planting was done on January 6, 1999, in the experimental area at the EA-UFG.
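    The crossing plan described above can be sketched in code. The snippet below is a minimal illustration, not the authors' procedure: it assumes that dent line i is crossed with flint lines i, i+1, ..., i+s-1 (taken modulo the group size), which is one common way of laying out a circulant inter-group design, and the function name `circulant_intergroup_design` is hypothetical. It reproduces the counts quoted in the text: 170 planned crosses out of 1156 possible ones.

```python
def circulant_intergroup_design(n_lines: int, s: int) -> list[tuple[int, int]]:
    """Return (dent, flint) index pairs for a partial circulant inter-group diallel.

    Assumption: dent line i is crossed with flint lines i, i+1, ..., i+s-1,
    with indices wrapping around modulo n_lines.
    """
    return [(i, (i + k) % n_lines) for i in range(n_lines) for k in range(s)]


crosses = circulant_intergroup_design(n_lines=34, s=5)

n_planned = len(crosses)               # p*s/2 with p = 68 parents and s = 5
n_possible = 34 * 34                   # (p/2)^2 possible inter-group hybrids
n_unobserved = n_possible - n_planned  # hybrids whose means must be predicted

print(n_planned, n_possible, n_unobserved)  # prints: 170 1156 986
```

    Under this layout every parent of either group takes part in exactly s crosses, which is the property that defines the circulant scheme.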

    Griffing's model (1956) was adopted to describe the observations of the diallel table:

    $$y_{ij} = m + g_i + g_j + s_{ij} + e_{ij} \qquad (1)$$

    where: $y_{ij}$ is the phenotypic value of the hybrid between the dent line i (i = 1, 2, ..., I) and the flint line j (j = 1, 2, ..., J); $m$ is the mean common to the observations; $g_i$ is the general combining ability effect of the ith parent from the dent group, assumed to be random with distribution $N(0, \sigma^2_{LD})$; $g_j$ is the general combining ability effect of the jth parent from the flint group, assumed to be random with distribution $N(0, \sigma^2_{LF})$; $s_{ij}$ is the specific combining ability effect resulting from the crossing between the parents i and j, assumed to be random with distribution $N(0, \sigma^2_{CEC})$; and $e_{ij}$ is the random error effect with distribution $N(0, \sigma^2)$.

    Fixed model

    In matrix form, the hybrid means can be represented by:

    $$\bar{y} = Xb + \bar{e} \qquad (2)$$

    where $\bar{y}$ is the vector of treatment means, X is the incidence matrix of the genetic effects, b is the parametric vector, and $\bar{e}$ is the error vector. As X is not of full column rank, X'X is singular and has no unique inverse. Therefore, in order to solve the system of normal equations and to obtain unique solutions, the following parametric restrictions were adopted:

    Thus, the OLS solutions are given by:
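    As an illustration of how such restricted OLS solutions can be computed, the sketch below fits a reduced model containing only the mean and the general combining ability effects of the two groups. The sum-to-zero restrictions on each group of effects, the reduced form of the model and the function names `ols_gca` and `predict_hybrid` are assumptions made for this example, since the restrictions and the solution formula are not reproduced in this text.

```python
import numpy as np


def ols_gca(y, dent, flint, n_dent, n_flint):
    """OLS fit of y_ij = m + g_i + g_j + e_ij under sum-to-zero restrictions.

    y      : hybrid means
    dent   : dent parent index of each hybrid (0-based)
    flint  : flint parent index of each hybrid (0-based)
    """
    y = np.asarray(y, dtype=float)
    rows = np.arange(y.size)
    X = np.zeros((y.size, 1 + n_dent + n_flint))
    X[:, 0] = 1.0                                  # general mean
    X[rows, 1 + np.asarray(dent)] = 1.0            # dent GCA effects
    X[rows, 1 + n_dent + np.asarray(flint)] = 1.0  # flint GCA effects

    # Assumed restrictions: the effects of each group sum to zero.
    C = np.zeros((2, X.shape[1]))
    C[0, 1:1 + n_dent] = 1.0
    C[1, 1 + n_dent:] = 1.0

    # Appending the restrictions as extra equations gives a full column rank
    # system (for a connected design) whose least squares solution satisfies
    # them exactly without changing the fitted values.
    A = np.vstack([X, C])
    rhs = np.concatenate([y, np.zeros(2)])
    b, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return b[0], b[1:1 + n_dent], b[1 + n_dent:]


def predict_hybrid(m, g_dent, g_flint, i, j):
    """Predicted mean of the (possibly non-observed) hybrid i x j."""
    return m + g_dent[i] + g_flint[j]
```

    With the estimates in hand, the mean of any non-observed hybrid is predicted from the mean and the two parental effects, which is how a sample of crosses can inform the whole table.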

    Mixed linear model

    The individual observations can be expressed in matrix form as follows:

    $$y = Xq + Z_1 a_X + Z_2 a_Y + Z d + e \qquad (4)$$

    where: y is the observation vector; q is the vector of fixed effects, which here includes the general mean and the block effect; $a_X$ is the vector of general combining ability effects of the dent inbred lines; $a_Y$ is the vector of general combining ability effects of the flint inbred lines; d is the vector of specific combining ability effects; e is the error vector; and X, $Z_1$, $Z_2$ and Z are the incidence matrices for vectors q, $a_X$, $a_Y$, and d, respectively.

    In this case, applying generalized least squares (GLS) to calculate the fixed effects and the best linear unbiased prediction for the random effects, as proposed by Henderson (1984), the solutions of the mixed model equations can be obtained by:

    or:

    with:
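    For readers who want to see the mixed model equations in operation, the sketch below solves them in Henderson's standard form. It is an illustration under stated assumptions rather than the system used in the paper: the random effects are stacked into one vector u with incidence matrix Z (for example Z = [Z1 Z2 Z]) and covariance matrix G, the residuals are assumed independent with common variance `sigma2_e`, X is assumed to have full column rank, and the function name `solve_mme` is hypothetical.

```python
import numpy as np


def solve_mme(y, X, Z, G, sigma2_e):
    """Solve Henderson's mixed model equations for y = X q + Z u + e.

    Assumes Var(u) = G, Var(e) = I * sigma2_e and X of full column rank.
    Returns the BLUE of q and the BLUP of u.
    """
    XtX, XtZ, ZtZ = X.T @ X, X.T @ Z, Z.T @ Z
    # Coefficient matrix after multiplying the equations through by sigma2_e.
    lhs = np.block([[XtX, XtZ],
                    [XtZ.T, ZtZ + sigma2_e * np.linalg.inv(G)]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    k = X.shape[1]
    return solution[:k], solution[k:]
```

    The same solutions are obtained from the GLS estimator of q and the predictor G Z' V^-1 (y - X q), with V = Z G Z' + I * sigma2_e; the mixed model equations avoid inverting V, whose order equals the number of observations.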

    Using the expectation maximization-restricted maximum likelihood (EM-REML) algorithm (Dempster et al., 1977) to obtain the solution of this system, the variance component estimators are given by:

    where: p1, p2, p, and s are the number of flint inbred lines, the number of dent inbred lines, the total number of inbred lines and the number of crosses for each inbred line, respectively. In (6), r(X) is the rank of X, and Tr is the trace operation. As the inbred lines were considered unrelated and the two groups are not related to each other, the matrices A1, A2 and D are identity matrices. When co-ancestry between the parents is assumed, matrices A1 and A2 have values equal to 1.0 on the diagonal and the co-ancestry coefficients between parents off the diagonal. Likewise, the diagonal of matrix D is composed of values equal to 1.0, and its off-diagonal values are the products of the co-ancestry coefficients between the parents.
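    The construction of D from A1 and A2 described above can be sketched as follows. This is a minimal illustration: the function name `sca_covariance` is hypothetical, and it is assumed that each cross is stored as a pair (i, j) in which i indexes a parent of the group described by A1 and j a parent of the group described by A2.

```python
import numpy as np


def sca_covariance(A1, A2, crosses):
    """Covariance structure D of the specific combining ability effects.

    The element relating cross (i, j) to cross (k, l) is A1[i, k] * A2[j, l],
    i.e. the product of the co-ancestry coefficients between the parents.
    """
    first = np.array([c[0] for c in crosses])
    second = np.array([c[1] for c in crosses])
    return A1[np.ix_(first, first)] * A2[np.ix_(second, second)]


# With unrelated parents (identity A1 and A2), D is the identity matrix.
crosses = [(0, 0), (0, 1), (1, 1), (2, 0)]
D = sca_covariance(np.eye(3), np.eye(2), crosses)
```

    Because only the observed crosses are indexed, D has one row and one column per hybrid in the table rather than one per possible hybrid.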

    An iterative process was conducted, alternating between (6) and (5) until convergence was obtained, starting from random initial values for the variance components. As only the estimates of the variance components, and not their parametric values, were known, the EBLUP (Empirical Best Linear Unbiased Predictor) of the random effects was obtained (Littell et al., 1996). However, for selection based on isolated traits, the rank of the candidates for selection is not greatly influenced by errors in the estimation of variance components (Resende, 2002) when the data are balanced and only one population is considered (Duarte and Vencovsky, 2001).
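    The iteration between the mixed model equations and the variance component estimators can be sketched as below. This is an illustration of a commonly used EM-type REML scheme, not a transcription of equations (5) and (6), which are not reproduced in this text. It assumes unrelated effects within each random factor (identity covariance structures), independent residuals and X of full column rank; the function name `em_reml` and its arguments are hypothetical.

```python
import numpy as np


def em_reml(y, X, Z_list, n_iter=5000, tol=1e-10):
    """EM-type REML for y = X q + sum_k Z_k u_k + e with Var(u_k) = I * sigma2_k.

    Returns (sigma2_e, sigma2_u, q_hat, u_hat); sigma2_u holds one variance
    per matrix in Z_list and u_hat stacks the predicted random effects.
    """
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    sizes = [Z.shape[1] for Z in Z_list]
    W = np.hstack([X] + list(Z_list))
    WtW, Wty = W.T @ W, W.T @ y
    rank_X = np.linalg.matrix_rank(X)
    # Positions of each random factor inside the solution vector.
    bounds = k + np.concatenate([[0], np.cumsum(sizes)])

    sigma2_e = np.var(y) / 2.0                 # arbitrary starting values
    sigma2_u = np.full(len(Z_list), sigma2_e)
    for _ in range(n_iter):
        lhs = WtW.copy()
        for j in range(len(Z_list)):
            idx = np.arange(bounds[j], bounds[j + 1])
            lhs[idx, idx] += sigma2_e / sigma2_u[j]   # variance ratios on the diagonal
        C_inv = np.linalg.inv(lhs)
        sol = C_inv @ Wty                              # BLUE and BLUP at current variances

        new_e = (y @ y - sol @ Wty) / (n - rank_X)
        new_u = np.empty(len(Z_list))
        for j in range(len(Z_list)):
            block = slice(bounds[j], bounds[j + 1])
            u_j = sol[block]
            new_u[j] = (u_j @ u_j + sigma2_e * np.trace(C_inv[block, block])) / sizes[j]

        change = max(abs(new_e - sigma2_e), np.max(np.abs(new_u - sigma2_u)))
        sigma2_e, sigma2_u = new_e, new_u
        if change < tol:
            break
    return sigma2_e, sigma2_u, sol[:k], sol[k:]
```

    For the model in (4), `Z_list` would hold the three incidence matrices of the random effects. The predictions returned after convergence are empirical BLUPs, since they rely on estimated rather than known variance components.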

    Diallels of different sizes were simulated to evaluate the goodness of fit of the models and the way in which the correlation used to assess the predictions of non-observed hybrids is obtained. The values of s ranged from 5 to 2, and the correlation was computed in two different ways. First, the correlation was obtained by using the sample of observed hybrids to calculate both the parameters and the coefficient of correlation. Second, a cross-validation procedure was applied to the original data set. This set was randomly divided into two groups: one, constituting 75% of the original set, was used to estimate the genetic parameters, and the other, composed of the remaining observed hybrids (25%), was used to validate the predictions. In all cases, a process of 200 resamplings with replacement (bootstrap) was employed to generate empirical distributions of the correlation coefficient estimates.
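    The cross-validation and bootstrap steps described above can be sketched as follows. Several details are assumptions made for the example, because the text does not specify them: in each of the 200 rounds the hybrids are split at random into 75% and 25%, the training part is resampled with replacement, and a reduced model with only the mean and the two sets of general combining ability effects is fitted by least squares (minimum-norm solution). The function names are hypothetical.

```python
import numpy as np


def gca_design(dent, flint, n_dent, n_flint):
    """Incidence matrix for mean + dent GCA + flint GCA (reduced model)."""
    rows = np.arange(len(dent))
    X = np.zeros((len(dent), 1 + n_dent + n_flint))
    X[:, 0] = 1.0
    X[rows, 1 + dent] = 1.0
    X[rows, 1 + n_dent + flint] = 1.0
    return X


def bootstrap_cv_correlation(y, dent, flint, n_dent, n_flint,
                             n_boot=200, train_frac=0.75, seed=0):
    """Empirical distribution of the predicted-vs-observed correlation."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    dent, flint = np.asarray(dent), np.asarray(flint)
    n_train = int(round(train_frac * y.size))
    r = np.empty(n_boot)
    for b in range(n_boot):
        perm = rng.permutation(y.size)
        train, valid = perm[:n_train], perm[n_train:]
        # Assumed scheme: resample the training hybrids with replacement,
        # keeping the validation hybrids out of the fit.
        boot = rng.choice(train, size=train.size, replace=True)
        X_train = gca_design(dent[boot], flint[boot], n_dent, n_flint)
        coef, *_ = np.linalg.lstsq(X_train, y[boot], rcond=None)
        pred = gca_design(dent[valid], flint[valid], n_dent, n_flint) @ coef
        r[b] = np.corrcoef(pred, y[valid])[0, 1]
    return r
```

    The mean and standard deviation of the returned array play the role of the correlation coefficient and its standard deviation; replacing the least squares fit with mixed model predictions would give the corresponding figures for the mixed model.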

    The genetic parameters were estimated by OLS from (3), and the best linear unbiased predictions were obtained from (5). All analyses were performed using the SAS PROC IML system (Littell et al., 1996). The program used to calculate the EBLUP was adapted from Andre (1999).

    Results and Discussion

    An increase was found in the correlation coefficient as the value of s was reduced, when the observed hybrids were used both in the estimation of the genetic parameters and of the correlation coefficient. However, the standard deviations associated with those estimates increased as s decreased. The value of the correlation coefficient decreased from 0.916 ± 0.0727 (s = 2) to 0.742 ± 0.0090 (s = 5), using the OLS, and from 0.851 ± 0.0217 (s = 2) to 0.733 ± 0.0049 (s = 5), when the mixed model was applied (Table 2).

    The opposite results were found when cross-validation was employed, that is, the value of the correlation coefficient increased as the value of s increased. Likewise, the related standard deviation decreased with the increase of s. When the OLS method was applied, the correlation coefficient varied from 0.260 ± 0.1217 (s = 5) to 0.100 ± 0.2441 (s = 2), while with the mixed model the variation ranged from 0.370 ± 0.1063 (s = 5) to 0.120 ± 0.2278 (s = 2) (Table 3). It is interesting to note that the greatest mean values of the correlation coefficient (r = 0.41) were obtained when using s = 4, in the analysis made through the mixed model. The empirical distributions of correlation coefficient estimates for each case are shown in Figure 1. It is important to highlight that the maximum theoretical limit of this correlation is not 1.0, but the square root of the heritability coefficient (Vencovsky and Barriga, 1992). In the present work, this limit was equal to 0.734, which does not seem so unrealistic when compared with the correlations found by Bernardo (1996a). This author evaluated 4099 hybrids among several heterotic groups, using the mixed model method associated with the co-ancestry data between the parents. In his experiment, the correlations ranged from 0.136 to 0.762, with theoretical maximum limits of 0.554 and 0.864, respectively.
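    As a brief worked illustration of this limit (based only on the figures quoted above; the symbols $r_{\max}$ and $h^2$ are introduced here for convenience), the heritability implied by each theoretical maximum follows from squaring it:

    $$r_{\max} = \sqrt{h^2} \;\Rightarrow\; h^2 = r_{\max}^2, \qquad 0.734^2 \approx 0.539, \quad 0.554^2 \approx 0.307, \quad 0.864^2 \approx 0.746$$

    Relative to its limit, the greatest mean cross-validated correlation reported here (r = 0.41, with s = 4 in the mixed model) corresponds to 0.41/0.734 ≈ 0.56 of the attainable maximum.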

    It is relevant to emphasize that the increase observed in the correlation coefficients when the size of s is decreased does not mean that lower values of s allow better predictions. As stated by Gauch and Zobel (1988), it means that the correlation is measuring the postdictive ability of the model, that is, with the decrease in s, the model can better explain the observed data. In contrast, when the correlation coefficients were evaluated through a cross-validation process, they increased whenever the values of s were increased. In this situation, what is evaluated is not only the ability of the model to describe the set of observed data, but also its ability to predict non-realized observations. Thus, it is possible to assess the predictive ability of the model by comparing its predictions with data not included in the analysis, simulating future responses that have not been measured yet.

    It is thus clear that a reduction in the value of s also decreases the predictive potential of the model. Furthermore, the correlation coefficient is biased when the same observed hybrids are used both to estimate $g_i$ and $g_j$ and to validate the model. This bias can be calculated by assuming that the average correlation coefficient obtained through cross-validation is the parametric value for each value of s. In this case, the bias of the correlation coefficient estimate increases as the value of s decreases (Table 4). The bias is of greater magnitude when the OLS method is employed in the analysis, ranging from 0.482 with s = 5 to 0.816 with s = 2. Using the mixed models, the value found was 0.363 with s = 5 and 0.731 with s = 2. Another indicator of this bias can be observed in Figure 1, where the distributions obtained using cross-validation do not exceed the maximum theoretical limit of the correlation coefficient (MC). This is not true for the first approach, in which the same hybrids are used for estimation and validation.
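    The bias values quoted above can be checked directly from the mean correlation coefficients reported in the text, taking the cross-validated mean as the reference value. The short snippet below does only this arithmetic; the dictionary names are illustrative.

```python
# Mean correlation coefficients reported in the text, keyed by (method, s).
same_hybrids = {("OLS", 5): 0.742, ("OLS", 2): 0.916,
                ("mixed", 5): 0.733, ("mixed", 2): 0.851}
cross_validated = {("OLS", 5): 0.260, ("OLS", 2): 0.100,
                   ("mixed", 5): 0.370, ("mixed", 2): 0.120}

# Bias = correlation with the same hybrids minus the cross-validated correlation.
bias = {key: round(same_hybrids[key] - cross_validated[key], 3)
        for key in same_hybrids}

for (method, s), value in bias.items():
    print(f"{method:5s} s = {s}: bias = {value:.3f}")
# OLS   s = 5: bias = 0.482
# OLS   s = 2: bias = 0.816
# mixed s = 5: bias = 0.363
# mixed s = 2: bias = 0.731
```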

    If hybrids with yield mean superior to the check mean were to be selected, considering all possible hybrids in the diallel table (1156 hybrids), and if the prediction was made through OLS analysis with s = 5, 19% of the hybrids would be selected. By using a mixed model, only 1.8% would be selected. However, Spearman's correlation coefficient between the ranks of hybrid means by the two analyses was equal to 0.95.

    The mixed model approach handled this data set more efficiently than OLS, resulting in more accurate estimates of the correlation coefficients between observed and non-observed hybrids. Values of s < 4 yielded poorer predictions for both the mixed model and the OLS analysis. Using the same data set both to estimate the parameters and to evaluate the model does not permit inferences about future responses of the hybrids.

    The circulant crossing method, with s = 5 and s = 4, associated with the mixed model methodology, made it possible to predict non-observed hybrid means with good reliability, which is very important in the initial stages of the evaluation of inbred lines.

    References

    Andrade JAC (1995) Dialelo parcial circulante interpopulacional em milho (Zea mays L.) com dois níveis de endogamia dos parentais. Tese de Doutoramento, Universidade de São Paulo, Piracicaba, SP.

    Andre CMG (1999) Avaliação da melhor predição linear não tendenciosa (BLUP) associada ao uso de marcadores moleculares na análise dialélica. Dissertação de Mestrado, Universidade Federal de Lavras, Lavras.

    Araujo PM (2000) Dialelo parcial circulante interpopulacional e cruzamentos "top-crosses" na avaliação de linhagens parcialmente endogâmicas de milho (Zea mays L.). Tese de Doutoramento, Universidade de São Paulo, Piracicaba, SP.

    Bernardo R (1994) Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Sci 1:20-25.

    Bernardo R (1995) Genetic model for predicting maize performance in unbalanced yield trial data. Crop Sci 1:141-147.

    Bernardo R (1996a) Best linear unbiased prediction of maize single-cross performance. Crop Sci 1:50-56.

    Bernardo R (1996b) Best linear unbiased prediction of the performance of crosses between untested maize inbreds. Crop Sci 4:872-876.

    Dempster AP, Laird NM and Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Ser. B 1:1-38.

    Duarte JB and Vencovsky R (2001) Estimação e predição por modelo linear misto com ênfase no ordenamento de médias de tratamentos genéticos. Sci Agri 1:109-117.

    Fuzatto SR (2003) Dialelo parcial circulante interpopulacional em milho (Zea mays L.): Efeito do número (s) de cruzamentos. Tese de Doutoramento, Universidade de São Paulo, Piracicaba, SP.

    Gardner CO and Eberhart SA (1966) Analysis and interpretation of the variety cross diallel and related populations. Biometrics 22:439-452.

    Gauch HG and Zobel RW (1988) Predictive and postdictive success of statistical analysis of field trial. TAG 76:1-10.

    Gonçalves PS (1987) Esquema circulante de cruzamentos para avaliação de linhagens de milho (Zea mays L.) ao nível interpopulacional. Tese de Doutoramento, Universidade de São Paulo, Piracicaba, SP.

    Griffing B (1956) Concept of general and specific combining ability in relation to diallel crossing systems. Australian Journal of Biological Science 9:463-493.

    Henderson CR (1984) Applications of Linear Models in Animal Breeding. University of Guelph, Ontario, 462 pp.

    Henderson CR (1986) Recent developments in variance and covariance estimation. J Anim Sci 63:208-216.

    Kempthorne O and Curnow RN (1961) The partial diallel crosses. Biometrics 17:229-250.

    Littell RC, Milliken GA, Stroup WW and Wolfinger RD (1996) SAS System for Mixed Models. Cary, SAS Institute Inc., 633 pp.

    Miranda Filho JB and Vencovsky R (1999) The partial circulant diallel crosses at the interpopulation level. Genetics and Molecular Biology 4:249-255.

    Reis AJS (2000) Estudos de componentes de heterose ao nível interpopulacional a partir de cruzamentos dialélicos parciais circulantes. Dissertação de Mestrado, Universidade Federal de Goiás, Goiânia, GO.

    Resende MDV (2002) Procedimentos ótimos de seleção com dados balanceados e desbalanceados. In: Resende MDV (ed) Genética Biométrica e Estatística no Melhoramento de Plantas Perenes. Embrapa Informação Tecnológica, Brasília, pp 209-345.

    Searle SR, Casella G and McCulloch CE (1992) Variance Components. John Wiley & Sons, New York, 501 pp.

    Vencovsky R and Barriga P (1992) Genética Biométrica no Fitomelhoramento. Sociedade Brasileira de Genética, Ribeirão Preto, 496 pp.

    (Americo Jose dos Santos R)