
Group Sequential Methods and Sample Size Savings in Biomarker-Disease Association Studies

Genetics, 2003, Issue 3

    ^a Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104-6021

    ^b Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104

    ABSTRACT

    Molecular epidemiological association studies use valuable biosamples and incur costs. Statistical methods for early genotyping termination may conserve biosamples and costs. Group sequential methods (GSM) allow early termination of studies on the basis of interim comparisons. Simulation studies evaluated the application of GSM using data from a case-control study of GST genotypes and prostate cancer. Group sequential boundaries (GSB) were defined in the EAST-2000 software and were evaluated for study termination when early evidence suggested that the null hypothesis of no association between genotype and disease was unlikely to be rejected. Early termination of GSTM1 genotyping, which demonstrated no association with prostate cancer, occurred in >90% of the simulated studies. On average, 36.4% of biosamples were saved from unnecessary genotyping. In contrast, for GSTT1, which demonstrated a positive association, inappropriate termination occurred in only 6.6%. GSM may provide significant cost and sample savings in molecular epidemiology studies.

    ALTHOUGH group sequential methods (GSM) are routinely used to monitor randomized clinical trials, they have not yet been widely applied to molecular epidemiology (ME) studies. In clinical trials, GSM allow early closure of one or more treatment arms on the basis of interim analysis (WHITEHEAD 1999). By enabling early closure, GSM protect patients from unnecessary exposure to an unfavorable treatment. The statistical "cost" of early closure, the loss of precision in the effect size estimate, is acceptable because patients are protected from unnecessary exposure to unfavorable therapies. There is an extensive literature on group sequential methods and their application to clinical trials; an excellent summary is provided in JENNISON and TURNBULL 2000.

    Early closure for "futility," in which the study is unlikely to lead to rejection of the null hypothesis, is becoming more commonly used in clinical trials. Although ME studies lack this ethical imperative for early closure, such studies would benefit from early closure for futility for several reasons. First, such studies often use biologic samples that are difficult to obtain or limited in quantity. Second, genotype assessment incurs both material and labor costs. Thus, early closure for failure to reject the null hypothesis may save samples, reagents, labor, and opportunity costs. Finally, clearly defined interim analysis procedures would provide investigators with a formal tool for evaluating their data on an ongoing basis.

    Previous investigators have described the importance of early closure for null effects (GOULD 1983; JENNISON and TURNBULL 2000). O'Neill and Anello first described the use of the Wald sequential probability ratio test (SPRT), an open procedure, in a matched case-control study (O'NEILL and ANELLO 1978). PASTERNAK and SHORE 1980 demonstrated that in a cohort study the group sequential design generally had higher efficiency than a fixed sample plan. KAAKS et al. 1994 demonstrated the application of a sequential t-test to the use of biologic samples. VAN DER TWEEL and VAN NOORD 2000 described both an SPRT and a triangular test for sample sequential analysis of genotype data. Recently, SATAGOPAN et al. 2002 described the use of a two-stage design for maximizing power when the total cost is the primary study constraint.

    Current molecular epidemiology studies, however, have practical characteristics that preclude these approaches. First, the finite number of available samples and limits on funding timelines prevent the use of an "open" GSM whose sample size is potentially unlimited. Second, almost all molecular epidemiology studies acquire genotype data on a group of samples simultaneously. Thus, the most appropriate GSM must evaluate sequential groups of genotype data rather than sequential individual genotypes. Finally, current studies often evaluate a small number of genotypes (<10), thus making the sample itself the primary limiting variable.

    We evaluated group sequential boundary methods because of their widespread use and the availability of commercial GSM software. In GSM, the number of interim "looks" is predefined at the design stage, and the looks are frequently equally spaced. These criteria may be relaxed during study conduct. In a case-control study, the test statistic is the χ² value corresponding to the odds ratio relating genotype to disease in cases and controls. For early stopping for futility, if the χ² test statistic is less than a predefined value, called a boundary value, then it is unlikely that genotyping additional samples will give a statistically significant result. Therefore genotyping stops once the χ² test statistic crosses this boundary. Stopping boundaries may be defined by commercial software packages such as EAST-2000 (Cambridge, MA) or PEST (Reading, UK) or by writing local software (SCHOENFELD 2001).
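The following Python is a minimal sketch of the stopping rule just described. It computes the odds ratio and the Pearson χ² statistic (1 degree of freedom, no continuity correction) from a 2×2 genotype-by-disease table. It then applies a futility check against a boundary value. The class and function names, the table layout, and the example boundary values are illustrative assumptions. In the study, the boundaries came from EAST-2000.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenotypeTable:
    """2x2 counts of genotype carriers / non-carriers among cases and controls."""
    case_carrier: int
    case_noncarrier: int
    control_carrier: int
    control_noncarrier: int

    def odds_ratio(self) -> float:
        # OR = (a*d) / (b*c); reported as infinite if b*c is zero.
        num = self.case_carrier * self.control_noncarrier
        den = self.case_noncarrier * self.control_carrier
        return float("inf") if den == 0 else num / den

    def chi2(self) -> float:
        # Pearson chi-square without continuity correction:
        # n (ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]
        a, b = self.case_carrier, self.case_noncarrier
        c, d = self.control_carrier, self.control_noncarrier
        n = a + b + c + d
        den = (a + b) * (c + d) * (a + c) * (b + d)
        return 0.0 if den == 0 else n * (a * d - b * c) ** 2 / den


def stop_for_futility(table: GenotypeTable, boundary: Optional[float]) -> bool:
    """True when the statistic is below the futility boundary.
    boundary=None means no futility stop is defined at this look."""
    return boundary is not None and table.chi2() < boundary
```

For example, `GenotypeTable(10, 20, 20, 10)` has an odds ratio of 0.25 and a χ² of about 6.67. Against a hypothetical boundary of 1.0, it would not trigger a futility stop.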

    Figure 1 demonstrates the evolution of a test statistic in a hypothetical study with eight looks. The study would terminate early to accept the null hypothesis if the path of the test statistic crossed the boundary at any point, as occurs at look number 6. For some choices of parameter values, early closure is not possible. For example, the boundary shown in Figure 1 does not allow closure at the first look, where it is undefined. Therefore, irrespective of the results obtained at the first look, a second round of genotyping would be required.

    Figure 1 (image not reproduced). Hypothetical OBF boundary and test statistic. Evolution of a two-sided test statistic that crosses the OBF boundary at the sixth look and terminates the experiment.
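The rule illustrated in Figure 1 can be sketched as a scan over the looks. The scan stops at the first look where the statistic falls below a defined boundary. A boundary of `None` stands for a look, such as the first look in Figure 1, where no futility stop is possible. The function name and the path and boundary numbers in the usage example are invented for illustration. They are not read off the figure.

```python
from typing import Optional, Sequence


def first_futility_look(stats: Sequence[float],
                        boundaries: Sequence[Optional[float]]) -> Optional[int]:
    """Return the 1-based look at which the statistic first drops below
    the futility boundary, or None if the path never crosses it."""
    if len(stats) != len(boundaries):
        raise ValueError("need exactly one boundary per look")
    for look, (stat, bound) in enumerate(zip(stats, boundaries), start=1):
        if bound is not None and stat < bound:
            return look
    return None
```

Take an eight-look path `[0.1, 2.0, 1.8, 1.5, 1.2, 0.9, 0.8, 0.7]` with hypothetical boundaries `[None, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]`. The low first value is ignored because no boundary exists at look 1, and the path crosses at look 6.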

    Simulation studies were used to evaluate the application of GSM. Two previously published data sets of GST genotype and prostate cancer risk were used for the simulations (REBBECK et al. 1999). These data sets were chosen for several reasons. First, the GSTM1 data set reported a null association and represented the case where early stopping for futility with a GSM could provide significant sample and cost savings. Second, the GSTT1 data set reported a positive association and was used to evaluate the frequency of inappropriate genotyping termination. Since publication, additional cases and controls have been genotyped; the sample set used in the simulations contained a total of 675 GSTM1 and 725 GSTT1 genotypes. The observed odds ratio (OR) for GSTM1 was 0.99 (95% confidence interval (C.I.) 0.72–1.38); for GSTT1, the OR was 1.61 (95% C.I. 1.12–2.32). In addition to representing both a null and a positive association, both data sets have sample sizes and odds ratios typical of present-day ME studies. Finally, the raw data were readily available for the simulation studies.

    O'Brien-Fleming (OBF) stopping boundaries for both rejection and failure of rejection of the null hypothesis at each interval of genotype data acquisition were defined using EAST-2000 (O'BRIEN and FLEMING 1979). We chose the OBF boundary because it is the boundary most frequently used to monitor clinical trials and is more conservative than the alternative Pocock boundary for both rejection and acceptance of the null hypothesis. A more conservative boundary also limits the loss of power associated with using GSM. We chose EAST-2000 for its ease of use and commercial availability, since many groups conducting molecular epidemiology studies do not have the resources to develop in-house software. Although EAST-2000 requires that boundaries for rejection of the null hypothesis be part of the overall calculation of GSM boundaries, only the boundaries for failure to reject the null hypothesis were used in the simulations.
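The sketch below gives some intuition for the OBF shape. It uses Monte Carlo simulation to find the constant C of two-sided OBF rejection boundaries at K equally spaced looks. The z-scale boundary at look k is C·sqrt(K/k), and its square is the corresponding χ² (1 df) boundary. The sketch covers only rejection boundaries under equal information increments. The futility (acceptance) boundaries that EAST-2000 produced depend on further design inputs, such as power, and are not reproduced here. The function names, simulation size, and seed are arbitrary assumptions.

```python
import numpy as np


def obf_constant(n_looks: int, alpha: float = 0.05,
                 n_sim: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of C such that, under H0,
    P(|Z_k| >= C * sqrt(K / k) for some look k) = alpha."""
    rng = np.random.default_rng(seed)
    # Under H0 the cumulative score S_k is a sum of k iid N(0, 1) increments
    # (equal information per look) and Z_k = S_k / sqrt(k).
    s = np.cumsum(rng.standard_normal((n_sim, n_looks)), axis=1)
    # |Z_k| >= C*sqrt(K/k)  <=>  |S_k| / sqrt(K) >= C, so use the max over looks.
    m = np.abs(s).max(axis=1) / np.sqrt(n_looks)
    return float(np.quantile(m, 1 - alpha))


def obf_boundaries(n_looks: int, alpha: float = 0.05) -> list[tuple[float, float]]:
    """(z boundary, chi-square boundary) for each look, largest first."""
    c = obf_constant(n_looks, alpha)
    out = []
    for k in range(1, n_looks + 1):
        z = c * np.sqrt(n_looks / k)
        out.append((float(z), float(z * z)))
    return out
```

With one look the constant should come out near the familiar 1.96. With two looks the z boundaries should be roughly 2.80 and 1.98, which shows how OBF makes early rejection very hard.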

    For all simulations, the overall two-sided type I error was set at α = 0.05. Since the sample pool was fixed (N = 675 for GSTM1 and 725 for GSTT1), the power was determined by the sample size, the null genotype frequency in controls, and the OR. We chose not to specify a type II error rate so that we could examine the performance of the GSM method over a range of genotype frequencies and ORs. Genotype frequencies in controls were set at 10%, at 50%, and at the genotype frequency observed in the data set used. The observed genotype frequencies were 38% for GSTM1 and 28% for GSTT1. ORs of 1.6, 1.8, and 2.0 were examined. The OR of 1.6 was chosen to correspond to that observed for GSTT1. An OR of 2.0 corresponds to the value often used as the target "clinically significant association" in many epidemiological studies. The OR of 1.8 was chosen to be intermediate between these two. For these simulations, each interval of genotype data acquisition was termed a "look." Each look contained a multiple of 90 genotypes to simulate genotype acquisition with a 96-well PCR-based genotyping method (e.g., 90 genotypes and 6 control samples per PCR run).
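The sketch below shows how the control genotype frequency (p1) and the OR together fix the power of a fixed-sample design. It converts p1 and the OR into the implied carrier frequency among cases. It then applies the standard normal approximation for comparing two proportions. This textbook approximation is chosen here for illustration only. The text does not say how power was computed. Any case/control split passed in, such as 300 cases and 375 controls out of 675, is a hypothetical assumption, not the study's actual split.

```python
from statistics import NormalDist


def case_carrier_freq(p_control: float, odds_ratio: float) -> float:
    """Carrier frequency among cases implied by the control frequency and the OR."""
    odds = odds_ratio * p_control / (1 - p_control)
    return odds / (1 + odds)


def approx_power(p_control: float, odds_ratio: float,
                 n_cases: int, n_controls: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of the two-sided test of equal carrier
    frequencies (equivalent to the 1-df chi-square test); far tail ignored."""
    p1, p2 = p_control, case_carrier_freq(p_control, odds_ratio)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (n_controls * p1 + n_cases * p2) / (n_controls + n_cases)
    se_null = (p_bar * (1 - p_bar) * (1 / n_controls + 1 / n_cases)) ** 0.5
    se_alt = (p1 * (1 - p1) / n_controls + p2 * (1 - p2) / n_cases) ** 0.5
    return NormalDist().cdf((abs(p2 - p1) - z_crit * se_null) / se_alt)
```

With p1 = 0.5 and OR = 2.0, the implied case carrier frequency is 2/3. Under the sketch's approximation, power rises with the OR for a fixed p1 and fixed sample size.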

    In addition to the simulation parameters defined by the baseline frequencies and OR, three different look strategies were examined. The first strategy had two looks, with the interim look occurring after ~50% of the samples had been genotyped. The second strategy used the maximum number of possible looks, given the sample size and the restriction that each look (except the last) must include a multiple of 90 samples. The third strategy was intermediate between these. Thus simulations for GSTM1 examined two, three, or seven looks; two, four, or eight looks were examined for GSTT1.
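One reading of the look rule reproduces the maximum look counts reported here: seven looks for 675 GSTM1 genotypes and eight for 725 GSTT1 genotypes. Under this reading, each interim look adds one 90-sample batch, and any leftover samples are folded into the final look, so the final look always holds at least 90. The sketch below encodes that reading. It is an inference from the reported counts, not a rule stated in the text, and the function name is hypothetical.

```python
def max_look_schedule(n_total: int, batch: int = 90) -> list[int]:
    """Cumulative number of genotyped samples at each look under the
    maximum-look strategy: interim looks add one batch each and the
    remainder is folded into the final look."""
    n_looks = n_total // batch
    if n_looks < 1:
        raise ValueError("fewer samples than one batch")
    cumulative = [batch * k for k in range(1, n_looks)]
    cumulative.append(n_total)
    return cumulative
```

`max_look_schedule(675)` returns `[90, 180, 270, 360, 450, 540, 675]`, which is seven looks. `max_look_schedule(725)` returns eight looks ending at 725.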

    A total of 1000 replications were performed for each of the 27 combinations of baseline genotype frequency, OR, and number of looks. Simulations were done separately for the GSTM1 and GSTT1 data sets. For each replication, prostate cancer cases and controls were randomly sampled from the true data sets without replacement and in proportion to their relative frequencies. The observed OR and χ² test statistic were calculated for each look. The χ² test statistic was then compared to the boundary value calculated by EAST-2000 for study termination. If the test statistic was less than the boundary for early stopping, i.e., if the test statistic "crossed the boundary," then the run terminated. If the test statistic did not cross the boundary, then an additional look was selected and the test statistic was recalculated, accounting for the information gained in the prior look. This procedure was repeated until the test statistic crossed a boundary or all genotypes were sampled. All simulation studies and analyses were performed using STATA v7.0 (College Station, TX).
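The replication loop described above can be sketched as follows:

1. Shuffle the case and control pools independently, which amounts to sampling without replacement.
2. Up to each cumulative look size, take cases and controls in proportion to their pool sizes.
3. Recompute the χ² statistic on all genotypes accumulated so far.
4. Stop at the first look where the statistic falls below the futility boundary.

The function names, the 0/1 carrier encoding, the rounding used for proportional allocation, and the boundary inputs are assumptions for illustration. The original simulations were run in STATA with EAST-2000 boundaries.

```python
from typing import Optional, Sequence

import numpy as np


def chi2_from_carriers(case_geno: np.ndarray, control_geno: np.ndarray) -> float:
    """Pearson chi-square (1 df, no continuity correction) from 0/1 carrier indicators."""
    a = int(case_geno.sum())
    b = case_geno.size - a
    c = int(control_geno.sum())
    d = control_geno.size - c
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if den == 0 else (a + b + c + d) * (a * d - b * c) ** 2 / den


def run_replicate(cases: np.ndarray, controls: np.ndarray,
                  schedule: Sequence[int],
                  futility: Sequence[Optional[float]],
                  rng: np.random.Generator) -> tuple[int, int]:
    """One simulated study. Returns (look at which genotyping stopped, genotypes used)."""
    cases, controls = rng.permutation(cases), rng.permutation(controls)
    case_share = cases.size / (cases.size + controls.size)
    for look, (n_cum, bound) in enumerate(zip(schedule, futility), start=1):
        n_case = round(n_cum * case_share)  # proportional allocation (assumption)
        # Cumulative data: everything genotyped so far enters the statistic.
        stat = chi2_from_carriers(cases[:n_case], controls[:n_cum - n_case])
        if bound is not None and stat < bound:
            return look, n_cum  # crossed the futility boundary
    return len(schedule), schedule[-1]  # all genotypes were used
```

A full simulation would call `run_replicate` 1000 times per parameter combination. It would pass the cumulative look sizes and the EAST-2000 boundaries for that combination, then tabulate how often and how early each run stopped.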

    In the above, we dealt with the potential for early closure by using the boundary values themselves (on the χ² test statistic scale). This approach allows these methods to be applied to test statistics that are not built into standard group sequential software packages. However, an alternative means of monitoring a molecular epidemiology study would be to use directly the methods developed for comparing two binomial proportions. These methods are available in, for example, EAST-2000.
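The relation between the two approaches can be checked numerically. For a 2×2 table, the square of the pooled two-proportion z statistic (a standard comparison of two binomials) equals the Pearson χ² statistic without continuity correction. A boundary on one scale therefore maps directly onto the other. It is an assumption that EAST-2000's two-binomial module uses exactly this pooled statistic; the text does not say. The function names are hypothetical.

```python
from math import sqrt


def pooled_z(x_case: int, n_case: int, x_ctrl: int, n_ctrl: int) -> float:
    """Pooled-variance z statistic for H0: equal carrier proportions."""
    p_case, p_ctrl = x_case / n_case, x_ctrl / n_ctrl
    p_pool = (x_case + x_ctrl) / (n_case + n_ctrl)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_case + 1 / n_ctrl))
    return (p_case - p_ctrl) / se


def pearson_chi2(x_case: int, n_case: int, x_ctrl: int, n_ctrl: int) -> float:
    """Pearson chi-square for the same 2x2 table (no continuity correction)."""
    a, b = x_case, n_case - x_case
    c, d = x_ctrl, n_ctrl - x_ctrl
    n = n_case + n_ctrl
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
```

For 10 of 30 case carriers versus 20 of 30 control carriers, z is about -2.58 and z² is about 6.67, the same value as the χ².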

    Results for the GSTM1 simulations are shown in Table 1. Overall, 91.5% of the simulations terminated early, with a range of 4.5–100%. The median genotyped sample size was 459; thus, use of GSM decreased the median sample size by 32%. Results for the GSTT1 simulations are shown in Table 2. On average, only 6.6% of the GSTT1 simulations terminated early. The median sample size was 714, compared with 725 for the entire data set. This low frequency of termination is appropriate, as an association between GSTT1 genotype and outcome was present in the data set.

    Table 1 (not reproduced). Results of simulation studies for GSTM1

    Table 2 (not reproduced). Results of simulation studies for GSTT1
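The sketch below computes summaries like those reported here (early-termination rate, median genotyped sample size, and percentage saved) from per-replicate outcomes. The input format, a list of genotypes used per replicate, and the function name are assumptions. The check below only re-derives the 32% figure from the reported median of 459 out of 675 genotypes.

```python
from statistics import mean, median


def summarize(n_used: list[int], n_total: int) -> dict[str, float]:
    """n_used: genotypes used in each replicate; n_total: size of the full pool.
    A replicate counts as terminated early if it used fewer than n_total."""
    early = sum(1 for n in n_used if n < n_total)
    return {
        "pct_terminated_early": 100 * early / len(n_used),
        "median_n": median(n_used),
        "pct_saved_at_median": 100 * (1 - median(n_used) / n_total),
        "mean_pct_saved": 100 * (1 - mean(n_used) / n_total),
    }
```

For instance, `summarize([459], 675)["pct_saved_at_median"]` is 32 up to floating-point rounding, because 459/675 = 0.68.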

    Our simulations indicate that GSM may provide significant improvements for case-control molecular epidemiology studies. Our approach of evaluating genotype data in multiples of 90 more closely reflects laboratory data acquisition and is thus directly applicable to large molecular epidemiology studies. For GSTM1 simulations with 80% power, assuming a genotyping cost of $3.00 per genotype, the use of GSM would save ~$650 of a total cost of $2025, in addition to savings in technician time and reagents. This sample size saving came at a relatively small cost to the overall power of the study. The average difference in study power between a fixed sample design and a GSM design for GSTM1 simulations with 80% power was 3.3% (average power was 86.2% for the fixed sample design and 82.9% for the GSM design). For these simulations, the average difference in study power between a one-look and a maximum-look strategy was also small (3.3%).
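The cost figures can be checked with simple arithmetic. The $2025 total is 675 genotypes at $3.00 each, which is the size of the GSTM1 pool. A saving of ~$650 corresponds to roughly 217 fewer genotypes, about 32% of that pool:

```latex
\begin{aligned}
675 \times \$3.00 &= \$2025,\\
\$650 / \$3.00 &\approx 217 \text{ genotypes}, \qquad 217 / 675 \approx 0.32 .
\end{aligned}
```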

    A number of observations may be made regarding the effects of varying model parameters on the probability of early stopping. First, the frequency of early stopping decreased as the study power increased. Although power is affected by the baseline frequency, OR, and sample size, the frequency of early stopping was monotonic in power. Thus, in all cases lower-power studies had higher rates of termination and terminated at earlier looks than did models with higher power. This corresponds to the intuition that studies with low power should be more likely to close early because the a priori chance of finding a significant association is very small, even if an association exists. However, appropriately powered models closed early when they should have in the GSTM1 simulations and had low rates of inappropriate closure in the GSTT1 simulations.

    The baseline genotype frequency in controls (p1) directly affects the statistical power. GSTM1 models with a baseline frequency p1 = 0.38 or p1 = 0.50 and GSTT1 models with p1 = 0.28 had the highest power for a given OR and number of looks. These higher-power models closed later and had larger average sample numbers. Likewise, simulations with larger OR closed later and had larger average sample numbers than simulations with lower OR for the same p1 and number of looks.

    Finally, increasing the number of looks decreased the study power and in general decreased the average sample number. Interestingly, for GSTM1 models with a typical power of ~80%, an intermediate number of looks gave higher average sample numbers than either the minimum or the maximum number of looks. Models with two looks obtained enough genotype information at the first look to close early at a high rate, with attendant sample size savings. This is consistent with the results of similar analyses in clinical trials (POCOCK 1982). Models with seven looks gained the majority of genotype information in the middle looks, also allowing for substantial sample size savings. However, models with three looks had low rates of closure at look 1, thus requiring a second look.
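The average sample number (ASN) compared above is the expected number of genotypes used: Σ_k n_k · P(stop at look k). Here n_k is the cumulative sample size at look k, and the final look absorbs every run that did not stop earlier. In the sketch below, the function name is hypothetical. The stopping probabilities would in practice be estimated from the simulation replicates, and the numbers in the usage example are made up.

```python
def average_sample_number(schedule: list[int], stop_probs: list[float]) -> float:
    """schedule: cumulative genotypes at each look.
    stop_probs: probability of stopping at each look; the last entry includes
    runs that continue to the end, so the probabilities must sum to 1."""
    if len(schedule) != len(stop_probs):
        raise ValueError("need one probability per look")
    if abs(sum(stop_probs) - 1) > 1e-9:
        raise ValueError("stopping probabilities must sum to 1")
    return sum(n * p for n, p in zip(schedule, stop_probs))
```

Consider a hypothetical two-look design with looks at 360 and 675 genotypes and a 50% chance of stopping at the interim look. Its ASN is 0.5 × 360 + 0.5 × 675 = 517.5.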

    Since our simulations indicate that an intermediate look strategy may give a higher average sample number for studies with ~80% power, investigators may wish to choose either a minimum- or a maximum-look strategy. Since the power cost of additional looks is relatively small, the optimal number of looks will be determined largely by the opportunity cost of multiple data analyses as well as by the need to conserve samples and costs. If samples are limited or expensive to assay, investigators may wish to perform multiple looks to minimize the average sample number. However, if sample conservation or cost minimization is not an overriding concern, then investigators may wish to perform only one interim analysis.

    ACKNOWLEDGMENTS

    Supported by the Doris Duke Charitable Foundation (R.A.) and the Leonard and Madilyn Abramson Endowed Chair, National Institutes of Health grant R01-CA85074 (T.R.R.).

    Manuscript received September 27, 2002; Accepted for publication December 9, 2002.

    LITERATURE CITED

    GOULD, A. L., 1983 Abandoning lost causes (early termination of unproductive clinical trials). Proceedings of the Biopharmaceutical Sciences, American Statistical Association, Washington, DC, pp. 31-34.

    JENNISON, C., and B. W. TURNBULL, 2000 Group Sequential Methods With Application to Clinical Trials. Chapman & Hall/CRC Press, New York.

    KAAKS, R., I. VAN DER TWEEL, P. A. VAN NOORD, and E. RIBOLI, 1994 Efficient use of biological banks for biochemical epidemiology: exploratory hypothesis testing by means of a sequential t-test. Epidemiology 5: 429-438.

    O'BRIEN, P. C., and T. R. FLEMING, 1979 A multiple testing procedure for clinical trials. Biometrics 35: 549-556.

    O'NEILL, R. T., and C. ANELLO, 1978 Case-control studies: a sequential approach. Am. J. Epidemiol. 108: 415-424.

    PASTERNAK, B. S., and R. E. SHORE, 1980 Group sequential methods for cohort and case-control studies. J. Chronic Dis. 33: 365-373.

    POCOCK, S. J., 1982 Interim analyses for randomized clinical trials: the group sequential approach. Biometrics 38: 153-162.

    REBBECK, T. R., A. H. WALKER, J. M. JAFFE, D. L. WHITE, A. J. WEIN et al., 1999 Glutathione S-transferase-mu (GSTM1) and -theta (GSTT1) genotypes in the etiology of prostate cancer. Cancer Epidemiol. Biomarkers Prev. 8: 283-287.

    SATAGOPAN, J. M., D. A. VERBEL, E. S. VENKATRAMAN, K. E. OFFIT, and C. B. BEGG, 2002 Two-stage designs for gene-disease association studies. Biometrics 58: 163-170.

    SCHOENFELD, D. A., 2001 A simple algorithm for designing group sequential clinical trials. Biometrics 57: 972-974.

    VAN DER TWEEL, I., and P. A. VAN NOORD, 2000 Sequential analysis of matched dichotomous data from prospective case-control studies. Stat. Med. 19: 3449-3464.

    WHITEHEAD, J., 1999 A unified theory for sequential clinical trials. Stat. Med. 18: 2271-2286.

    (R. Aplenc, H. Zha, T. R. Rebbeck and K. J. Propert)
