Source: 《临床肿瘤学》 (Clinical Oncology), 2005, Issue 5
Molecular Staging for Survival Prediction of Colorectal Cancer Patients
    The Departments of Surgery, Pathology, and Biostatistics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL

    The Institute for Genomic Research, Rockville, MD

    Department of Statistics, Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, MD

    Department of Biochemistry, George Washington University, Washington, DC

    The Molecular Diagnostic Laboratory, Department of Clinical Biochemistry, Aarhus University Hospital, Skejby, Denmark

    Department of Medical Genetics, Biomedicum, Helsinki, Finland

    ABSTRACT

    PURPOSE: The Dukes' staging system is the gold standard for predicting colorectal cancer prognosis; however, accurate classification of intermediate-stage cases is problematic. We hypothesized that molecular fingerprints could provide more accurate staging and potentially assist in directing adjuvant therapy.

    METHODS: A 32,000 cDNA microarray was used to evaluate 78 human colon cancer specimens, and these results were correlated with survival. Molecular classifiers were produced to predict outcome.

    RESULTS: Molecular staging, based on 43 core genes, was 90% accurate (93% sensitivity, 84% specificity) in predicting 36-month overall survival in 78 patients. This result was significantly better than Dukes' staging (P = .03878), discriminated patients into significantly different groups by survival time (P < .001, log-rank test), and was significantly different from chance (P < .001, 1,000 permutations). Furthermore, the classifier was able to discriminate a survival difference in an independent test set from Denmark. Molecular staging identifies patient prognosis (as represented by 36-month survival) more accurately than the traditional clinical staging, particularly for intermediate Dukes' stage B and C patients. The classifier was based on a core set of 43 genes, including osteopontin and neuregulin, which have biologic significance for this disease.

    CONCLUSION: These data support further evaluation of molecular staging to discriminate good from poor prognosis patients, with the potential to direct adjuvant therapy.

    INTRODUCTION

    Colorectal cancer staging is currently based solely on simple clinicopathologic features such as bowel wall penetration and lymph node metastasis. Unfortunately, clinical staging systems often fail to discriminate the biologic behavior of a large number of tumors, resulting in the systematic overtreatment or undertreatment of patients with adjuvant therapies. Devised more than 70 years ago,1 the now modified Dukes' staging system provides adequate prognostic information for patients staged as A or D. However, the intermediate stages B and C discriminate poorly between good and poor prognosis patients. Additionally, the system can be applied only after complete surgical resection rather than after a presurgical biopsy. Recently developed microarray technology has permitted the development of multiorgan cancer classifiers,2,3 identification of tumor subclasses,4-6 discovery of progression markers,7,8 and prediction of disease outcome in many types of cancer.9-11 Unlike clinicopathologic staging, molecular staging has promise in predicting the long-term outcome of any one individual based on the gene expression profile of the tumor at diagnosis. Inherent to this approach is the hypothesis that every tumor contains informative gene expression signatures that, at the time of diagnosis, can direct the biologic behavior of the tumor over time.12

    METHODS

    The human investigations were performed after approval by the University of South Florida Institutional Review Board and in accord with an assurance filed with and approved by the Department of Health and Human Services. A waiver of informed consent was filed.

    Tumor Samples (Moffitt Cancer Center)

    We developed a colorectal cancer survival classifier using 78 tumor samples including three adenomas and 75 cancers. Informative frozen colorectal cancer samples were selected from the Moffitt Cancer Center Tumor Bank based on evidence for good (survival > 36 months) or poor (survival < 36 months) prognosis from the Tumor Registry. The samples were initially selected such that all patients had follow-up for at least 36 months. Forty-eight samples were poor prognosis cases, and 30 samples were good prognosis cases. Dukes' stages included a mixture of B, C, and D cases. Dukes' stage A cases are rare and were not available for analysis; however, we selected three adenomas as examples of very good prognosis cases. Survival was measured as last contact minus collection date for living patients or date of death minus collection date for patients who had died. The median follow-up time was 27.9 months (range, 0.49 to 119 months), and the median follow-up time among patients alive at last follow-up (26 patients) was 64 months.

    Samples were microdissected (> 80% tumor cells) by frozen section guidance, and RNA was extracted using Trizol reagent (Invitrogen, Carlsbad, CA) followed by secondary purification on RNeasy columns. The samples were profiled on The Institute for Genomic Research's (TIGR) 32,488-element spotted cDNA arrays, containing 31,872 human cDNAs representing 30,849 distinct transcripts (23,936 unique TIGR tentative consensus [TC] sequences and 6,913 expressed sequence tags [ESTs]), 10 exogenous controls printed 36 times, and four negative controls printed 36 to 72 times.

    Significance Analysis of Microarrays Survival Analysis

    The first analysis of the colorectal cancer microarray data used the Significance Analysis of Microarrays program (SAM; Stanford University, Stanford, CA) with censored survival data.13 SAM identifies genes most closely correlated with survival time and uses permutation analysis to estimate the false discovery rate. Mean-centered gene expression vectors were then clustered using Cluster 3.0 and visualized using Java TreeView (version 1.03).14 We used the two clusters of SAM-selected genes as a survival grouping and constructed Kaplan-Meier curves for these two groups. The samples were also manually grouped by Dukes' stage for a comparison of survival times.
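
    Kaplan-Meier curves of the kind described here follow the standard product-limit estimate. The minimal Python sketch below shows that calculation under stated assumptions: the function name kaplan_meier and the follow-up values are hypothetical illustrations, not the study data or the software the authors used.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient (months)
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data (months); NOT taken from the study.
times = [5, 12, 12, 20, 30, 40, 64, 64]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: estimated survival {s:.3f}")
```

    Censored patients (alive at last contact) leave the risk set without lowering the curve, which is why censored survival data can be used without discarding patients who have short follow-up.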

    Classifier Construction and Evaluation

    A leave-one-out cross-validation (LOOCV) technique was used for constructing and validating a neural network-based classifier. The samples were classified as having good or poor prognosis based on survival for more or less than 36 months, respectively. Using the LOOCV approach also provides the ability to rank the selected genes. The number of times a particular gene was chosen can be an indicator of the usefulness of that gene for general classification15 and may imply biologic significance. Therefore, we examined the genes that were consistently selected by the t test. We focused on 43 core genes identified in 75% of the LOOCV iterations.15

    The molecular classifier was composed of two distinct steps: gene selection using a t test and classification using a neural network. Both steps were performed after the test sample was left out (in the LOOCV) to avoid bias from the gene selection step. For each cross-validation step, we used the top 50 genes as ranked by the absolute value of the t statistic. A feed-forward back-propagation neural network16 with a single hidden layer of 10 units, a learning rate of 0.05, and a momentum of 0.2 was constructed. Training occurred for a maximum of 500 epochs or until a zero misclassification error was achieved on the training set. Our experience indicates that neural networks are extremely robust to both the number of genes selected and the level of noise in these genes. We have successfully used this classifier in earlier multiplatform gene expression classification experiments.3
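
    The key design point in this procedure is that gene selection is repeated inside every cross-validation fold, so the left-out sample never influences which genes are chosen. The Python sketch below illustrates that structure and the counting of how often each gene is selected. It is a simplified illustration under several assumptions: a nearest-centroid rule stands in for the neural network (it is not the authors' classifier), a Welch t statistic is assumed for ranking, the data are synthetic, and all function names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, variance


def t_stat(a, b):
    """Welch t statistic between two lists of expression values."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_genes(X, y, k):
    """Indices of the k genes with the largest |t| between the two classes."""
    scores = []
    for g in range(len(X[0])):
        good = [row[g] for row, label in zip(X, y) if label == 1]
        poor = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append((abs(t_stat(good, poor)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def nearest_centroid(X, y, genes, sample):
    """Stand-in classifier (NOT the paper's neural network)."""
    def dist(label):
        rows = [row for row, l in zip(X, y) if l == label]
        return sum((mean(r[g] for r in rows) - sample[g]) ** 2 for g in genes)
    return 1 if dist(1) < dist(0) else 0


def loocv(X, y, k=50, core_fraction=0.75):
    """Leave-one-out CV with gene selection repeated inside each fold."""
    counts, predictions = Counter(), []
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(X_train, y_train, k)  # sees training samples only
        counts.update(genes)
        predictions.append(nearest_centroid(X_train, y_train, genes, X[i]))
    # "Core" genes: selected in at least core_fraction of the folds.
    core = sorted(g for g, c in counts.items() if c >= core_fraction * len(X))
    return predictions, core


# Synthetic data: 20 samples x 30 genes; genes 0-2 carry a strong signal.
random.seed(0)
y = [1] * 10 + [0] * 10  # 1 = good prognosis, 0 = poor prognosis
X = [[random.gauss(6.0 if (g < 3 and label == 1) else 0.0, 1.0)
      for g in range(30)] for label in y]
predictions, core = loocv(X, y, k=3)
print("core genes:", core)
```

    Selecting genes once on all samples and then cross-validating only the classifier would leak information from the test sample into the gene list and inflate the estimated accuracy.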

    Statistical Significance

    The log-rank test was used to measure the difference in Kaplan-Meier curves, both for the initial survival analysis and for the classifier results. The classifier splits the examples into two groups, those predicted as good or poor prognosis. Classifier accuracy was reported both as overall accuracy and as specificity/sensitivity. McNemar's χ² test was used to compare the molecular classifier with a classifier based on Dukes' staging. Finally, 1,000 permutations of the data set were used to measure the significance of the classifier results as compared with chance.
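
    McNemar's test compares two classifiers evaluated on the same patients by looking only at the discordant cases, that is, patients classified correctly by one method and incorrectly by the other. The Python sketch below shows the calculation. The counts are hypothetical, the function name is an assumption, and the text does not state whether a continuity correction was applied, so the correction is left as an option.

```python
import math


def mcnemar_test(only_a_correct, only_b_correct, continuity=False):
    """McNemar's chi-square test on discordant pairs (1 degree of freedom).

    only_a_correct : patients classified correctly by A but not by B
    only_b_correct : patients classified correctly by B but not by A
    Returns (statistic, P value).
    """
    b, c = only_a_correct, only_b_correct
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, no evidence of a difference
    diff = abs(b - c) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical discordant counts; NOT taken from the study.
stat, p_value = mcnemar_test(15, 5)
print(f"chi-square = {stat:.2f}, P = {p_value:.4f}")
```

    Patients that both methods classify correctly, or both classify incorrectly, do not enter the statistic, because they carry no information about which method is better.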

    RESULTS

    Identification of Prognosis-Related Genes

    We used SAM survival analysis to identify a set of genes most correlated with censored survival time. A set of 53 genes was found, corresponding to a median expected false discovery rate of 28%. These genes are listed in Table 1 and include several genes that we believe to be biologically significant, such as osteopontin and neuregulin (see Discussion). Figure 1A presents a graphical representation of these 53 SAM-selected genes as a clustered heat map. The figure uses only the Dukes' stage B and C patients, whose outcome Dukes' staging predicts poorly (Fig 1D). Because we clustered using only genes correlated with survival, the clusters should correspond to different prognosis groups. The SAM-selected genes are also arranged by annotated Dukes' stage (B and C stages only) in Figure 1B, which shows little discrimination based on gene expression.

    Figure 1C shows the Kaplan-Meier plot for the two dominant gene expression clusters of stage B and C samples. Clearly, these 53 genes separate the patients into two distinct clusters of patients with good prognosis (cluster 2) and poor prognosis (cluster 1; P < .001, log-rank test) as expected because the genes were selected using SAM. Figure 1D presents a Kaplan-Meier plot of the survival times of patients with Dukes' stage B and C tumors grouped by stage, showing no statistically significant difference and demonstrating the weak potential for discrimination of these patients by Dukes' staging. Here, we demonstrate that gene expression profiles can separate good and poor prognosis patients better than Dukes' staging. This suggests that a gene expression–based classifier could be more accurate at predicting patient prognosis than the traditional Dukes' staging.
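
    The log-rank comparison of two survival groups can be sketched as follows. At each death time, the observed deaths in one group are compared with the number expected if both groups shared the same hazard. This is a minimal illustration with hypothetical follow-up values and a hypothetical function name; it is not the authors' software and does not reproduce Figure 1.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value, 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = var = 0.0
    for t in sorted({t for t, e, _ in data if e == 1}):
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)    # deaths at time t
        d_a = sum(1 for ti, e, g in data
                  if ti == t and e == 1 and g == 0)              # deaths in group A
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / var
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical follow-up (months) and death indicators; NOT study data.
cluster1_times, cluster1_events = [5, 8, 12, 15, 20], [1, 1, 1, 1, 1]
cluster2_times, cluster2_events = [30, 40, 50, 60, 70], [1, 0, 1, 0, 0]
stat, p_value = logrank_test(cluster1_times, cluster1_events,
                             cluster2_times, cluster2_events)
print(f"log-rank chi-square = {stat:.2f}, P = {p_value:.4f}")
```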

    Performance of a 43-Gene cDNA-Based Colorectal Cancer Survival Classifier

    LOOCV was used to evaluate a 43-gene cDNA classifier in predicting prognosis for each patient at 36 months of follow-up. The classifier accuracy was 90% (93% sensitivity and 84% specificity). A log-rank test of the two predicted groups (good and poor prognosis) was significant (P < .001), demonstrating the ability of the classifier to distinguish the two outcomes (Fig 1E). Good prognosis was defined by survival of more than 36 months, and poor prognosis was defined by survival of less than 36 months. Permutation analysis demonstrates that the result is better than possible by chance (P < .001, 1,000 permutations). This result is also significantly higher than that observed using Dukes' staging as a classifier for the same group of patients (P = .03878). The results for molecular staging are listed in Tables 2, 3, and 4. Molecular staging identified the good prognosis patients (the default classification using Dukes' staging) and also identified poor prognosis patients with a high degree of accuracy. Table 4 also lists the detailed confusion matrix for all samples in the data set, showing the equivalent misclassification rate of both good and poor prognosis groups by our classifier. Table 5 lists all 43 genes selected by the LOOCV approach via t test and used to construct the cDNA classifier.
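
    The accuracy, sensitivity, and specificity figures and the permutation P value can be illustrated with a short Python sketch. Several points are assumptions: the labels are hypothetical, the choice of which prognosis group counts as the positive class is left as a parameter because the text does not state it, and the permutation step is simplified by holding the predictions fixed, whereas a full analysis would rerun gene selection and training for every shuffled label vector.

```python
import random


def confusion_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def permutation_p_value(y_true, score_fn, n_perm=1000, seed=0):
    """Share of label shuffles that score at least as well as the real labels."""
    rng = random.Random(seed)
    observed = score_fn(y_true)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(y_true)
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one estimate, never exactly zero


# Hypothetical labels (1 = good prognosis, 0 = poor prognosis); NOT study data.
y_true = [1] * 10 + [0] * 10
y_pred = [0, 0] + [1] * 8 + [1] + [0] * 9
metrics = confusion_metrics(y_true, y_pred, positive=1)
p_value = permutation_p_value(
    y_true, lambda labels: confusion_metrics(labels, y_pred, 1)["accuracy"])
print(metrics, f"permutation P = {p_value:.4f}")
```

    With the add-one estimate assumed here, 1,000 permutations cannot yield a P value below 1/1,001, which is compatible with a result reported as P < .001.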

    Evaluation of an Independent Test Set From Denmark

    To further validate our classifier, we identified an independent Danish colon cancer data set composed of Dukes' stage B and C patients and produced on a U133A Affymetrix GeneChip (Affymetrix, Santa Clara, CA) oligonucleotide-based platform. Because this data set was produced on an oligonucleotide platform instead of our cDNA platform, we first translated our gene signature into available Affymetrix probe sets using the Resourcerer program (TIGR, Rockville, MD; www.TIGR.org). This translation reduced the gene signature from 43 genes to only 26 unique genes. Therefore, we limited the original cDNA classifier to only genes represented on the U133A platform. Using this restricted gene signature derived solely from the Moffitt data set, we found 60 corresponding probe sets on the Affymetrix U133A GeneChip and used these probe sets to evaluate the survival of 95 Danish patients. With this approach, hierarchical clustering was used to find the most significant groups in the data. The survival of these groups was then displayed using Kaplan-Meier survival analysis. Figure 2A shows that the 26 selected genes were able to discriminate good from poor prognosis patients, despite the restrictions imposed by cross-platform analyses. Dukes' staging was incapable of discriminating survival in these same patients (Fig 2B). When applied to the Dukes' stage B and C patients separately, the 26-gene signature was capable of discriminating good from poor prognosis subpopulations within each stage (Fig 2C).

    DISCUSSION

    The benefit of adjuvant chemotherapy for colorectal cancer seems limited to patients with Dukes' stage C disease, where the cancer has metastasized to lymph nodes at the time of diagnosis. For this reason, the clinicopathologic Dukes' staging system is critical for determining how adjuvant therapy is administered. Unfortunately, it is not very accurate in predicting overall survival, and thus, its application likely results in the treatment of a large number of patients to benefit an unknown few. Alternatively, there are probably a number of patients who might benefit from therapy who do not receive it.

    Molecular staging may provide more accurate predictions of patient outcome than is currently possible with clinical staging, which may, in fact, misclassify patients. Using a SAM-selected set of genes derived from a genome-wide analysis of gene expression, we were first able to cluster groups of patients with good and bad prognoses, suggesting that outcome-rich information was likely present in this gene expression data set. Subsequently, a supervised learning analysis identified a core set of 43 informative genes that appeared in 75% of the cross-validation iterations and accurately predicted colorectal cancer survival. This core set was derived from a 32,000-element cDNA microarray that included both named and unnamed genes. This gene set was highly accurate in predicting survival when compared with Dukes' staging data from the same patients.

    Although Dukes' staging works well for very good and very poor prognosis patients (Dukes' stage A and D), currently it is not very informative when predicting long-term outcomes of intermediate prognosis patients (Dukes' stage B and C), yet it is the primary means for determining the administration of potentially toxic adjuvant chemotherapy. We hypothesized that molecular staging might be able to identify those Dukes' stage B and C patients for whom chemotherapy might be beneficial. The production of a cDNA classifier for survival is a first step in the validation process for molecular staging.

    With our approach, we were able to determine which genes seemed to be most useful in the classifier based on their frequency of appearance in the classification set. Of these genes, at least two, osteopontin and neuregulin, have reported biologic significance in the context of colorectal cancer. Osteopontin, a secreted glycoprotein17 and ligand for CD44 and αvβ3, seems to have a number of biologic functions associated with cellular adhesion, invasion, angiogenesis, and apoptosis.18 Using an oligonucleotide microarray platform, we recently identified that osteopontin expression is strongly associated with colorectal cancer stage progression.7 Moreover, we have recently identified INSIG-2, which is one of the 43 core classifier genes, as an osteopontin signature gene (data not shown), suggesting that an osteopontin pathway may be prominent in regulating colon cancer survival. Similarly, neuregulin, a ligand for ERBB receptors, may have biologic significance in the context of colorectal cancer; current data suggest a strong relationship between colon cancer growth and the ERBB family of receptors.19 Neuregulin was recently identified as a prognostic gene whose expression correlated with bladder cancer recurrence.15

    Although cross-platform analysis is challenging because of the paucity of available human data sets, the discrepancies in probes used on competitive platforms, and the differential performance characteristics of these probes, we demonstrated that the genes selected for our cDNA-based classifier were effective in discriminating good from poor prognosis patients using a completely independent data set produced on a Danish population using an oligonucleotide platform (Affymetrix, U133A). These data provide confirmation, under the most rigorous conditions, that there is prognostic value for the identified gene signature.

    Of interest is a recent report of a gene expression profile, using the U133A Affymetrix platform, to predict recurrence for Dukes' stage B patients.20 The reported 23-gene signature, with a performance accuracy of 78%, shares no genes with the cDNA classifier we have produced. The absence of concordant genes could be related to many different issues, including differences in the microarray platform, the samples selected for analysis, and the analytic tools used to generate the gene signatures, suggesting the need for extensive validation of any promising gene expression signature before clinical implementation.

    We have produced the first colorectal cancer molecular staging classifier, based on the analysis of all colon cancer stages, for which accuracy exceeds that of Dukes' staging when used to estimate prognosis on the same patients. These results suggest that a molecular-based method for the discrimination of outcome for intermediate Dukes' stages B and C, where prognosis is currently problematic, may be effective. Our classifier is based on a core set of 43 genes that seem to have biologic significance for human colorectal cancer progression. The data provided support more extensive validation of this prognostic gene signature.

    Authors' Disclosures of Potential Conflicts of Interest

    The authors indicated no potential conflicts of interest.

    NOTES

    Supported by grants from the National Cancer Institute, Bethesda, MD; also supported by the Biostatistics Core and the Microarray Core at H. Lee Moffitt Cancer Center and services provided by the Research Computing Core, University of South Florida.


    REFERENCES

    1. Dukes C: The classification of cancer in the rectum. J Pathol Bacteriol 35:323, 1932

    2. Ramaswamy S, Tamayo P, Rifkin R, et al: Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci U S A 98:15149-15154, 2001

    3. Bloom G, Yang IV, Boulware D, et al: Multi-platform, multi-site microarray based human tumor classification. Am J Pathol 164:9-16, 2004

    4. Bhattacharjee A, Richards WG, Staunton J, et al: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A 98:13790-13795, 2001

    5. Khan J, Wei JS, Ringner M, et al: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7:673-679, 2001

    6. Sorlie T, Tibshirani R, Parker J, et al: Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 100:8418-8423, 2003

    7. Agrawal D, Chen T, Irby R, et al: Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling. J Natl Cancer Inst 94:513-521, 2002

    8. Sanchez-Carbayo M, Socci ND, Lozano JJ, et al: Gene discovery in bladder cancer progression using cDNA microarrays. Am J Pathol 163:505-516, 2003

    9. Shipp MA, Ross KN, Tamayo P, et al: Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 8:68-74, 2002

    10. van 't Veer LJ, Dai H, van de Vijver MJ, et al: Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530-536, 2002

    11. van de Vijver MJ, He YD, van 't Veer LJ, et al: A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347:1999-2009, 2002

    12. Ramaswamy S, Ross KN, Lander ES, et al: A molecular signature of metastasis in primary solid tumors. Nat Genet 33:49-54, 2003

    13. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98:5116-5121, 2001

    14. de Hoon MJ, Imoto S, Nolan J, et al: Open source clustering software. Bioinformatics 20:1453-1454, 2004

    15. Dyrskjot L, Thykjaer T, Kruhoffer M, et al: Identifying distinct classes of bladder carcinoma using microarrays. Nat Genet 33:90-96, 2003

    16. Fahlman SE: Faster-Learning Variations on Back-Propagation: An Empirical Study, Proceedings of the 1988 Connectionist Models Summer School. Los Altos, CA, Morgan-Kaufmann, 1988

    17. Fedarko NS, Jain A, Karadag A, et al: Elevated serum bone sialoprotein and osteopontin in colon, breast, prostate, and lung cancer. Clin Cancer Res 7:4060-4066, 2001

    18. Yeatman TJ, Chambers AF: Osteopontin and colon cancer progression. Clin Exp Metastasis 20:85-90, 2003

    19. Carraway KL III, Weber JL, Unger MJ, et al: Neuregulin-2, a new ligand of ErbB3/ErbB4-receptor tyrosine kinases. Nature 387:512-516, 1997

    20. Wang Y, Jatkoe T, Zhang Y, et al: Gene expression profiles and molecular markers to predict recurrence of Dukes' B colon cancer. J Clin Oncol 22:1564-1571, 2004

    (Steven Eschrich, Ivana Ya)