Clinical Oncology (《临床肿瘤学》), 2005, Issue 8
Variability and Sample Size Requirements of Quality-of-Life Measures: A Randomized Study of Three Major Questionnaires
    From the Division of Clinical Trials and Epidemiological Sciences and Departments of Palliative Medicine and Medical Oncology, National Cancer Centre

    Department of Rheumatology and Immunology, Singapore General Hospital, Singapore

    ABSTRACT

    PURPOSE: To compare the variability and sample size requirements of the global quality-of-life (QOL) scores of the following three major QOL instruments: the Functional Assessment of Cancer Therapy–General (FACT-G), Functional Living Index–Cancer (FLIC), and European Organisation for Research and Treatment of Cancer Core Quality of Life Questionnaire C30 (EORTC QLQ-C30).

    PATIENTS AND METHODS: Cancer patients were randomly assigned to answer two of the three instruments using an incomplete block design (n = 1,268). The instruments were compared in terms of coefficient of variation, effect size in detecting a difference between patients with different performance status, and correlation coefficient between scores at baseline and follow-up.

    RESULTS: The FACT-G and FLIC had significantly smaller coefficients of variation than the EORTC QLQ-C30 (both P < .05). The FLIC also had significantly larger correlation coefficients between scores at baseline and follow-up than the EORTC QLQ-C30 (P < .05). The FACT-G and the FLIC had a larger effect size in a cross-sectional and longitudinal setting, respectively, than the EORTC QLQ-C30 in differentiating patients with different performance status (both P < .05).

    CONCLUSION: In some aspects, the FACT-G and FLIC global QOL scores had smaller variability and larger discriminative ability than the EORTC QLQ-C30. Further research using other criteria to compare the three instruments is recommended.

    INTRODUCTION

    A major challenge in conducting clinical trials is to recruit enough subjects for the trials to have sufficient power. Sample size depends on the variability of the primary outcome measure. Other factors being equal, the larger the variability is, the larger the sample size required for a given purpose. In the context of questionnaire-based instruments, the level of variability may be affected by factors such as the clarity of questions and the precision of response scales. In the past, health-related quality of life (HRQOL) was usually regarded as a minor end point. However, some researchers have advocated the use of HRQOL as a primary end point in some cancer clinical trials.1,2 Therefore, it is useful to examine the variability of HRQOL scores of different measures and compare their sample size requirements. This will help researchers make informed choices of HRQOL instruments in clinical trials and ensure that studies are performed as efficiently as possible.

    The Functional Assessment of Cancer Therapy–General (FACT-G),3 the Functional Living Index–Cancer (FLIC),4 and the European Organisation for Research and Treatment of Cancer Core Quality-of-Life Questionnaire C30 (EORTC QLQ-C30)5 are major HRQOL questionnaires designed for the assessment of cancer patients. The FACT-G and EORTC QLQ-C30 measure HRQOL globally as well as in specific domains such as emotional well-being. Although the FLIC consists of multiple items, it was designed to measure the global level of HRQOL only. In the literature, the terms global, total, and overall quality-of-life (QOL) scores are sometimes used synonymously, depending on which questionnaire is concerned. For brevity, we will use the term global score to refer to the FACT-G total score, the FLIC overall score, and the EORTC QLQ-C30 global functioning score.

    Information on variability is only occasionally reported in the cancer research literature. No previous studies have directly compared the variability of these three instruments administered to the same patients. There are various approaches to sample size estimation when the outcome measure is a continuous variable, and they require different inputs. In the context of survey research, it is common to use the coefficient of variation (CV; ie, standard deviation [SD] divided by mean) as the input.6 From the available literature, we found that the EORTC QLQ-C30 global score showed a CV of approximately 0.30,7 whereas the CVs of the FACT-G and FLIC were approximately 0.20.3,8,9 This suggests that, other factors being the same, the use of the EORTC QLQ-C30 may require a sample size more than double that of the other two questionnaires because the ratio of sample size requirement is proportional to the square of the ratio of CV.6
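
    The scaling described above can be checked with a short calculation. The following is a minimal sketch, assuming only the relationship stated in the text (sample size ratio equals the squared ratio of CVs); the function names are illustrative and the inputs are the approximate literature values quoted above.

```python
def cv(sd: float, mean: float) -> float:
    """Coefficient of variation: standard deviation divided by mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return sd / mean


def relative_sample_size(cv_a: float, cv_b: float) -> float:
    """Sample size needed with instrument A relative to instrument B,
    assuming the requirement scales with the square of the CV ratio."""
    return (cv_a / cv_b) ** 2


# Approximate CVs quoted from the literature: 0.30 (EORTC QLQ-C30)
# versus 0.20 (FACT-G and FLIC).
ratio = relative_sample_size(0.30, 0.20)
print(round(ratio, 2))  # a value above 2, ie, "more than double"
```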

    In the context of randomized clinical trials and other comparative studies, it is common to use the effect size (defined as the difference in means between two groups divided by the pooled estimate of SD) for sample size estimation.10 This can be seen as a signal-to-noise ratio. A large variability represents a large amount of noise and leads to a small effect size through a large denominator. Reviewing a sizeable number of studies, King11 estimated that the effect size of the EORTC QLQ-C30 global scale in differentiating various clinical statuses ranged from 0.07 to 0.73. In relation to Eastern Cooperative Oncology Group (ECOG) performance scores of 0 to 1 v 2 to 4, the effect size was approximately 0.5. In contrast, the effect size of the FACT-G global score, as calculated from Cella,12 was approximately 1 (ie, twice as large as the effect size of the EORTC QLQ-C30). Because the sample size required to detect a difference between two groups is inversely proportional to the square of the effect size,13 the relative effect size (1/0.5 = 2) between the two global scores suggests a quadrupling of sample size if one intends to use the EORTC QLQ-C30 as opposed to the FACT-G. We have not been able to locate similar information for the FLIC.
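
    As an illustration of the effect size definition and of why a halved effect size implies roughly four times the sample size, consider the sketch below. The group means, SDs, and group sizes are made-up numbers chosen only to give an effect size of 0.5; they are not data from any study cited here, and the function names are hypothetical.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled estimate of the standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def effect_size(mean1, sd1, n1, mean2, sd2, n2) -> float:
    """Difference in means divided by the pooled SD."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


def relative_sample_size(d_a: float, d_b: float) -> float:
    """Sample size needed with instrument A relative to instrument B,
    assuming the requirement scales with 1 / (effect size squared)."""
    return (d_b / d_a) ** 2


# Hypothetical groups: means 80 and 70, both SDs 20, 50 patients each.
d = effect_size(80, 20, 50, 70, 20, 50)
print(d)  # 0.5

# Effect sizes of about 0.5 versus about 1, as quoted in the text.
print(relative_sample_size(0.5, 1.0))  # 4.0
```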

    Test-retest reliability is a major measurement property researchers usually want to assess. The intraclass correlation depends on the within-subject variability. Although the intraclass correlation is conceptually superior to the Pearson’s correlation (r) for the purpose of assessing test-retest reliability,10 the Pearson’s correlation has often been used in the cancer literature. For instance, the Pearson’s correlation coefficient of the FACT-G global scores measured at an interval of 3 to 7 days was 0.92,3 and the Pearson’s correlation coefficient of the EORTC QLQ-C30 global scores measured at an interval of 4 days was 0.85.14 The FACT-G seemed to be slightly superior. From the viewpoint of sample size calculation, the Pearson’s correlation is a useful piece of information because the variance of a change score is equal to SDpre² + SDpost² – 2 × r × SDpre × SDpost, where SDpre and SDpost are the SDs of the scores at the pre- and post-test, respectively. The larger the correlation coefficient is, the smaller the variance of the change scores. Furthermore, in the context of repeated measurements of HRQOL, which is common in cancer clinical trials, the correlation coefficient is an explicit input in the sample size calculation.1
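
    The effect of the correlation on the variance of a change score can be seen numerically. This is a minimal sketch of the formula given above; the unit SDs are an assumption made purely for illustration, whereas the two correlation values are those quoted in the text.

```python
def change_score_variance(sd_pre: float, sd_post: float, r: float) -> float:
    """Variance of (post - pre) given the two SDs and their correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return sd_pre ** 2 + sd_post ** 2 - 2 * r * sd_pre * sd_post


# Assumed unit SDs at both time points (illustrative only).
var_high_r = change_score_variance(1.0, 1.0, 0.92)  # r quoted for the FACT-G
var_low_r = change_score_variance(1.0, 1.0, 0.85)   # r quoted for the EORTC QLQ-C30

# The higher correlation gives the smaller change-score variance.
print(round(var_high_r, 2), round(var_low_r, 2))
```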

    The CVs, effect sizes, and correlation coefficients quoted earlier may or may not be comparable because the studies differed in populations, case mix, and designs. Hence, we conducted a randomized study to compare the three instruments. The primary aim was to compare the variability of global QOL scores measured by the three instruments, which translates into the sample size requirement for using these questionnaires. Variability and sample size requirement are not the only considerations in selecting an outcome measure. For instance, different questionnaires may cover different domains of HRQOL. Researchers have to consider whether the instruments measure what they want to evaluate in a particular study. Nevertheless, variability, discriminative ability, and sample size requirement are important considerations, especially because the patchy information in the literature suggests a tremendous difference in sample size requirements. It is hoped that the findings here will provide supplementary information to assist researchers in making informed choices of HRQOL instruments for their studies.

    PATIENTS AND METHODS

    Design

    An incomplete block design was used,15 in which participants were randomly assigned to receive one of the following three questionnaire packages: (1) FACT-G and EORTC QLQ-C30, (2) FLIC and EORTC QLQ-C30, or (3) FLIC and FACT-G. We chose not to use a complete block design, in which each patient would answer all three questionnaires, because our experience has suggested that some patients might not be able or willing to spend so much time and effort on the questionnaires. In the present study, the median time to complete the questionnaire packages was only 18 minutes, but the 90th percentile was 39 minutes. Because of logistical considerations, days rather than individuals were used as units of randomization. Each questionnaire package had two versions that altered the order of presentation of the two HRQOL questionnaires it contained to prevent an order effect.16 This resulted in six subpackages, which were randomized in blocks of 6 days. For the purpose of this article, we will not further mention the study’s feature about order effects and will treat the study as involving three packages.
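
    One way to picture the allocation of days is as permuted blocks of six, each block containing every subpackage once. The sketch below is an assumption about how such a schedule could be generated; the article does not describe the actual randomization procedure, and the function names and the seed are hypothetical.

```python
import random

# The three packages described in the text, as (first, second) questionnaire.
PACKAGES = [
    ("FACT-G", "EORTC QLQ-C30"),
    ("FLIC", "EORTC QLQ-C30"),
    ("FLIC", "FACT-G"),
]


def subpackages():
    """Six subpackages: each package in both orders of presentation."""
    result = []
    for pair in PACKAGES:
        result.append(pair)
        result.append(pair[::-1])
    return result


def allocate_days(n_days, seed=None):
    """Assign one subpackage to each study day using permuted blocks of 6."""
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_days:
        block = subpackages()
        rng.shuffle(block)  # random order within the 6-day block
        schedule.extend(block)
    return schedule[:n_days]


schedule = allocate_days(12, seed=1)
for day, (first, second) in enumerate(schedule, start=1):
    print(day, first, "then", second)
```

    Within every complete block each subpackage appears exactly once, so the three packages stay balanced over time regardless of when recruitment stops.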

    Four weeks after the baseline interview, the same questionnaire was sent to each participant by post, together with a prepaid return envelope. Up to three mailings were sent if the participant did not reply to the follow-up questionnaire.

    Patient Recruitment

    From September 2003 to May 2004, patients were recruited from the National Cancer Centre Singapore, which serves approximately 70% of the cancer patients seen by the public institutions of the country. The study was approved by the Ethics Committee of the National Cancer Centre. Patients were approached while they were in the waiting areas of the specialist outpatient clinics, ambulatory treatment unit, and the therapeutic radiology department of the National Cancer Centre. The inclusion criteria included being literate in English or Chinese and aged 18 years or older. Furthermore, all participants had to provide written informed consent. The patient group was heterogeneous and covered various clinical profiles (eg, having different types of cancers). This is suitable for the study of HRQOL instruments designed for application to all cancer patients. Singapore is a multiethnic society, with Chinese making up approximately 77% of its population. Chinese participants could choose to answer an English or a Chinese questionnaire according to their preference, whereas participants of other ethnic groups answered an English questionnaire. Participants were requested to self-administer the questionnaires. On request by the patients, interviews would be administered by one of the two research coordinators of the project.

    Instruments

    The FACT-G version 4 and the EORTC QLQ-C30 version 3 were used. The FLIC had been modified in two aspects for use in Singapore.8,17 First, the word cancer was removed from the questions because some patients, especially in the older age group, might not know their diagnosis and their families sometimes had objections to their being told the diagnosis. The FACT-G and EORTC QLQ-C30 do not mention the word cancer. Second, the visual analog scale was replaced by a 7-point Likert format scale because the visual analog scale was difficult for some patients to understand, especially the older and less educated patients. Similar modifications of the FLIC have also been reported in other countries.18,19 The questionnaire packages began with a page of questions on demographic and health particulars, such as ECOG performance status20 and whether the patients are on chemotherapy and/or radiotherapy. Because of the small number of patients with an ECOG performance status score of 4, scores of 3 and 4 were combined as one category. Treatment status was classified as whether or not the patient was on chemotherapy and/or radiotherapy (yes or no).

    Statistical Considerations

    Missing values in the FACT-G, FLIC, and EORTC QLQ-C30 were imputed by the half rule.1,12 That is, the mean of the nonmissing items in the same scale was used to replace the missing values if at least half of the items in the scale were answered. Analysis of variance and χ² tests were used to compare continuous and categoric variables, respectively, between the three questionnaire packages. The CV, effect size in detecting a difference in ECOG performance status of 0 to 1 v 2 to 4 at baseline, effect size in detecting a difference in changes in ECOG performance status during follow-up, and Pearson’s correlation between the pre- and post-test scores were compared. The interpretation of CV is difficult because part of the variation is a result of differences in clinical characteristics. Therefore, we performed multiple regression analysis of the HRQOL scores in relation to performance status, treatment status, and tumor type (all as categoric variables) and used the SDs of the residuals for the numerators of the CVs. There is no established analytic procedure for making inference about these quantities in an incomplete block design. Hence, we used the bootstrapping method, with patients as the sampling units.21 One thousand replications were used; CIs were obtained from the percentiles of the bootstrap distribution.
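
    The half rule described above can be written down in a few lines. This is a minimal sketch for a single scale, with missing items represented as None; the function name and the example responses are hypothetical, and reverse scoring or scale-specific details of the actual instruments are not handled.

```python
from typing import List, Optional, Sequence


def half_rule_impute(items: Sequence[Optional[float]]) -> Optional[List[float]]:
    """Replace missing items (None) with the mean of the answered items,
    provided at least half of the items in the scale were answered.
    Returns None when the scale cannot be imputed."""
    answered = [x for x in items if x is not None]
    if not items or 2 * len(answered) < len(items):
        return None
    mean = sum(answered) / len(answered)
    return [mean if x is None else x for x in items]


# Three of four items answered: the missing one becomes the mean (3.0).
print(half_rule_impute([4, None, 2, 3]))        # [4, 3.0, 2, 3]

# Only one of four items answered: beyond imputation by the half rule.
print(half_rule_impute([None, None, None, 1]))  # None
```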

    Assuming that a ratio of CV of 0.85 or smaller indicates an important reduction in variability, a total sample size of 900 patients (300 for each questionnaire package) was required to achieve a power of 0.80 at an α of .05. Considering a 30% rate of loss to follow-up in the longitudinal analysis, a total sample size of approximately 1,300 patients was required. A ratio of 0.85 of CV implies a sample size ratio of 0.72 or a 28% difference in sample size requirement.
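
    The arithmetic behind these planning figures can be reproduced as follows. This is a sketch under the assumption that the target was inflated by dividing by the expected retention rate, which is a common convention but is not stated explicitly in the article; the function name is illustrative.

```python
import math


def inflate_for_attrition(n_required: int, loss_rate: float) -> int:
    """Number to recruit so that about n_required remain after loss to follow-up."""
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - loss_rate))


# 900 patients needed, 30% expected loss to follow-up.
n_recruit = inflate_for_attrition(900, 0.30)
print(n_recruit)  # 1286, ie, approximately 1,300

# A CV ratio of 0.85 corresponds to a sample size ratio of 0.85 squared.
print(round(0.85 ** 2, 2))  # 0.72
```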

    The primary analysis of the three instruments and the sample size calculation were based on all available data. The incomplete block design allowed pairwise comparison of two instruments among respondents who filled out the same questionnaire packages. This was taken as a secondary analysis.

    RESULTS

    The numbers of patients who received packages 1, 2, and 3 were 452, 434, and 431, respectively. Twelve interviews were not usable because of missing values in ECOG performance score or missing values in HRQOL scores beyond imputation by the half rule. Furthermore, 37 interviews were completed by proxies. These patients were excluded in the analysis. Thus, the number of patients included was 1,268. The final numbers of patients completing packages 1, 2, and 3 were 437 (34.5%), 422 (33.3%), and 409 (32.3%). There was no significant difference among the three packages in the number of patients excluded (P = .164).

    Table 1 lists the characteristics of the patients and the interviews by questionnaire package. The three groups were similar in demographic and clinical profiles (each P > .05). Their distribution of mode of administration and time to complete questionnaire were also similar (each P > .10).

    Table 2 lists the CVs and ratios of CVs. The CVs of the FACT-G and FLIC were similar (0.158 and 0.149, respectively). The CV of the EORTC QLQ-C30 was 0.292 and was significantly larger than the CVs of the other HRQOL questionnaires. The ratios of the CV of the FACT-G and FLIC to EORTC QLQ-C30 were 0.541 (95% CI, 0.503 to 0.578) and 0.512 (95% CI, 0.480 to 0.548), respectively. The ratio of the CV of the FACT-G to FLIC was 1.057 (95% CI, 0.982 to 1.136), including the null value. Pairwise comparison within each of the three subsamples gave similar results.
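
    A percentile bootstrap interval for a ratio of CVs, with patients as the resampling units, can be sketched as below. This is a simplified illustration rather than the authors' analysis: it uses raw SDs instead of regression residual SDs, the index-based percentiles are approximate, and the patient records are synthetic values generated only so that the example runs.

```python
import random
import statistics


def cv(values):
    """Coefficient of variation of a list of scores."""
    return statistics.stdev(values) / statistics.mean(values)


def cv_ratio(patients, a, b):
    """Ratio of the CV of instrument a to the CV of instrument b.
    Each patient is a dict holding scores for the two instruments answered."""
    scores_a = [p[a] for p in patients if a in p]
    scores_b = [p[b] for p in patients if b in p]
    return cv(scores_a) / cv(scores_b)


def bootstrap_ci(patients, a, b, n_boot=1000, level=0.95, seed=None):
    """Percentile bootstrap CI, resampling whole patients with replacement."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        sample = rng.choices(patients, k=len(patients))
        ratios.append(cv_ratio(sample, a, b))
    ratios.sort()
    tail = (1 - level) / 2
    return ratios[int(tail * n_boot)], ratios[int((1 - tail) * n_boot) - 1]


# Synthetic data (NOT study data): 300 patients, each answering two of three
# instruments, mirroring the three questionnaire packages.
demo_rng = random.Random(0)
patients = []
for i in range(300):
    package = i % 3
    record = {}
    if package in (0, 2):
        record["FACT-G"] = demo_rng.gauss(80, 12)
    if package in (0, 1):
        record["EORTC"] = demo_rng.gauss(65, 20)
    if package in (1, 2):
        record["FLIC"] = demo_rng.gauss(110, 17)
    patients.append(record)

lo, hi = bootstrap_ci(patients, "FACT-G", "EORTC", n_boot=1000, seed=1)
print(round(lo, 3), round(hi, 3))
```

    Resampling whole patients keeps the two scores from the same person together, which is what allows one procedure to serve both the overall and the pairwise comparisons.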

    Table 3 lists the effect sizes for detecting a difference in ECOG performance status at baseline. The effect sizes of the FACT-G, FLIC, and EORTC QLQ-C30 were 1.008, 0.927, and 0.805, respectively. The effect size of the FACT-G was significantly larger than that of EORTC QLQ-C30, with the ratio being 1.252 (95% CI, 1.012 to 1.562). The FLIC was not significantly different from the EORTC QLQ-C30 or FACT-G, with both 95% CIs of the effect size ratios including the null value of 1. Again, pairwise comparison within each of the three subsamples gave similar results.

    Of 1,268 patients interviewed at baseline, 896 (71%) replied to the follow-up survey. The number of participants reporting a better, same, or worse ECOG performance status was 204 (23%), 479 (53%), and 213 (24%), respectively. Table 4 lists the results using changes in ECOG performance status to assess the effect size of HRQOL change scores. Only patients who reported a better or worse ECOG score were included here. The effect sizes of the FACT-G, FLIC, and EORTC QLQ-C30 were 0.554, 0.862, and 0.536, respectively. The effect size of the FLIC was significantly larger than that of the EORTC QLQ-C30, with the ratio being 1.608 (95% CI, 1.040 to 2.763). The FACT-G was not significantly different from the other two instruments. Pairwise comparison within each of the three subsamples again gave similar conclusions.

    Table 5 lists the Pearson’s correlation coefficients of the HRQOL scores in the baseline and follow-up assessments, as well as the differences of the coefficients between the instruments. This analysis was limited to patients who had no change in ECOG performance status and treatment status during the study period (n = 377). The FLIC had the highest correlation (r = 0.781), followed by the FACT-G (r = 0.732). The correlation of the EORTC QLQ-C30 was lowest (r = 0.636) and was significantly lower than that of the FLIC by 0.145 (95% CI, 0.055 to 0.245). The 95% CI of the difference between the FACT-G and EORTC QLQ-C30 slightly overlapped the null value (95% CI, –0.017 to 0.204) and, therefore, was not significant at the 5% level. Pairwise comparison within each subsample showed similar conclusions.

    DISCUSSION

    HRQOL is an important issue in cancer care and research. Questionnaires for the measurement of HRQOL have to undergo various assessments, especially of their validity and reliability, before they can be accepted for use in research and clinical care. The variability of QOL scores and its sample size implication seems to be an insufficiently studied area,22 although it is sometimes examined under the concept of relative efficiency.10

    The FACT-G, FLIC, and EORTC QLQ-C30 are the major HRQOL questionnaires in oncology. However, the questionnaires do not measure exactly the same aspects of HRQOL,8,23,24 with the most noted difference being the operationalization of the social and family life domain. The FACT-G asks how patients feel about their friends and family, the FLIC asks about the impact of the patients’ disease on family members, and the EORTC QLQ-C30 asks about disruptions in relationships with friends and relatives. Researchers have to consider carefully what they want to study.25 Nevertheless, the three questionnaires do overlap considerably.23,24 In particular, their global scores are strongly correlated, with Pearson’s correlation coefficients in the neighborhood of 0.80.3,26 Therefore, their global scores were sometimes used to validate each other. In the planning of clinical trials, it is important to specify a single or a few end points. Unless the research question is about a specific aspect of HRQOL, a global score is the recommended end point.27,28 Variability and sample size are not the primary considerations in the choice of instruments, but they are important considerations, especially when the research has no theoretical reason to focus on a subdomain of HRQOL.

    The key finding of this study is that the global score of the EORTC QLQ-C30 performed less favorably than the FACT-G and FLIC in several aspects. The FACT-G and FLIC were similar in terms of CV, whereas the CV of the EORTC QLQ-C30 was significantly larger. In a cross-sectional setting, the effect sizes of the FACT-G and FLIC in detecting a difference between patients with better (0 to 1) and worse (2 to 4) ECOG performance status were 25% and 15% larger than that of the EORTC QLQ-C30, respectively, although only the former difference was statistically significant. In a longitudinal setting, the effect size of the FLIC was 61% larger than the effect size of the EORTC QLQ-C30. Last but not least, the Pearson’s correlation coefficient between test and retest scores of the FLIC was stronger than the correlation coefficient of the EORTC QLQ-C30, and the correlation coefficient of the FACT-G also seemed to be stronger than that of the EORTC QLQ-C30 (difference in r = 0.096), but the 95% CI (–0.017 to 0.204) slightly overlapped the null value. If we take the effect sizes listed in Table 3 as an example, the sample sizes required by using the FACT-G, FLIC, and EORTC QLQ-C30 to detect a difference between the two groups with a power of 80% and an α of 5% would be 17, 20, and 26 patients per arm, respectively. In terms of percentage, the sample size required by the EORTC QLQ-C30 would be 57% and 33% larger than the sample sizes required by the FACT-G and FLIC, respectively.13 If we are expecting to detect a big difference between groups, the differences in absolute number of patients required by the three instruments will be small. However, oncology studies often consider modest differences between groups. In such situations, the sample size requirement is usually large, and a 57% or 33% sample size difference is important. As mentioned in the Introduction, information scattered in the literature seems to suggest that the EORTC QLQ-C30 global functioning scale has a large variability. In her review of various studies using the EORTC QLQ-C30, King noted the small effect size of this scale.11 However, she emphasized that a conclusion could not be made because the finding was subject to the choice of studies in the review. Here we have used a randomized design to confirm the larger variability and sample size requirement of the EORTC QLQ-C30 using several measures.
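
    The per-arm sample sizes quoted above can be reproduced with a standard normal-approximation formula. The formula used here, n = 2(z1 + z2)²/d² + z1²/4 with a two-sided α, is an assumption: the article cites published sample size tables without stating the formula, and this sketch is simply one calculation that matches the reported figures.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm to detect a standardized difference between two means
    (two-sided test, normal approximation with a small-sample correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)


# Effect sizes at baseline as reported in the Results.
for name, d in [("FACT-G", 1.008), ("FLIC", 0.927), ("EORTC QLQ-C30", 0.805)]:
    print(name, n_per_arm(d))
# FACT-G 17
# FLIC 20
# EORTC QLQ-C30 26
```

    With a modest effect size the same formula gives much larger numbers, for example several hundred patients per arm for an effect size of 0.2, which is where a percentage difference of this magnitude starts to matter in practice.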

    The FLIC showed a significantly larger effect size than the EORTC QLQ-C30 in differentiating patients with a change in ECOG performance status, but it was quite comparable to the EORTC QLQ-C30 in terms of differentiating patients with different performance statuses cross sectionally. Note that, in the cross-sectional analysis, ECOG scores were dichotomized as 0 to 1 v 2 to 4. In the longitudinal analysis, a 1-point change (eg, from 0 to 1) was also classified as a change. It may be that the FLIC was more responsive to a small change in ECOG score than the EORTC QLQ-C30. The dichotomy used in the cross-sectional analysis may have obscured some fine differences.

    Although a conscious choice was made to use days instead of individuals as the randomization unit, there is no reason why it should cause bias in the present context. The three study arms were comparable in the major demographic and clinical characteristics, giving support to their comparability. Furthermore, the incomplete block design allowed pairwise comparisons that were based on the same subsamples. The pairwise comparisons largely agreed with the overall analysis.

    We examined effect size in relation to ECOG performance status. This criterion was chosen because it is a powerful predictor of HRQOL, and it is an important concern in cancer care.1 However, we should be aware of the possibility that the larger effect sizes of the FACT-G and FLIC might merely indicate that they are more closely related to performance status than the EORTC QLQ-C30. The present study was limited in the number of variables collected. Further research using other criteria to compare the discriminative ability of the instruments will be useful.

    Sample size requirement for the assessment of variability tends to be quite large. In a small country like Singapore, it is difficult to perform a study of variability based on recruitment of clinically homogeneous people only. In the comparison of CVs, we studied a heterogeneous sample of patients and used regression analysis to remove the variations caused by differences in clinical characteristics. We note that the CVs shown in Table 2 are slightly smaller than those reported in different studies of homogeneous groups of cancer patients.7,9,29 This provides some reassurance that the removal of unwanted variations by multiple regression analysis has been successful. Nevertheless, regression analysis using measured variables cannot guarantee the total removal of unwanted variations, and further studies using large cohorts of homogeneous patients are needed to confirm the present findings.

    We have the following speculations about why the questionnaires differ in variability. First, the two EORTC QLQ-C30 questions on global HRQOL may not have a clear interpretation. Even QOL research experts have no consensus on what the EORTC QLQ-C30’s global questions really measure.27,30 Patients may understand the questions differently, and therefore, the responses may be more variable. Bernheim31 maintained that, although global assessment of QOL is superior to aggregating scores on various items, it is difficult to get patients to answer global questions seriously. More efforts are needed to develop better ways of asking global questions and getting valid responses. The Anamnestic Comparative Self Assessment, in which the patients’ present experience is anchored in relation to the best and worst times in their life, is one example of such effort.31

    Second, combining components of HRQOL scores reduces measurement errors and increases precision.32 Although most HRQOL measures, including the FACT-G (27 items) and FLIC (22 items), use all items to form a global score, the EORTC QLQ-C30 uses only two questions to give a global score. The details collected by the rest of the questionnaire are not used. Nordin et al28 found that, although the EORTC QLQ-C30 global score did not discriminate between gastrointestinal cancer patients on chemotherapy and best supportive care, a simple average of all the EORTC QLQ-C30 scores revealed significant difference between the patients. They maintained that, although this is not conceptually correct, forming an alternative summary score by a simple average of all HRQOL item scores was beneficial and has the advantage of simplicity.28 The EORTC QOL research team was aware of the need to form an aggregate score using the details collected and said it would investigate this issue.5 It would be good to see if their work would show results similar to that of Nordin et al.28

    In conclusion, the findings suggest that, in certain aspects, the FACT-G and FLIC may provide a sample size advantage over using the EORTC QLQ-C30. Further studies to confirm the present findings are warranted.

    Authors' Disclosures of Potential Conflicts of Interest

    Although all authors have completed the disclosure declaration, the following authors or their immediate family members have indicated a financial interest. No conflict exists for drugs or devices used in a study if they are not being evaluated as part of the investigation. For a detailed description of the disclosure categories, or for more information about ASCO’s conflict of interest policy, please refer to the Author Disclosure Declaration and the Disclosures of Potential Conflicts of Interest section in Information for Contributors.

    NOTES

    Supported by research grant No. NMRC/0743/2003 from the National Medical Research Council of Singapore.

    Authors' disclosures of potential conflicts of interest are found at the end of this article.

    REFERENCES

    Fairclough DL: Design and Analysis of Quality of Life Studies in Clinical Trials. Boca Raton, FL, Chapman & Hall, 2002

    Roila F, Cortesi E: Quality of life as a primary end point in oncology. Ann Oncol 12: S3-S6, 2001 (suppl)

    Cella DF, Tulsky DS, Gray G, et al: The Functional Assessment of Cancer Therapy Scale: Development and validation of the general measure. J Clin Oncol 11: 570-579, 1993

    Schipper H, Clinch J, McMurray A, et al: Measuring the quality of life of cancer patients: The Functional Living Index–Cancer—Development and validation. J Clin Oncol 2: 472-483, 1984

    Aaronson NK, Ahmedzai S, Bergman B, et al: The European Organization for Research and Treatment of Cancer QLQ-C30: A quality-of-life instrument for use in international clinical trials in oncology. J Natl Cancer Inst 85: 365-376, 1993

    Lohr SL: Sampling: Design and Analysis. Pacific Grove, CA, Duxbury, 1999

    Fayers PM, Weeden S, Curran D: EORTC QLQ-C30 Reference Values. Brussels, Belgium, European Organisation for Research and Treatment of Cancer, 1998

    Cheung YB, Ng GY, Wong LC, et al: Measuring quality of life in Chinese cancer patients: A new version of the Functional Living Index–Cancer (Chinese). Ann Acad Med Singapore 32: 376-380, 2003

    Finkelstein DM, Cassileth B, Bonomi PD, et al: A pilot study of the Functional Living Index-Cancer (FLIC) Scale for the assessment of quality of life for metastatic lung cancer patients: An Eastern Cooperative Oncology Group study. Am J Clin Oncol 11: 630-633, 1988

    Fayers PM, Machin D: Quality of Life: Assessment, Analysis and Interpretation. Chichester, United Kingdom, Wiley, 2000

    King MT: The interpretation of scores from the EORTC quality of life questionnaire QLQ-C30. Qual Life Res 5: 555-567, 1996

    Cella D: FACIT Manual: Manual of the Functional Assessment of Chronic Illness Therapy (FACIT) Measurement System. Evanston, IL, CORE, 1997

    Machin D, Campbell M, Fayers P, et al: Sample Size Tables for Clinical Studies (ed 2). Oxford, United Kingdom, Blackwell, 1997

    Hjermstad MJ, Fossa SD, Bjordal K, et al: Test/retest study of the European Organization for Research and Treatment of Cancer Core Quality-of-Life Questionnaire. J Clin Oncol 13: 1249-1254, 1995

    Senn S: Cross-Over Trials in Clinical Research. Chichester, United Kingdom, Wiley, 1993

    Cheung YB, Wong LC, Tay MH, et al: Order effects in the assessment of quality of life of cancer patients. Qual Life Res 13: 1217-1223, 2004

    Goh CR, Lee KS, Tan TC, et al: Measuring quality of life in different cultures: Translation of the Functional Living Index for Cancer (FLIC) into Chinese and Malay in Singapore. Ann Acad Med Singapore 25: 323-334, 1996

    Conner-Spady B, Cumming C, Nabholtz JM, et al: Responsiveness of the EuroQol in breast cancer patients undergoing high dose chemotherapy. Qual Life Res 10: 479-486, 2001

    Takeda F, Uki J: Recent progress in cancer pain management and palliative care in Japan. Ann Acad Med Singapore 23: 296-299, 1994

    Blagden SP, Charman SC, Sharples LD, et al: Performance status: Do patients and their oncologists agree. Br J Cancer 89: 1022-1027, 2003

    Efron B, Tibshirani R: An Introduction to the Bootstrap. New York, NY, Chapman & Hall, 1993

    Cheung YB, Thumboo J, Machin D, et al: Modelling variability of quality of life scores: A study of questionnaire version and bilingualism. Qual Life Res 13: 897-906, 2004

    Kemmler G, Holzner B, Kopp M, et al: Comparison of two quality-of-life instruments for cancer patients. J Clin Oncol 17: 2932-2940, 1999

    Kuenstner S, Langelotz C, Budach V, et al: The comparability of quality of life scores: A multitrait multimethod analysis of the EORTC QLQ-C30, SF-36 and FLIC questionnaires. Eur J Cancer 38: 339-348, 2002

    Zee BC, Osoba D: Health-related quality-of-life outcomes, in Crowley J (ed): Handbook of Statistics in Clinical Oncology. New York, NY, Marcel Dekker, 2001, pp 249-267

    Kopp M, Schweigkofler H, Holzner B, et al: EORTC QLQ-C30 and FACT-BMT for the measurement of quality of life in bone marrow transplant recipients: A comparison. Eur J Haematol 65: 97-103, 2000

    Hobday T, Sloan J, Goldberg R: Authors' reply. J Clin Oncol 21: 3179, 2003

    Nordin K, Steel J, Hoffman K, et al: Alternative methods of interpreting quality of life data in advanced gastrointestinal cancer patients. Br J Cancer 85: 1265-1272, 2001

    Dharma-Wardene M, Au HJ, Hanson J, et al: Baseline FACT-G score is a predictor of survival for advanced lung cancer. Qual Life Res 13: 1209-1216, 2004

    Cella D: What do global quality-of-life questions really measure? Insights from Hobday et al and the "do something" rule. J Clin Oncol 21: 3178-3179, 2003

    Bernheim JL: How to get serious answers to the serious question: "How have you been?"—Subjective quality of life (QOL) as an individual experiential emergent construct. Bioethics 13: 272-287, 1999

    Lumley T, Simes RJ, Gebski V, et al: Combining components of quality of life to increase precision and evaluate trade-offs. Stat Med 20: 3231-3249, 2001

    (Yin-Bun Cheung, Cynthia G)