Are these data real? Statistical methods for the detection of data fab

BMJ (British Medical Journal)

     1 Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London WC1E 7HT

    Correspondence to: S Evans stephen.evans@Lshtm.ac.uk

    Objectives To test the application of statistical methods to detect data fabrication in a clinical trial.

    Setting Data from two clinical trials: a trial of a dietary intervention for cardiovascular disease and a trial of a drug intervention for the same problem.

    Outcome measures Baseline comparisons of means and variances of cardiovascular risk factors; digit preference overall and its pattern by group.

    Results In the dietary intervention trial, variances for 16 of the 22 variables available at baseline were significantly different, and 10 significant differences were seen in means for these variables. Some of these P values were extraordinarily small. Distributions of the final recorded digit were significantly different between the intervention and the control group at baseline for 14/22 variables in the dietary trial. In the drug trial, only five variables were available, and no significant differences between the groups for baseline values in means or variances or digit preference were seen.

    Conclusions Several statistical features of the data from the dietary trial are so strongly suggestive of data fabrication that no other explanation is likely.

    Most statistical analyses of clinical trials are undertaken on the presumption that the data are genuine. Large accidental errors can be detected during data analysis,1 2 but if people are trying to "make up" data they are likely to do it in such a way that it is not immediately obvious, avoiding any large discrepancies. Nevertheless, fraudulent data have particular statistical features that are not evident in data containing accidental errors, and several analytical methods have been developed to detect fraud in clinical trials.3 4 The BMJ has taken a general interest in this field and has published a book on fraud and misconduct, now in its third edition, which has a chapter on statistical methods of detection of fraud.5

    In this paper we use statistical techniques to examine data from two randomised controlled trials. In one trial, the possibility of scientific misconduct had been raised by BMJ referees, based on inconsistencies in calculated P values compared with the means, standard deviations, and sample sizes presented (see p 281). For comparison, we used the same methods to analyse a second trial for which there were no such concerns. We were not involved in either trial.

    Methods

    The trial about which doubts were raised (the diet trial) was a single blind, randomised controlled trial of the effects of a fruit and vegetable enriched diet in 831 patients with coronary heart disease, including patients with angina pectoris, myocardial infarction, or surrogate risk factors. Study participants were stated to be randomly allocated to the intervention diet (Group I, n = 415) or to the control group, which was the patient's usual diet (Group C, n = 416). The aim was to examine the effect of the intervention diet on risk factors for coronary artery disease after two years. We do not present data from the two year follow-up, because differences between groups could arise as a result of the interventions. After the reviewers had expressed suspicions about the integrity of the data, the BMJ requested the original trial data. These were provided by the trial's first author on handwritten sheets, which we entered on to computer, making appropriate checks to avoid transcription errors. The data are considered in the two randomised groups at baseline, Group I and Group C.

    The second ("drug") trial was a randomised controlled trial of the effects of drug treatment in 21 750 patients with mild hypertension from 31 centres, from which we randomly selected five centres with 838 patients who had complete data for the selected variables. Study participants were randomly allocated to receive the drug (Group I, N = 403) or a placebo (Group C, N = 435). The aim was to determine whether drug treatment reduced the occurrence of stroke, death due to hypertension and coronary events in men and women aged 35-64 years, when followed for two years (again we do not present data from the follow-up). The drug trial data were provided by the trial investigators as computer files. The data are presented by treatment group (I or C) at baseline, using the same notation as for the diet trial. The variables in this study in common with the diet study are weight, diastolic blood pressure, systolic blood pressure, cholesterol measurements, and height. Further details of the methods and results from that trial have been published.6

    Statistical methods

    We conducted various tests on the baseline data of the randomised groups in both trials, looking for patterns that might indicate that the data in the diet trial were not generated by the normal process of making and recording individual measurements on a series of patients. We used the data from the drug trial for comparison, since we expected them to show patterns typical of data collected normally during a trial.

    Using basic descriptive statistics and conventional statistical significance tests, we compared the baseline data in the randomised groups in both trials. In a randomised trial, the data at baseline should be similar in the randomised groups. (The mean, the variability, the shape of the distribution of the data, and the pattern of data resulting from the methods of measurement must be similar, since the groups can differ from one another only by chance factors.) This is why, in general, tests for statistical significance are not conducted at baseline in genuine trials. If such tests are carried out at the 5% level, about one in 20 of them will be significant purely by chance. We used t tests to compare the means of the randomised groups and F tests to compare the variances (standard deviations).
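
    The baseline comparison described above can be sketched from summary statistics alone. This is a minimal illustration, not the SPSS analysis used in the paper: the function name compare_baseline and the input numbers are hypothetical, and the P values rely on large-sample normal approximations (reasonable for groups of about 400) rather than the exact t and F distributions.

```python
import math


def normal_two_sided_p(z):
    """Two-sided P value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def compare_baseline(mean1, sd1, n1, mean2, sd2, n2):
    """Compare two groups on one baseline variable from summary statistics.

    Assumption: both groups are large, so normal approximations stand in
    for the exact t and F reference distributions.
    """
    # Pooled-variance t statistic for the difference in means.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Variance ratio (the F statistic).
    f = sd1**2 / sd2**2

    # Approximation: for normal data, ln(s1^2 / s2^2) is roughly normal
    # with variance 2/(n1 - 1) + 2/(n2 - 1).
    z_var = math.log(f) / math.sqrt(2 / (n1 - 1) + 2 / (n2 - 1))

    return {
        "t": t,
        "p_means": normal_two_sided_p(t),
        "F": f,
        "p_variances": normal_two_sided_p(z_var),
    }


# Hypothetical summary values, for illustration only (not trial data).
print(compare_baseline(65.0, 10.0, 415, 65.4, 10.4, 416))
```

    Recomputing P values in this way from published means, standard deviations, and sample sizes is also a simple check on whether the reported P values are consistent with the reported summaries.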

    Data that are recorded (or invented) by people (as opposed to machines) tend to show preferences for certain numbers, such as rounding to the nearest 5 or 10. This is seen in the last recorded digit of numbers, and is called "digit preference." This digit preference should be similar between groups formed just by a chance process, namely randomisation. We used χ² tests to examine whether there was any tendency for the last digit to take on particular values and whether any observed digit preference was the same in the two groups created by randomisation. Digit preference can occur in all legitimate data based on human recording, but any pattern of this preference should be similar between groups formed using randomisation. We used SPSS, version 12.0.1 (Chicago, USA), for our data analysis.
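
    The two digit preference tests can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' SPSS procedure: all function names are hypothetical, values are assumed to be available exactly as recorded (as strings or integers), and the χ² tail probability uses the standard closed-form series for integer degrees of freedom.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^r / r! for r < df/2.
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in chi = sqrt(x).
    chi = math.sqrt(x)
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(chi / math.sqrt(2)) + 2 * pdf * total


def last_digit_counts(values):
    """Count the final recorded digit (0-9) of each value as written."""
    counts = Counter(int(str(v).strip()[-1]) for v in values)
    return [counts.get(d, 0) for d in range(10)]


def uniform_digit_test(counts):
    """Goodness of fit of last-digit counts to a uniform distribution."""
    expected = sum(counts) / 10
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, chi2_sf(stat, 9)


def between_group_digit_test(counts_a, counts_b):
    """Chi-squared test that two groups share the same last-digit pattern."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = n_a + n_b
    stat, used = 0.0, 0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # digit never recorded in either group
        used += 1
        for observed, n in ((a, n_a), (b, n_b)):
            expected = n * column / total
            stat += (observed - expected) ** 2 / expected
    return stat, chi2_sf(stat, used - 1)


# Hypothetical readings, for illustration only (not trial data).
group_i = last_digit_counts(["120", "130", "125", "140", "118", "135"])
group_c = last_digit_counts(["122", "131", "127", "144", "119", "136"])
print(uniform_digit_test(group_i))
print(between_group_digit_test(group_i, group_c))
```

    The first test asks whether any digit preference exists at all; the second allows for preference and asks only whether its pattern differs between the randomised groups, which is the comparison that should be null in a genuine trial. With samples as small as this toy example the χ² approximation is unreliable; it is shown only to make the calculation concrete.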

    Results

    Table 1 shows descriptive summaries of variables common to both trials for both groups in each trial. The drug trial values show what might be expected in a randomised trial, but the diet trial shows notable differences in standard deviations for height and cholesterol measurements.

    Table 1 Baseline variables in the two trials under comparison

    Table 2 shows for each trial the results of t and F tests, for differences in means and also in variances between the intervention and control groups at baseline for all available variables. In a genuine trial, correctly randomised, any such differences would be due to chance. Usually P values should not be quoted to greater precision than P < 0.001, but because of the extreme nature of these P values, their exact value is given. In the diet trial, differences in variances were significant for 16 of the 22 variables that were available, as were 10 differences in means for these variables. Several of the P values were extraordinarily small. The expectation is that about 5% of such comparisons would have P < 0.05, and extremely small P values should not occur. In the drug trial, none of the baseline means and none of the baseline variances showed statistically significant differences between the two groups, though only five variables were compared.
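
    To give a sense of scale for the expectation stated above, the chance of seeing many significant results among 22 comparisons can be sketched with a binomial calculation. This is an illustration under a simplifying assumption, not part of the original analysis: the 22 tests are treated as independent, which baseline variables measured on the same patients generally are not, so the figure is indicative only. The function name binomial_tail is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_tests = 22   # baseline variables compared in the diet trial
alpha = 0.05   # conventional significance level

# Expected number of "significant" results under chance alone: about one.
print(n_tests * alpha)

# Probability of 16 or more significant results if each test
# independently had a 5% chance of being significant.
print(binomial_tail(16, n_tests, alpha))
```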

    Table 2 Baseline comparison of the two intervention groups, diet trial and drug trial

    Table 3 shows the analysis of digit preference, assuming a uniform distribution of last digits. In the diet trial, all of the χ² values were highly significant, indicating that all the variables showed strong digit preference, although some preference is not unexpected. Digit preference was also evident for the results of a laboratory cholesterol test, which is unexpected since human estimation of the results is not usual. Measurements of height were not supplied for the diet trial (they were derivable from body mass index and weight for means, but this is not relevant for digit preference). In the drug trial, the χ² value was highly significant for height (indicating strong digit preference, as might be expected) but not for any of the other measures. Blood pressure measurement used a random zero machine, intended to remove digit preference.

    Table 4 shows the results of χ² testing for a difference in the pattern of digit preference between the two groups created by randomisation. This allows for the fact that digit preference can occur, but it should show a similar pattern in each of the randomised groups. In the diet trial, the final digit distributions are significantly different between the intervention group and the control group at baseline for all variables apart from cholesterol, fasting blood glucose, caffeine, carotene, and vitamin A. In the drug trial, the two randomised groups are far from being significantly different in terms of the final digit.

    Table 3 χ² value (with P value) for the final digit at baseline, diet trial and drug trial

    Table 4 χ² value (with P value) for the final digit at baseline in the diet and drug trials between the two randomised groups

    Discussion

    References

    1 Armitage P, Berry G. Statistical methods in medical research. 3rd ed. Oxford: Blackwell Scientific, 1994: 386-401.

    2 Altman DG. Practical statistics for medical research. London: Chapman and Hall, 1991: 122-143.

    3 Buyse M, George SL, Evans S, Geller NL, Ranstam J, Scherrer B, et al. The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials. Stat Med 1999;18: 3435-51.

    4 Taylor RN, McEntegart DJ, Stillman EC. Statistical techniques to detect fraud and other data irregularities in clinical questionnaire data. Drug Inform J 2002;36: 115-25.

    5 Evans S. Statistical aspects of the detection of fraud. In: Lock S, Wells F, Farthing M, eds. Fraud and misconduct in medical research. 3rd ed. London: BMJ Publishing Group, 2001: 186-204.

    6 Medical Research Council Working Party. MRC trial of treatment of mild hypertension: principal result. BMJ 1985;291: 97-104.

    (Sanaa Al-Marzouki, research student1, St)
