Effects of training on quality of peer review: randomised controlled trial
     1 BMJ Editorial Office, BMA House, Tavistock Square, London WC1H 9JR, 2 London School of Hygiene and Tropical Medicine, London WC1E 7HT

    Correspondence to: S Schroter sschroter@bmj.com

    Introduction

    Many studies have illustrated the inadequacies of peer review and its limitations in improving the quality of research papers.1 However, few studies have evaluated interventions that try to improve peer review,2 and no randomised controlled trials have examined the effects of training.3 Training that would be feasible for reviewers to undergo and for a journal to provide would have to be short or provided at a distance. Although the effectiveness of short educational interventions is questionable, some brief interventions have been shown to be successful (depending on what is being taught and the methods used).4 5

    We aimed to determine whether reviewers for the BMJ who underwent training would produce reviews of better quality than those who received no training; whether face to face training would be more beneficial than a self taught package; and whether any training effect would last at least six months.

    Methods

    Participants

    Of 1256 reviewers assessed for eligibility, 609 (48%) agreed to take part (fig). The quality of the baseline reviews of those who did not complete follow-up reviews was poorer than that of reviewers who did (review quality instrument score 2.60 v 2.73; P = 0.16); they also detected fewer major errors (2.11 v 2.67; P = 0.01) and recommended rejection less often (58% v 70%; P = 0.037) (table 1).

    Figure: Progress of participants through the trial

    Table 1 Comparison of characteristics and baseline scores for reviewers who completed the first review only and those who completed at least two reviews. Values are numbers (percentages) unless stated otherwise

    Evaluation of training interventions

    One hundred and fifty-eight reviewers attended training workshops, and 81% (114/141) anticipated that the quality of their reviews would improve. Most of the 120 recipients of the self taught package who completed review 2 reported having used the package (104 (87%) completed at least three of the five exercises, and 103 (86%) did all five), and 98 (82%) felt that the quality of their reviews would improve as a result.

    Outcome measures

    Agreement was good between pairs of raters assessing the quality of reviews (intraclass correlation coefficient for total review quality instrument score 0.65) and the number of deliberate major errors identified (intraclass correlation coefficient 0.91).
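
    For readers who want to reproduce this kind of agreement statistic, the sketch below computes a one-way random-effects intraclass correlation coefficient, ICC(1,1), from paired ratings. The paper does not state which ICC form was used, and the ratings in the example are invented, so treat this purely as an illustration.

```python
import numpy as np

def icc_oneway(scores: np.ndarray) -> float:
    """One-way random-effects intraclass correlation, ICC(1,1).

    `scores` is an (n_targets, k_raters) array: each row holds the
    k ratings given to one review.
    """
    n, k = scores.shape
    grand_mean = scores.mean()
    row_means = scores.mean(axis=1)
    # Between-review and within-review mean squares from one-way ANOVA.
    ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((scores - row_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical example: 5 reviews, each scored by 2 raters on the 1-5 RQI scale.
ratings = np.array([[2.5, 2.7], [3.1, 3.0], [2.0, 2.4], [3.8, 3.5], [2.9, 2.9]])
print(f"ICC = {icc_oneway(ratings):.2f}")
```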

    Review quality instrument scores—The mean score for the whole sample was 2.71 (SD 0.73) for the first review and was similar across all three groups (table 2). For review 2, the self taught group scored significantly higher (2.85) than did the control group (2.56), a difference of 0.29 (95% confidence interval 0.14 to 0.44; P = 0.0002). The difference between the control group and the face to face group (2.72) was 0.16 (0.02 to 0.30; P = 0.025). We found no significant difference between any of the groups for the third review (P = 0.204), and the upper 95% confidence limit was at most 0.29. The participants in the control group who did a third review showed a small but significant rise in their score (0.17, 0.09 to 0.26; P = 0.0001), which reduced the difference between them and the intervention groups.
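
    As a worked illustration of these comparisons, the sketch below computes a difference in mean review quality instrument scores with a Welch-type 95% confidence interval. The group sizes and simulated scores are assumptions chosen only to echo the reported means; they are not the trial data.

```python
import numpy as np
from scipy import stats

def mean_diff_ci(a, b, alpha=0.05):
    """Difference in group means with a Welch-type confidence interval."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = a.mean() - b.mean()
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    t = stats.t.ppf(1 - alpha / 2, df)
    return diff, (diff - t * se, diff + t * se)

# Simulated review 2 scores (self taught v control); only the means echo
# the paper, everything else is made up.
rng = np.random.default_rng(0)
self_taught = rng.normal(2.85, 0.7, 120)
control = rng.normal(2.56, 0.7, 120)
diff, (lo, hi) = mean_diff_ci(self_taught, control)
print(f"difference {diff:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```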

    Table 2 Review quality, errors detected, time taken, and proportion recommending rejection (based on data from all participants). Values are means (SDs) unless stated otherwise

    Errors identified—The number of errors detected in the baseline reviews was similar in the three groups (table 2). However, the difference between the control group and each of the intervention groups was significant for review 2 (2.13 v 3.14 and 2.96; self taught-control difference = 1.00, 0.65 to 1.37; face to face-control difference = 0.83, 0.47 to 1.19) and remained significant after adjustment of the scores for the number of errors reported in the first review (analysis of covariance, P < 0.0001). The differences observed for review 3 were slightly smaller but in a similar direction (2.71 v 3.37 and 3.18) and were not significantly different after adjustment for baseline and multiple testing.
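
    The baseline adjustment described here is an analysis of covariance: scores at review 2 are regressed on the baseline scores plus a group indicator. A minimal sketch with simulated data follows; the variable names, group sizes, and effect sizes are all assumptions, not the trial's records.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated errors detected at review 1 (baseline) and review 2.
rng = np.random.default_rng(1)
n = 100
df = pd.DataFrame({
    "group": ["control"] * n + ["self_taught"] * n,
    "errors_review1": rng.poisson(2.3, 2 * n),
})
boost = np.where(df["group"] == "self_taught", 1.0, 0.0)  # assumed training effect
df["errors_review2"] = 0.4 * df["errors_review1"] + boost + rng.normal(2.0, 1.0, 2 * n)

# Analysis of covariance: review 2 errors adjusted for baseline errors.
model = smf.ols("errors_review2 ~ errors_review1 + C(group)", data=df).fit()
print(model.summary().tables[1])  # the group coefficient is the adjusted difference
```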

    Time taken to review and recommendation—Generally, the mean time taken to review papers did not differ significantly between the groups (table 2). All three groups spent less time doing the third review than the previous two reviews. The proportion of reviewers recommending rejection of paper 1 was similar across the groups. The proportion recommending rejection of paper 2 was significantly lower for the control group than for the self taught group (76% v 92%; P < 0.0001), and the same pattern occurred for paper 3 (74% v 91%; P = 0.001).
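
    The rejection comparisons are standard two-proportion tests. As a hedged illustration, the counts below are invented and chosen only to match the quoted percentages for paper 2.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: control 76/100 (76%) v self taught 110/120 (92%).
reject = [76, 110]   # reviewers recommending rejection
totals = [100, 120]  # reviewers returning the review
z, p = proportions_ztest(reject, totals)
print(f"z = {z:.2f}, P = {p:.4f}")
```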

    Impact of non-responders

    As the difference between responders and non-responders is unknown, the impact of non-response on the conclusions cannot be definitively determined. However, if we assume that, given the observed data from an individual reviewer, his or her unseen response provides no additional information on the reason for non-response (the "missing at random" assumption), non-response has no effect on the statistical significance of the results. A more conservative approach is to assume that the mean for non-responders is shifted down from that of responders by the same amount in each intervention group. Then, for the analysis of covariance comparison of review quality instrument scores between the self taught and control groups, the mean for non-responders would have to be reduced by 0.46 before the difference became statistically non-significant.14
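
    A rough sketch of this shift-based sensitivity analysis follows: each non-responder's score is imputed as the group's responder mean minus a common shift, and the comparison is re-run across a range of shifts. A plain Welch t test stands in for the trial's analysis of covariance, and every input below is hypothetical.

```python
import numpy as np
from scipy import stats

def p_after_shift(resp_a, resp_b, n_miss_a, n_miss_b, delta):
    """Impute non-responders at (responder mean - delta) in each group,
    then re-test the between-group difference. Illustration only: a
    t test replaces the analysis of covariance used in the paper.
    """
    a = np.concatenate([resp_a, np.full(n_miss_a, resp_a.mean() - delta)])
    b = np.concatenate([resp_b, np.full(n_miss_b, resp_b.mean() - delta)])
    return stats.ttest_ind(a, b, equal_var=False).pvalue

# Hypothetical responder scores and non-responder counts.
rng = np.random.default_rng(2)
self_taught = rng.normal(2.85, 0.7, 120)
control = rng.normal(2.56, 0.7, 100)
for delta in (0.0, 0.25, 0.46, 0.75):
    p = p_after_shift(self_taught, control, n_miss_a=80, n_miss_b=20, delta=delta)
    print(f"shift {delta:.2f}: P = {p:.4f}")
```

    Because the groups lose different proportions of reviewers, a common downward shift moves the imputed group means by different amounts, which is why a large enough shift can make the difference lose significance.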

    References

    Rennie D. Editorial peer review: its development and rationale. In: Godlee F, Jefferson T, eds. Peer review in health sciences. London: BMJ Books, 1999.

    Callaham ML, Knopp RK, Gallagher EJ. Effect of written feedback by editors on quality of reviews. JAMA 2002;287: 2781-3.

    Callaham ML, Wears RL, Waeckerle JF. Effect of attendance at a training session on peer reviewer quality and performance. Ann Emerg Med 1998;32: 318-22.

    Thomson O'Brien MA, Freemantle N, Oxman AD, Wolf F, Davis DA, Herrin J. Continuing education meetings and workshops: effects on professional practice and health care outcomes. Cochrane Database Syst Rev 2003;(4): CD003030.

    Beaulieu M, Rivard M, Hudon E, Beaudoin C, Saucier D, Remondin M. Comparative trial of a short workshop designed to enhance appropriate use of screening tests by family physicians. Can Med Assoc J 2002;167: 1241-6.

    Van Rooyen S, Black N, Godlee F. Development of the review quality instrument (RQI) for assessing peer reviews of manuscripts. J Clin Epidemiol 1999;52: 625-9.

    Evans AT, McNutt RA, Fletcher SW, Fletcher RH. The characteristics of peer reviewers who produce good-quality reviews. J Gen Intern Med 1993;8: 422-8.

    Black N, van Rooyen S, Godlee F, Smith R, Evans S. What makes a good reviewer and a good review at a general medical journal? JAMA 1998;280: 231-3.

    Van Rooyen S, Godlee F, Smith R, Evans S, Black N. The effect of blinding and unmasking on the quality of peer review: a randomized trial. JAMA 1998;280: 234-7.

    Van Rooyen S, Godlee F, Evans S, Black N, Smith R. Effect of open peer review on quality of reviews and on reviewers' recommendations: a randomised trial. BMJ 1999;318: 23-7.

    Walsh E, Rooney M, Appleby L, Wilkinson G. Open peer review: a randomised controlled trial. Br J Psychiatry 2000;176: 47-51.

    Schafer JL. Multiple imputation: a primer. Stat Methods Med Res 1999;8: 3-15.

    Simon R. Bayesian subset analysis: applications to studying treatment-by-gender interactions. Stat Med 2002;21: 2909-16.

    White I, Carpenter J, Evans S, Schroter S. Eliciting and using expert opinions about non-response bias in randomised controlled trials. Technical report: email James.Carpenter@lshtm.ac.uk