URL: http://mueller.educ.ucalgary.ca/waste2001.html
Draft: 2001.7.1

RESEARCH ETHICS BOARDS: A WASTE OF TIME?

John H. Mueller
Division of Applied Psychology
University of Calgary

John J. Furedy
Department of Psychology
University of Toronto

Address all correspondence to:

Dr. John Mueller
Division of Applied Psychology
University of Calgary
Calgary, Alberta
T2N 1N4
403-220-5664 (voice)
403-282-9244 (fax)

mueller@ucalgary.ca
furedy@psych.utoronto.ca


ABSTRACT

This commentary considers the effectiveness of the research proposal review process as it has evolved in Canadian human psychological research, culminating in the recent implementation of the "Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans". There are two paramount questions: first, is there any evidence that the review is effective, and, second, what would such evidence even look like? We are concerned that these issues have not been adequately addressed in spite of the increasing resources devoted to the review process.


THE SITUATION

In the past generation, social science research proposals have come under increasing scrutiny by Research Ethics Boards (REBs) in Canada and Institutional Review Boards (IRBs) in the States. These review groups emerged with a mandate to protect human participants from "extraordinary risks," everyday risk being accepted as unavoidable.

From this reasonable base, which involved departmental-level review, a veritable industry has developed and expanded in several directions. In addition to "risk", the review now includes experimental design, and "risk" has been redefined to include the more nebulous notion of "ethics". Some of the issues raised in reviews today seem more properly labeled "etiquette" than ethics; certainly they are not "risk" in any common usage of the term. The review is now obligatory for all proposals, not just those that seem problematic, and not just for federally funded research but for every project on campus. The review is no longer entrusted to the departmental level but generally occurs at some campus-wide level, where expertise in the research area is no longer deemed as relevant as a self-expressed interest in "ethics" or "bio-ethics." This introduces a further complication, in that the ethical issues that preoccupy medical researchers are presumed to be relevant to every department on campus. Further, as we begin to contemplate concerns such as "beneficence," "respect," "justice," and "liability," along with obligatory indoctrination workshops as a prerequisite to review, it is clear that the limiting horizon for this expansion is not yet in sight. Contrary to Adair (2001) and Puglisi (2001), we see no evidence that, if we just learn the rules and cooperate, the regulators will cease to encroach on intellectual inquiry in the social sciences. Sadly, the pattern over the past generation has been quite the opposite.

In the U.S., the IRB situation has become so murky that the best advice some can give is "Don't talk to the humans" (Shea, 2000). In Canada, the status and scope of REBs have been expanded by the recent implementation of the Tri-Council Policy Statement, which, in contrast to American practice, was labelled a "code" rather than a set of guidelines, and hence attracted considerable international attention (see, e.g., Azar, 1997; Holden, 1997). The current version of the Tri-Council Policy Statement lacks some of the original attacks on the basic epistemological function of research, such as the rule that if a subject, during debriefing after an experiment, finds the researcher's hypotheses offensive, then that subject can withdraw his or her data (e.g., Furedy, 1997, 1998). However, even the current version of the Tri-Council "statement" has been criticized as unsuitable for application to psychological and sociological research on humans (e.g., Howard, 1998), and there are no guarantees that the next iteration will not try to reinstate such anti-intellectual requirements.

Is this expanded review effort worth it? For that matter, was the review working before the recent expansions? We shall not attempt a full cost/benefit analysis of the review process in this brief note. Such an analysis would need to consider aspects like the distinction between epistemological and ethical functions, and the potentially deleterious educational effect on young researchers, who are increasingly trained in how to pass ethics reviews rather than educated in the complex research problems of their discipline. Rather, we shall focus on a specific benefit issue by applying the business-model metaphor that we are advised is so relevant to the campus these days: are we getting our bang for the buck? That is, we should check the key performance indicators, to be sure that we are getting corresponding benefits, in terms of reduced hazard to subjects, as a return for our increased efforts in reviewing proposals.


ARE REVIEWS WORKING? HOW CAN WE TELL?

It is possible to ponder many aspects of the ethics review industry, such as what constitutes reasonable risk, which was the original mandate, but we will avoid those here because there is a far more fundamental issue, one that transcends national boundaries. Specifically, what hard evidence is there that the review process does in fact reduce "problems" (i.e., untoward incidents during the experiment)? It is not our purpose here to devise these indicators in any detailed or systematic fashion, and frankly it is not our responsibility. By way of illustration we can identify a couple of thought experiments to articulate the nature of the question and how it might be answered. Nearly 20 years ago, Ceci, Peters, and Plotkin (1985) briefly considered how expensive the IRB process might be. The evidence then at hand consisted largely of estimates, and the cost was said to be "sound insurance" (p. 995). However, that evaluation was not focused on what we see as the key indicator of effectiveness: concrete evidence of incident avoidance. In fact, the review industry seems reluctant to actually count incidents arising, apparently finding comfort in the prospect that reviews were warranted if they avoided "even a single case of malfeasance" (p. 995). In any event, the enterprise has grown over 20 years, and it is legitimate to ask whether the expansion per se is providing better protection to the public.

Evidence supporting the effectiveness of the review process might come from something straightforward, such as counting how many incidents (e.g., subject complaints) arising from research were reported in 1950, 1960, and so forth, per decade. The question is whether those data show progressively fewer incidents per experiment conducted over the last 50 years, during which time the screening of research proposals has become ever more aggressive. This would hardly prove a causal connection, but it seems a minimalist expectation that more review effort should result in fewer problem reports from the laboratory. We doubt that the incident rate is going down, for two main reasons; a sketch of the rate comparison itself follows them below.

First, these days anybody can complain about anything, no matter how much screening and no matter how trivial the concern in absolute terms, and still find someone to nurture them along for a legal fee. REBs can't have any influence on this aspect of our litigious society.

Second, the "bad guys" are not going to come asking for IRB/REB permission. The proposal review movement was stimulated by the hearings at Nuremberg, but there is obviously no truth to the inference that the Holocaust would have been prevented had ethics review boards been in existence during the war. Neither Dr. Mengele nor Dr. Frankenstein applied to an ethics review board, and their contemporary counterparts will not do so either. Acts of malfeasance cannot be prevented this way, but there continues to be resistance to accepting this simple truth.
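To make the proposed metric concrete, here is a minimal sketch (in Python) of the per-decade rate comparison we have in mind. Every count in it is a hypothetical placeholder of our own invention, since, to our knowledge, no such data have actually been collected, which is precisely our complaint.

    # Per-decade incident-rate comparison sketched above.
    # All counts are hypothetical placeholders, not real data.

    # (decade, experiments conducted, untoward incidents reported)
    hypothetical_records = [
        ("1950s", 10000, 25),
        ("1960s", 20000, 48),
        ("1970s", 40000, 95),
        ("1980s", 60000, 140),
        ("1990s", 80000, 190),
    ]

    for decade, n_experiments, n_incidents in hypothetical_records:
        rate = n_incidents / n_experiments  # incidents per experiment
        print(f"{decade}: {rate:.5f} incidents per experiment")

    # The minimalist expectation stated above: if ever-more-aggressive
    # review works, this rate should decline across decades. A flat or
    # rising rate would be hard to square with the resources expended.

Note that the computation itself is trivial; what is missing is anyone bothering to assemble the numerator and the denominator.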


SOME EVIDENCE IS MISLEADING: PROBLEM FINDING 101

There is one type of data that must be dismissed as bogus evidence. It appears that when an ethics reviewer identifies something in a proposal that is allegedly a problem, some people see this as justifying the review process. That is, a "problem" found is said to be an incident avoided. But it does not work that way, and this assumption needs to be made explicit and rejected: "Revision requested by REB" does not constitute a "problem" that would have occurred during the experiment.

By analogy, consider a company that is obliged to institute an accident prevention program for the workplace. Someone dutifully goes around, identifies alleged hazards, and amasses an impressive count of things "fixed." Is this relevant? No, and in the non-academic world it would seem preposterous to accept this hazard count as an indicator of the success of the intervention. The only acceptable evidence would be whether the actual rate of accidents declined. Actual outcome measures are required for assessing IRB/REB value as well. For ethics reviews specifically, the problem-found count is flawed for at least two reasons.

No consensus on definition of risk. First, that something is identified as a problem by an REB reviewer does not mean the subject in the experiment will see it as a problem. There is far from perfect overlap between the "professional" and the "public" perception of a problem. This is supported by the fact that occasional incidents arise in projects that reviewers approved as clean. And there is no reason to believe that this sword does not cut both ways: things that reviewers see as potential problems may be non-events to the public. In fact, the latter becomes increasingly likely as the reviewer's criteria become more nebulous and personal. "Revision requested by REB" may speak to the creative abilities of the reviewers, but it is not a barometer of the success of the ethics review process at avoiding risk.

Worst case is not normal. Second, the review process seems dedicated to identifying a "worst case" scenario, but then proceeding as if the worst case will be the norm, which of course is simply nonsense! Just because something "could" happen does not mean it "will," and when the worst case is an improbable event this confusion becomes all the more wasteful, because there will be no meaningful change in the genuine accident rate.

To illustrate, one might be hit by a truck leaving the office, but it would be unwarranted for your wife to book an appointment with the undertaker this afternoon on that presumption. You might win the lottery next weekend, but it would not be prudent to hit your boss in the face with a pie this afternoon. That is why the original concept of "everyday risk" was useful. Unfortunately, the ethics review process seems to have evolved to a state in which the review assumes that the worst case will be not just the norm but a certainty. A goal of achieving "zero risk" seemingly has replaced the rational acceptance of everyday risk.
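The waste involved can be stated in simple expected-value terms. The sketch below uses figures that are entirely our own hypothetical illustration; the point is only the arithmetic of weighting a cost by its probability, which is how everyday risk is rationally priced.

    # Why treating the worst case as a certainty wastes effort.
    # All figures are hypothetical illustrations, not measured values.

    p_worst = 1e-6         # assumed probability of the worst-case incident
    cost_worst = 1000000   # assumed cost ($) if the worst case occurs

    # Everyday-risk accounting weights the cost by its probability.
    expected_cost = p_worst * cost_worst
    print(f"Expected cost per experiment: ${expected_cost:.2f}")  # $1.00

    # Worst-case-as-certainty accounting instead prices every experiment
    # at the full $1,000,000 of precaution. The gap between the two
    # figures is the waste described above.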

Discrete incidents. The accident metaphor that may be appropriate is flight insurance. The experiment is a discrete interval of time, like a flight: does a problem occur during that specific interval? Life insurance covers your lifetime, with its unfortunately definite probability of death, whereas flight insurance covers whether you die during a discrete interval of time. Most financial advisers have long considered flight insurance to be grossly overpriced, which parallels the argument we are making about the ethics review process. Confusion of different kinds of risk is quite useful to the insurance industry, but expensive to the consumer. For whom is it useful to confuse varieties of risk in the ethics review process?

And, no, considering institutional risk to be the collection of all experimenters working does not convert it to a cumulative risk; each experiment (flight) is an independent risk.

In short, "revision requested" cannot be a metric for the success of the ethics review process at avoiding risk in the experimental setting, and the metric only becomes less meaningful when the alleged risk in question is unlikely. As emotionally satisfying as discovering a "problem" might be, such identifications are bogus as documentation of review effectiveness.

Further in the category of bad evidence, it is possible to imagine a situation whereby a letter goes around campus to the effect that "We had no complaints from experimental subjects this year, thanks to the diligent efforts of our ethics reviewers." We hope that survivors of Statistics 101, if not Psychology 101, can see the problem with such a causal attribution.


OTHER EVIDENCE

In addition to the per-decade incident-rate analysis mentioned above, here are at least two other ways one might assess the success of the ethics review process.

First, consider an experiment in which, for a year, a random half of the applications to the IRB are approved without review, whereas the other half get the conventional review. At the end of the year we look at the number of problems arising in the actual experiments in each group. Would the number of problems arising in the unreviewed group be any different from that in the reviewed group? It seems doubtful, yet a difference in that problems-actually-arising rate is the only true evidence for the success of risk avoidance by proposal review activity.
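As a sketch of how such a year-end comparison might be scored, the following applies an ordinary two-proportion z-test to hypothetical tallies of our own invention; with counts this small a real analysis would prefer an exact test, but the logic is the same.

    import math

    def two_proportion_z(incidents_a, n_a, incidents_b, n_b):
        """Two-proportion z-test comparing incident rates in two groups."""
        p_a, p_b = incidents_a / n_a, incidents_b / n_b
        pooled = (incidents_a + incidents_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_a - p_b) / se
        # Two-sided p-value from the normal CDF via the error function.
        p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
        return z, p_value

    # Hypothetical year-end tallies for the two randomized groups.
    z, p = two_proportion_z(incidents_a=3, n_a=400,   # conventional review
                            incidents_b=4, n_b=400)   # approved unreviewed
    print(f"z = {z:.2f}, p = {p:.3f}")  # no detectable difference here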

Another experiment would be to take proposals approved at one research site and submit them to an IRB elsewhere. Would the prospect of approval at the second (third, etc.) IRB be different from 50:50? Alternatively, one could take proposals rejected at one research site and have them reviewed elsewhere, and perhaps the strongest test of this type would be to take the method sections from published articles and submit them for review to various ethics review boards.

You may be thinking that such a study would itself have to be submitted to the IRBs, which would never risk finding out. To the contrary, this one does not seem to require "ethical" review at all, and here is why: the alleged purpose of proposal review is to "protect the public," and this project would never involve the public, just the review boards. If the review process is about protecting the public, one of these re-review projects could be done by anyone at any time, and perhaps one is underway even now.

Analogous research has been done before (e.g., Peters & Ceci, 1982), and the results were not popular, as conventional wisdom about peer review proved to be less than robust. This resubmission procedure cries out to be applied to the ethics review process: do we have repeat-reliability for ethics review decisions? What we know is not encouraging. Eaton (1983) reported reliability to be 8%. Ceci et al. (1985) found reliability to vary as a function of the "sensitivity" of the research proposed. The values obtained by Ceci et al. were obtained in a context where the IRB reviewers knew they were involved in an experiment, perhaps a best-case scenario, but at least quite different from the circumstances under which review boards operate on a daily basis. Furthermore, these values derive from an era when reviews were still based on everyday risk, and it seems unlikely that 20 years of obfuscation of review criteria have improved reliability.
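Were such a resubmission study run today, the board-to-board agreement could be scored with a standard chance-corrected statistic. Here is a minimal sketch computing Cohen's kappa for two boards judging the same ten proposals; the decisions shown are hypothetical, but Eaton (1983) and Ceci et al. (1985) describe the kind of data on which this would actually be run.

    def cohen_kappa(board_1, board_2):
        """Cohen's kappa for two boards' decisions on the same proposals."""
        n = len(board_1)
        observed = sum(a == b for a, b in zip(board_1, board_2)) / n
        # Chance agreement from each board's marginal approval rate.
        p1, p2 = sum(board_1) / n, sum(board_2) / n
        expected = p1 * p2 + (1 - p1) * (1 - p2)
        return (observed - expected) / (1 - expected)

    # 1 = approved as submitted, 0 = revision demanded; hypothetical data
    # for the same ten proposals sent independently to two boards.
    board_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    board_b = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]
    print(f"kappa = {cohen_kappa(board_a, board_b):.2f}")  # near zero

Kappa near zero means agreement no better than chance; anything like Eaton's 8% figure would be a devastating performance indicator in any other industry.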


MISSED OPPORTUNITIES

Finally, we should note that research opportunities have been squandered here if, as we surmise, no incident data have been collected.

First, social scientists would normally inquire about the "profile of the offender," that is, the common characteristics of those proposals that result in public risk. Actually, even without systematic data, the offender profile seems fairly clear: when the research involves a vested interest (e.g., drug profits), the probability of misconduct is increased. In other words, it is not Bob and Sally Neuprof in the social sciences, individuals struggling to start a career, yet it is Bob and Sally who nonetheless are forced through all the "preventive" review. Bob and Sally may forget to lock a file cabinet, just as someone may fax medical records to the wrong number, but IRBs cannot prevent human accidents. We can do a much better job of utilizing watchdog resources than by pretending we are all equal-opportunity offenders-to-be.

Second, much has been made of the added value of having members of the public at large on the ethics review boards. Had we been collecting incident data over these decades, we might be able to show that lay input has indeed further reduced the number of untoward incidents, rather than relying on intuitions that such things are effective. Any number of other ethics innovations could be validated as truly adding value in similar fashion, if only we had been collecting the incident data.

It is shoddy scholarship, and irresponsible bureaucracy, not to be collecting actual incident data, for these reasons if no other. We need to know which aspects of the review process add value, in terms of actually reducing the number of problems arising, and, just as important, which practices merely waste time and money.


CONCLUSIONS

There are further considerations, but the concern is sufficiently illustrated by these. Identifying alleged problems does not indicate that the ethics review process is successful at avoiding incidents (real or imagined) in the experimental setting. If hazard avoidance is the goal, a declining problems-arising rate in actual experiments is the only valid measure of success. We are aware that the measures we have noted here have shortcomings, but the purpose in raising them was to underscore the need to acknowledge and pursue the question of accountability for the proposal review process.

We have not been able to find any hard evidence that screening has had any effect in reducing problems. In fact, there seems to be very little interest in the lack of evidence. Thus we conclude that we need to take an honest look at the possibility that all this time and effort is not accomplishing anything in the way of protecting the public.

If there is no evidence supporting the effectiveness of the review process, we really should ask how much ineffective regulation we are willing to impose on innocent applicants such as ourselves. If we are not even interested in measuring the incident rate as it actually occurs in experiments, in terms that would satisfy accountants, tax auditors, insurance actuaries, and the VP-Finance, then why not? Is there perhaps a latent Orwellian agenda underlying the recent expansions of the North American review process with human subjects? Ceci et al. (1985) established 20 years ago that "socially sensitive" issues were troublesome for review boards, even though the guidelines of that day explicitly forbade using social criteria. Things likely have not improved in this regard as the review criteria have become more nebulous. What are we really trying to achieve here? How can we document success and failure?


REFERENCES

Adair, J. G. (2001). Ethics of psychological research: New policies; continuing issues; new concerns. Canadian Psychology, 42, 25-37.

Azar, B. (1997). Ethics-code changes may dampen research efforts. American Psychological Association Monitor, 28(March), 27.

Ceci, S. J., Peters, D., & Plotkin, J. (1985). Human subjects review, personal values, and the regulation of social science research. American Psychologist, 40, 994-1002.

Eaton, W. O. (1983). The reliability of ethics reviews: Some initial findings. Canadian Psychologist, 24, 269-270.

Furedy, J. J. (1997). An interpretation of the Canadian proposed tri-council ethics code: Epistemological crime and cover-up. In symposium "Social policy masked as ethics hurts science: Some working scientific perspectives," Society for Neuroscience meeting, New Orleans, November 1997.

Furedy, J. J. (1998). Ethical conduct of research from code to guidelines: A shift in the Tricouncil approach? Society for Academic Freedom and Scholarship Newsletter, 18, 6.

Holden, C. (1997). Draft research code raises hackles. Science, 274, 1604.

Howard, R. E. (1998). Letter to Nina Stipich, Senior Policy Analyst, SSHRC. Personal communication, February 20, 1998. In Furedy, J. J., Eminent scholar's concern about tri-council statement, Society for Academic Freedom and Scholarship Newsletter, 20, 3-8.

Peters, D., & Ceci, S. J. (1982). Peer-review practices of psychological journals: The fate of published articles, submitted again. Behavioral and Brain Sciences, 5, 187-255.

Puglisi, T. (2001). IRB review: It helps to know the regulatory framework. American Psychological Society Observer, 14, ****

Shea, C. (2000). Don't talk to the humans: The crackdown on social science research. Lingua Franca, 10(6), 26-34.

Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans. (1998, January).