B Outcome Measures for Assessing Integrity in the Research Environment

This appendix to the report describes outcome measures and models for the development of outcome measures that could be used or adapted for use by institutions and educators who wish to assess integrity in the research environment. These measures can be applied to assessments of individuals or institutions by processes recommended in this report.

The appendix describes two kinds of outcome measures. First, it describes measures that have been used to assess the moral climate of an institution. Although measures have not been developed specifically for assessment of the climate of integrity in the research institution, measures and methods that could be adapted for use by research institutions have been developed in other settings.

Second, the appendix describes measures that have been used to assess aspects of integrity of the individual. The goal is to recommend measures that could be used (or adapted) by researchers or institutions interested in assessing outcomes of educational efforts to promote the development of integrity in research in trainees. The emphasis will be on outcome measures that are theoretically grounded, that are at least indirect measures of behavior, and that either have been effectively used or have good potential for linking the development of aspects of integrity (e.g., ethical sensitivity, moral reasoning and judgment, and identity formation) to institutional effectiveness. In cases in which a recommended measure cannot be used exactly as designed, the criterion for inclusion in this review is whether the method of assessment has been sufficiently well validated, even if in a setting other than research, to warrant adaptation to the research environment.

In summary, measures that meet the following criteria are included: (1) they are theoretically well grounded in a model of morality that demonstrates the relationship between aspects of integrity and behavior; (2) they meet or exceed the minimal criteria for validity and reliability; (3) they have been successfully used to assess learning outcomes for adults either in research ethics programs or in professional ethics programs; (4) they have been used effectively to assess institutional effectiveness in promoting one or more aspects of integrity; and (5) the method of measurement is appropriate for assessment of an aspect of integrity in the research environment, even though the content of the measure may be specific to another discipline.

Note that this discussion does not include measures or tests that assess content knowledge of the rules related to the conduct of research, measures that assess perceptions of the integrity of others (e.g., survey instruments designed for the Acadia Institute study), or measures designed to assess the norms of scientists with respect to misconduct and questionable research practices (Bebeau and Davis, 1996; Korenman et al., 1998). The latter might serve as a resource for the development of items for use in a survey of the moral climate of an institution or for items for assessment of role concept development.

METHODS AND MEASURES FOR ASSESSING INTEGRITY IN THE RESEARCH ENVIRONMENT

Two bodies of literature contribute to the understanding of moral climate and its importance for the assessment of integrity in the research environment. The first is the literature on individual moral development, indicating that individual characteristics are not sufficient as an explanation for ethical behavior.
Thus, efforts to influence behavior by focusing on the development of abilities related to decision making may be necessary, but not sufficient, to affect integrity in the research environment. The second is the literature on organizational culture and climate that highlights the different kinds of cultures that may be operating in the environment.

There is a growing belief that organizations are social actors responsible for the ethical or unethical behaviors of their employees. In fact, corporations (Bowen and Power, 1993) have been held responsible under the law for acts of malfeasance and misfeasance engaged in by employees, sometimes even when the acts of those employees were beyond the scope of their employment. Such instances prompted scholars in the field of organizational development to turn their attention to the assessment of moral climate and to an analysis of the effects of moral climate on decision making.

Individual Development and Its Relationship to Collective Norms

In the early 1980s, developmental psychologists working in correctional facilities and high schools introduced the concept of a "moral atmosphere" or "just community" to explain the social context that shaped collective norms, which seemed either to inhibit or to override the influence of individual moral development on behavior. To measure moral atmosphere, researchers (Higgins et al., 1984; Power, 1980; Power et al., 1989) presented students with dilemmas likely to occur in their environment. For example, in a high school setting, researchers might present situations involving someone who cheated on an exam or someone who was rude to others. The researcher elicits judgments of responsibility (e.g., What do you think _____ should do? Why?) and judgments of practicality (e.g., What would you do? Why?). These were contrasted with perceptions of the collective norms (What would most others in your school do in this situation? Why would they do that?). Through interviews, researchers were able to identify collective norms and establish whether the norm emerged from within the group or was stipulated by authority external to the group. Then, the degree to which the norm met moral standards and the degree to which individuals were committed to each norm were assessed. By use of this strategy, it was possible to detect groups with strong, but morally defective, collective norms.¹ Furthermore, researchers were able to show that groups develop collective norms that belong only to the group. When prosocial collective norms defined what was expected of group members as group members, individuals tended to conform to group norms even when their competence in moral decision making was not well developed.
However, when the collective norms did not encourage prosocial behavior,² individuals with higher levels of competence in moral development felt alienated and discouraged from engaging in actions consistent with their level of competence. Higgins and colleagues (1984) concluded that practical moral action is not simply a product of an individual's moral competence but is a product of the interaction between his or her competence and the moral features of the situation.

Melissa Anderson, in a National Science Foundation-funded longitudinal study of doctoral students' acquisition of the concepts of science and its norms, uses interview questions similar to those used to elicit implicit norms that shape behavior in the studies cited above. Anderson describes the interview questions as follows: "A series of questions ask students to consider and comment on the relationship between academic norms and behavior (Do you see any conflicts between what people think or say you should do and the way work is actually done?), between their own perspectives and behavior (Do you see people around here acting contrary to your advice [to doctoral students on how to avoid serious mistakes]?) and between their own normative perspectives and academic norms (Are there any ideas or rules about how you should do your work that you don't agree with?)" (Anderson, 2001, p. 2). Narrative accounts are then analyzed in terms of the contrasts presented above. At a conference sponsored by the Office of Research Integrity (ORI), U.S. Department of Health and Human Services (DHHS), in 2001, Anderson reported findings from an analysis of interviews with 30 first-year doctoral students. (See Chapter 5 for a further discussion of the initial findings and their relationship to education in the responsible conduct of research.)

¹ Examples of groups with morally defective collective norms might include repressive totalitarian states, fanatical cults, violent gangs, and organized crime.

² Psychologists use the term prosocial behaviors to distinguish behaviors that are clearly beneficial to another and support societal or communal norms from behaviors that may be norm or rule based (as in a teen-age gang or criminal group) but support the self, or hurt others. A prosocial behavior is not necessarily selfless.

Organizational Literature

Building on the early work on moral atmosphere, which attempted to define collective norms operating in the environment, Cullen and colleagues argued "that corporations, like individuals, have their own sets of ethics that help define their characters. And just as personal ethics guide what an individual will do when faced with moral dilemmas, corporate ethics guide what an organization will do when faced with issues of conflicting values" (Cullen et al., 1989, p. 50). Ethical climates were conceptualized as general and pervasive characteristics of organizations that affect a broad range of decisions.
In the organizational literature, work climate is defined as "perceptions that are psychologically meaningful moral descriptions that people agree characterize a system's practices and procedures" (Cullen et al., 1993, p. 180).

In contrast to the interview strategy, which, although labor intensive, has the advantage of gauging individual concepts of responsibility as well as perceptions of the group norms, Cullen and colleagues (1993) developed and validated a 36-item questionnaire, the Ethical Climate Questionnaire, to assess perceptions of the norms operating within an organization. Examples of items used to assess climate are as follows:

1. In this company, people are mostly out for themselves.
2. The major responsibility for people in this company is to consider efficiency first.
3. In this company, people are expected to follow their own personal and moral beliefs.
4. People are expected to do anything to further the company's interests.
5. In this company, people look out for each other's good.
6. There is no room for one's own personal morals or ethics in this company.
7. It is very important to follow strictly the company's rules and procedures here.
8. Work is considered substandard only when it hurts the company's interests.
9. Each person in this company decides for himself what is right and wrong.
10. In this company, people protect their own interest above other considerations.
11. The most important consideration in this company is each person's sense of right and wrong.
12. The most important concern is the good of all the people in the company.
13. The first consideration is whether a decision violates any law.
14. People are expected to comply with the law and professional standards over and above other considerations.
15. Everyone is expected to stick by company rules and procedures.

Responses to the questionnaire confirm the multidimensional nature of ethical climate and substantiate the existence of a number of hypothesized ethical climates. Victor and Cullen's (1988) measure is well validated, and their studies confirm that ethical climates are perceived at the psychological level and that individuals within organizations are able to describe the moral atmosphere that prevails in their work units. The kinds of moral climates that prevail differ dramatically among organizations. Furthermore, there appears to be variance in the ethical climate within organizations by position, tenure, and work group membership. The authors argue that ethical climates, although relatively enduring, are not static. A careful assessment of the climate enables an organization to reflect on its policies and practices and institute reforms.
Examples of efforts to evaluate the organizational climate in settings that seem relevant to the research environment follow. As useful as these illustrations are for showing how an organization might assess its moral and ethical climate, it is still up to the institution to implement changes and then to reassess the climate to determine whether the improvements have occurred.

Examples of Climate Assessments Conducted in Related Fields

U.S. Office of Government Ethics

In 1999, the U.S. Office of Government Ethics (OGE) hired a consulting firm to assess the effectiveness of the executive branch ethics program and to assess the ethical culture of the executive branch from the employees' perspective (OGE, 2000). The objective of the executive branch ethics program is to prevent conflicts of interest and misconduct that undermine the public's trust in government. The study assessed employee perceptions of the ethical culture in the executive branch and enabled OGE to make specific decisions regarding the ethics training programs for executive branch employees; the effectiveness of communication regarding the purpose, goals, and objectives of the ethics program; and the extent to which the program helped employees avoid at-risk situations. Because the study was a first attempt to assess the ethical climate of the executive branch, it focused on overall awareness rather than on an analysis of the climate within individual executive branch agencies.

The OGE survey was based on the IntraSight Assessment, an assessment tool developed by Arthur Andersen researchers and academic researchers in the fields of business ethics and organizational behavior. Although the full report claims that the measure is statistically reliable and valid, a summary of validity and reliability data on the measure was not provided. The IntraSight Assessment examines the impact of an organization's ethics program by assessing employees' perceptions of observed unethical or illegal behaviors and several desirable outcomes of ethics efforts. The IntraSight Assessment examines program elements and cultural factors that, in the original study, had the greatest relationship with desirable outcomes.
By providing a measure of outcomes and a measure of the related factors, the IntraSight Assessment provides direction for improving outcomes by addressing the factors most highly related to the desired outcomes. The assessment process provided data that OGE could use for continuous quality improvement. One might expect that future efforts at quality assessment would focus on evaluation of the effectiveness of ethics programs within agencies.

Academic Integrity Assessment

The Center for Academic Integrity at Duke University developed a process and measures that assist institutions of higher learning with assessing the extent to which the climate on their campuses promotes academic integrity (Burnett et al., 1998).

The process begins with the appointment of a campus committee charged with evaluating the state of academic integrity on campus and, after a data collection process, drawing conclusions and making recommendations for ways that programs that have been charged with ensuring academic integrity can improve. The committee assembles background information about the policies and disciplinary procedures (including information and statistics about sanctions that have been imposed); collects descriptions of the educational programs and activities that inform students, faculty, and administrators about academic integrity on campus; conducts focus groups for administrators; and facilitates the collection of data on perceptions of the moral climate from students and faculty.

The center conducts surveys using the Student Academic Integrity Survey and the Faculty Academic Integrity Survey designed by Donald McCabe. According to the developer, the surveys can be modified to address specific content issues that may be unique to the institution and to address objectives defined by the committee. The survey has been used in several studies, but the guide to the survey provides no references to the psychometric properties of the survey. A recent communication (January 2002) with the test developer confirmed that there are no published data on the validity of the measure. The developer does periodically check its reliability, and it would be possible for the developer to make the data available. Included in the guide are criteria for review of an institution's policies and disciplinary procedures and outcomes. The center analyzes the data collected by the surveys, as well as comparison data from national samples, for the committee's use in examining the results. The committee's final task is to draw conclusions and make recommendations for ways in which the institution's academic integrity programs can be improved.

Additional Examples

The U.S. Army uses the Ethical Climate Assessment Survey (ECAS) and the Framework for Establishing/Changing Ethical Climate as part of leadership development for members of the U.S. military (U.S. Army, 2001). Leaders are directed to periodically assess their unit's ethical climate and take appropriate actions to maintain the high ethical standards expected of all organizations that are part of the U.S. Army. According to information from the web site (U.S. Army, 2001), an ethical climate is one in which "stated Army values are routinely articulated, supported, practiced and respected." An organization's climate is determined by "the individual character of unit members, the policies and practices within the organization, the actions of unit leaders, and environmental and mission factors." ECAS is a self-administered questionnaire that leaders use to assess how the leader perceives his or her unit and leader actions. Col. George Forsythe (personal communication, United States Military Academy, January 2002) indicated that although the Army has used the measure extensively, studies of the validity of the measure have not been systematically conducted.

The National Center for Education Statistics of the U.S. Department of Education compiled the responses of teachers in private and public elementary and secondary schools to an ethical climate survey. The 27-item questionnaire is intended for use by individual schools to assess the organization's ethical culture.

Summary

It is apparent from the number of measures of moral climate that have been developed that scholars, at least scholars in organizational development, accept the notion that institutions differ in the kinds of moral and ethical climates that prevail and that the moral and ethical climate of an institution can influence a broad range of outcomes for which a given institution may be held accountable. There also appears to be a belief that institutions have a responsibility to assess the moral and ethical climate that prevails, to reflect on the policies and practices that contribute to that climate, to make appropriate adjustments, and to reassess their moral and ethical climates. It is also apparent that whereas a number of measures have been developed to document the prevailing moral and ethical climate, with the exception of the measure designed by Victor and Cullen (1988), little attention has been given to establishing that the data collected by such surveys provide an accurate and reliable picture of the prevailing moral and ethical climate. As easy as it may be to adapt items from existing measures to develop a climate survey to be used in research institutions, it is incumbent upon the research community to establish the validity, reliability, and usefulness of such measures.

METHODS AND MEASURES FOR ASSESSING INTEGRITY OF THE INDIVIDUAL

This section provides descriptions of measures or methods used to assess aspects of the moral integrity of the individual. Included are measures of general abilities that are developmental and that are linked to ethical behavior (Bebeau et al., 1999). Measures that assess aspects of the Four-Component Model of Morality of Rest (1983)³ are described and are classified under the following headings: ethical sensitivity, ethical reasoning and judgment, identity formation, and ethical implementation.

³ See Chapter 5, Box 5-1, for an operational definition of each of the components of morality.

In most cases, the measures described are profession specific, in that the content of the measure would not be appropriate for the assessment of integrity in research. Nonetheless, the competence being assessed is an ability that is relevant to the integrity of the researcher. If the content of the test is adapted, as has been the case in many of the examples cited below, the measurement strategy should be as effective for assessments of important learning outcomes in the research setting as it has been for assessments of important learning outcomes in other professional settings.

Descriptions of some assessment strategies that rely on Rest's Four-Component Model of Morality (1983) for their theoretical grounding and that seem promising for application to research ethics follow.

Ethical Sensitivity

Performance-based methods for assessment of ethical sensitivity were first developed in dentistry (Bebeau et al., 1985), and the most extensive work on the validity of the method has been conducted with the Dental Ethical Sensitivity Test (Forms A and B). (See Rest et al. and Bebeau [1994, 2001] for summaries of the validation studies.) The general strategies for ethical sensitivity assessment have been applied in other professional settings: counselor education (Brabeck and Weisgerber, 1989; Volker, 1984); computer users (Liebowitz, 1990); undergraduate education (McNeel, 1990; Mentkowski and Loacker, 1985); geriatric dentistry (Ernest, 1990); social work (Fleck-Henderson, 1995); journalism (Lind, 1997); and school personnel, including administrators, teachers, and school psychologists (Brabeck et al., 2000).

An ethical sensitivity test (Bebeau and Rest, 1990; Ernest, 1990) places students in real-life situations in which they witness an interaction on either videotape or audiotape. The interaction replicates professional interactions and provides clues to a professional ethical dilemma. For example, the Racial Ethical Sensitivity Test (Brabeck, 1998) consists of five videotaped scenarios that portray acts of intolerance exhibited by professionals in school settings. Each scenario includes from five to nine acts of racial and gender intolerance that violate one or more of the common principles specified in ethical codes of school-based professions. Distinct from the cases typically used in ethics courses, the information is not predigested or interpreted. At a point in the presentation, the student is asked to take on the role of the professional in the situation and respond (on an audiotape) as though he or she were that person. Following his or her response to a patient, client, or colleague, the student answers a number of probe questions that ask why he or she said what was said; how he or she expects the patient, client, or colleague to respond; what he or she thinks should be done in like situations; and so on. Using well-validated criteria established by an interdisciplinary team that includes practitioners, judges rate the extent to which the student adequately interprets the significant issues and professional responsibilities presented in the situation.

Studies assessing the ethical sensitivities of both professionals in training and professionals in practice (Bebeau, 2001; Bebeau and Brabeck, 1987; Bebeau et al., 1985; Fleck-Henderson, 1995) indicate considerable variability among professionals in terms of their sensitivities to the ethical issues they may encounter. Thus, completion of professional training does not ensure development of sensitivity to professional issues. Studies also show, however, that ethical sensitivity can be improved with instruction (Bebeau and Brabeck, 1987; Leibowitz, 1990; Mentkowski and Loacker, 1985; Sirin et al., submitted for publication). Furthermore, studies show that ethical sensitivity is distinct from the ability to reason (the second component of the Four-Component Model of Morality of Rest) about what ought to be done in a situation (Bebeau and Brabeck, 1987; Bebeau et al., 1985; Brabeck et al., 2000). Consequently, one cannot assume that education that focuses on ethical reasoning will transfer the ethical reasoning ability to the interpretive process.

Because the assessment process is relatively expensive, requiring transcription of a semistructured interview and scoring by trained raters, measures of ethical sensitivity have typically been used in research studies. Recently, however, Brabeck and Sirin (2001) produced a computerized version of the Racial Ethical Sensitivity Test (REST-CD), intended to make their test more efficient.
A subsequent study (Sirin et al., submitted for publication) concluded that the more efficient assessment process provides a reliable and valid measure of ethical sensitivity to instances of racial and gender intolerance.

The modified ethical sensitivity assessment strategy of Brabeck and colleagues seems ideal for assessment of sensitivity to the cultural, interpersonal, and value conflicts that arise between parties (e.g., mentors and students, collaborators, or administrators and researchers) in the research setting. Notice, however, that in addition to assessing the professional's attention to behaviors of the person, the cases assess knowledge of the rules, regulations, and codes of ethics in the context in which they are used. Tests that assess the application of knowledge in context usually provide better assurances of knowledge acquisition. The cases developed by ethical sensitivity researchers are not unlike the dialogue cases.

Ethical Reasoning and Judgment

Assessing Written Essays

Perhaps the most familiar approach to measuring ethical reasoning and judgment is the analysis of written arguments, typically conducted by faculty who teach philosophy or professional ethics (Howe, 1982). In dentistry (Bebeau, 1994) and nursing (McAlpine et al., 1997), for example, researchers have demonstrated that essays can be reliably assessed and that instruction is effective in promoting the ability to develop well-written essays that meet criteria that are specified in advance of instruction. Such methods lack practicality for the assessment of competence in reasoning as a function of an institution's efforts to promote reasoning about dilemmas in integrity in research, as they are labor intensive and require considerable expertise in philosophy or ethics. However, assessment of written essays is a particularly effective way to promote learning, especially if it is accompanied by clearly stated criteria, frequent opportunities for practice, and feedback (Bebeau, 1994).

These methods have been applied to integrity in research with various degrees of success. For example, Stern and Elliott (1997) describe the challenges in establishing interrater reliability and the lack of a measurable effect if the criteria used to judge moral arguments are not presented as part of the instructional program. Recognizing both the need to teach the criteria used to make judgments about the adequacy of moral arguments and the need to be able to reliably apply the criteria to the evaluation of arguments developed by students, the Poynter Center developed and validated a set of cases and criteria for the assessment of moral reasoning in scientific research.

Moral Reasoning in Scientific Research: Cases for Teaching and Assessment (Bebeau et al., 1995) is an 80-page booklet that features six one- to two-page case studies, as well as extensive information on how to use the case studies and a discussion of the theoretical underpinnings of the approach. In addition to notes that provide the instructor with guidance on leading case discussions, the booklet includes a handout for students that details the criteria used to judge the adequacy of moral arguments. As its title implies, Moral Reasoning in Scientific Research is designed to facilitate improvements in moral reasoning skills, as well as to facilitate assessments of such improvements. Evidence of the effectiveness of the techniques for facilitating reasoning and of the validity of the assessment is described and referenced in the booklet. Ken Pimple, a coauthor on the project, recently converted the booklet to PDF format and made it available via the Poynter Center's World Wide Web site (Bebeau et al., 1995).

Objective Measures of Moral Reasoning and Judgment

Researchers have developed objective measures of moral reasoning and judgment.⁴ The most widely used test, and the one that may have the most potential for the assessment of institutional effectiveness in research settings, is the Defining Issues Test (DIT) (Rest, 1979; Rest et al., 1999a), which has a long validation history and is a well-established measure of student learning outcomes. A large body of literature (Mentkowski, 2000; Pascarella and Terenzini, 1991) has addressed the influence of institutions of higher education on the development of critical thinking, moral development, identity formation, and so on. Of all of the measures that have been designed to show the impact of higher education on important learning outcomes, DIT stands out as one of the best indicators of learning outcomes that can be linked back to institutional effectiveness.

The Defining Issues Test

Developed by the late James Rest (Rest, 1979; Rest et al., 1997, 1999a), DIT is a paper-and-pencil measure of moral judgment based on Lawrence Kohlberg's (1984) pioneering work on the development of moral judgment over the life span. DIT measures the reasoning strategies (moral schemas) that an individual uses when confronted with complex moral problems and the consistency between reasoning and judgment. The test presents six moral dilemmas that cannot be fairly resolved by applying existing norms, rules, or laws. Respondents rate and rank arguments (12 for each problem) that they considered important in coming to a decision about what they would do. The arguments reflect the conceptually distinct reasoning strategies (schemas) that people use to justify their actions. The scores reflect the proportion of times that a person prefers each strategy. The most widely used score, the P Index (where P is for postconventional thinking), describes the proportion of times that a respondent selects arguments that appeal to moral ideals.
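
As a rough illustration of what a proportion-based score of this kind involves, the sketch below tallies how often a respondent's selected arguments fall under each reasoning schema. The data layout, the schema labels, and the function name `schema_proportions` are assumptions made for this example only; this is not the published DIT scoring procedure, which should be taken from the test's own scoring materials.

```python
from collections import Counter


def schema_proportions(selections):
    """Return the share of selected arguments that fall under each schema.

    `selections` is a hypothetical list of schema labels, one label per
    argument the respondent marked as important, pooled across dilemmas.
    """
    if not selections:
        raise ValueError("no selections to score")
    counts = Counter(selections)
    total = len(selections)
    return {schema: count / total for schema, count in counts.items()}


# Hypothetical respondent: four selected arguments for each of two dilemmas.
example = [
    "postconventional", "postconventional", "maintaining_norms", "personal_interests",
    "postconventional", "maintaining_norms", "maintaining_norms", "postconventional",
]

proportions = schema_proportions(example)
# A P-like score in this simplified scheme: the share of postconventional picks.
p_like_score = proportions.get("postconventional", 0.0)
```

In this made-up example, half of the selected arguments are postconventional, so the P-like score is 0.5. A real scoring scheme may weight selections by rank, which this sketch deliberately ignores.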
Research indicates that mature thinkers appeal to moral ideals much more frequently than immature thinkers do. Mature thinkers (e.g., ethicists and thoughtful professionals) attempt to work out what ought to be done in circumstances in which there is a conflict of rights, interests, or obligations. They make modifications to existing rules, laws, or codes of ethics to accommodate the new moral problem that has arisen. Because professionals are often required to apply ethical principles or ideals to new problems that emerge in their professions, this skill is necessary for effective moral functioning. Research indicates that there is a strong relationship between the P Index and prosocial moral action.

[4] In addition to the Moral Judgment Interview of Colby and colleagues (1987), Gibbs and colleagues (1992) designed the Sociomoral Reflection Measure, suitable for the assessment of reasoning for children and adolescents, and Lind and Wakenhut (1985) designed the Moral Judgment Test, which has mainly been used in Germany. Each of these measures has been validated and has advantages.

In addition to the P Index, the test also determines the proportion of times that an individual selects arguments based on two other problem-solving strategies: the PI Index (where PI represents personal interests) describes the proportion of times that a respondent selects arguments that appeal to personal interests and loyalty to friends and family, even when doing so compromises the interests of persons outside one's immediate circle of friends, and the MN Index (where MN represents maintaining norms) describes the proportion of times that a respondent selects arguments that appeal to the maintenance of law and order, irrespective of whether applying the law to the dilemma presented results in an injustice. In addition to the three main indices, the program calculates two information-processing indices: the U Index (where U represents utilizer), whose score ranges from -1.0 to +1.0 and which describes the degree of consistency between reasoning and judgment (persons whose reasoning and judgments are reasonably consistent achieve scores of 0.4 or above), and the N2 Index, which takes into account how well the respondent discriminates among the various arguments and which is often a better indicator of change than the P Index. If the N2 Index score is higher than the P Index score, it indicates that the respondent is better able to discriminate among arguments than to recognize postconventional arguments.

The validity of DIT has been assessed in terms of seven criteria (Rest et al., 1999a):

1. Differentiation of various age and education groups. Studies show that 30 to 50 percent of the variance of DIT scores is attributable to level of education.
2. Longitudinal gains.
A 10-year longitudinal study of men and women, college attendees, and subjects not in college and from diverse walks of life shows gains in DIT scores over the 10-year period; a review of a dozen studies of first-year to senior college students (N > 500) shows effect sizes of 0.80, making gains in DIT scores one of the most dramatic effects of college.
3. Relation to cognitive capacity measures. DIT is significantly related to cognitive capacity measures of moral comprehension (r in the 0.60s), recall and reconstruction of postconventional moral arguments, Kohlberg's (1984) interview measure, and (to a lesser degree) other measures of cognitive development.
4. Sensitivity to moral education interventions. DIT is sensitive to moral education interventions. One review of more than 50 intervention studies reports an effect size for dilemma discussion interventions of 0.40
(moderate gains), whereas the effect size for comparison groups was only 0.09 (little gain).
5. Linkage to many prosocial behaviors and to desired professional decision making. DIT is significantly linked to many prosocial behaviors and to desired professional decision making. One review reports that the links for 37 of 47 measures were statistically significant.
6. Linkage to political attitudes and political choices. DIT is significantly linked to political attitudes and political choices. In a review of several dozen correlates of political attitude, DIT typically correlates with r values in the range of 0.40 to 0.60. When coupled with measures of cultural ideology, the combination predicts up to two-thirds of the variance of controversial public policy issues (such as abortion, religion in public schools, the roles of women, the rights of accused individuals, the rights of homosexuals, and free speech issues).
7. Reliability is good. The Cronbach alpha value[5] is in the upper 0.70s to low 0.80s. The test-retest reliability of DIT is stable.

Furthermore, DIT shows discriminant validity from verbal ability/general intelligence and from conservative-liberal political attitudes; that is, the information in a DIT score predicts the seven validity criteria above and beyond that accounted for by verbal ability or political attitude. DIT is equally valid for males and females.

DIT-2 (Rest et al., 1999b) is an updated version of the original DIT (DIT-1) devised 25 years ago. Compared with DIT-1, DIT-2 not only has stories that are not dated but is also a shorter test, has clearer instructions, and retains more subjects through subject reliability checks. In addition, in studies conducted so far, the validity of the test is not sacrificed because it is a shorter test. If anything, it improves on validity. The correlation of the results of DIT-1 with those of DIT-2 is 0.78, approaching the test-retest reliability of DIT-1 with itself.
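The effect sizes cited under the validity criteria above (for example, 0.80 for gains during college and 0.40 versus 0.09 for intervention and comparison groups) are not defined in this appendix. Assuming they are standardized mean differences of the Cohen's d type, the following Python sketch shows how such a value could be computed from two sets of scores. The function name and the sample scores are hypothetical.

```python
# Minimal sketch of a standardized mean difference (Cohen's d with a pooled
# standard deviation). This is an assumed definition; the source does not
# state which effect size statistic the cited reviews used.
from math import sqrt
from statistics import mean, variance


def cohens_d(baseline, comparison):
    """Return (mean(comparison) - mean(baseline)) / pooled standard deviation."""
    n_a, n_b = len(baseline), len(comparison)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(baseline) +
                  (n_b - 1) * variance(comparison)) / (n_a + n_b - 2)
    return (mean(comparison) - mean(baseline)) / sqrt(pooled_var)


# Invented P Index scores for a pretest and a posttest group (illustrative only).
pretest = [30, 40, 50]
posttest = [40, 50, 60]
d = cohens_d(pretest, posttest)
print(round(d, 2))
```

Under this definition, a value near 0.40 means the average score in one group lies about four-tenths of a pooled standard deviation above the other.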
Using DIT to Assess Educational Effects

Because DIT has been used to assess the effects of interventions in professional ethics and research ethics (Heitman et al., 2000), a brief summary of findings is included here.

[5] Cronbach alpha (Cronbach, 1951) provides an estimate of the internal consistency of the test. Because ranking data are used to calculate the P Index and the N2 Index, the individual items would not be the appropriate unit of analysis for determining internal consistency reliability. Further, ranking data are ipsative; that is, if one item is ranked in first place, then no other item can be ranked in first place. Therefore, the unit of internal reliability is on the story level, not the item level, and Cronbach alpha is the appropriate strategy for estimating internal consistency. Calculated across the six stories of DIT-1, the estimate is 0.76; for the five-story DIT-2 it is 0.81, which is somewhat lower than the estimate of 0.90 if calculated across all 11 stories for the two forms of the test (Rest et al., 1997).
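The story-level reliability estimate described in footnote 5 can be sketched with the standard Cronbach alpha formula, in which each story score is treated as one "item." The Python below is only an illustration: the function name and the story-level scores are invented, and the code is not the scoring software used by the test developers.

```python
# Minimal sketch of Cronbach alpha computed over story-level scores.
# alpha = k / (k - 1) * (1 - sum(story variances) / variance(total scores))
# The data are hypothetical; rows are respondents, columns are stories.
from statistics import variance


def cronbach_alpha(rows):
    """Return Cronbach alpha for a respondents-by-stories score table."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two stories and two respondents")
    # Variance of each story's scores across respondents.
    story_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of each respondent's total score across respondents.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(story_vars) / total_var)


# Invented story-level scores for four respondents on three stories.
story_scores = [
    [10, 20, 30],
    [20, 20, 40],
    [30, 40, 50],
    [40, 40, 60],
]
alpha = cronbach_alpha(story_scores)
print(round(alpha, 3))
```

Working at the story level sidesteps the ipsative-ranking problem noted in the footnote, because scores from different stories are not constrained by one another the way ranks within a story are.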
Typically, researchers have reported scores in terms of the P Index score (the proportion of items selected that appeal to postconventional moral frameworks for decision making). The average adult selects postconventional moral arguments about 40 percent of the time, the average Ph.D. candidate in moral philosophy or political science does so about 65.2 percent of the time, the average graduate student does so 53.5 percent of the time, the average college graduate does so 42 percent of the time, and the average high school student does so 31.8 percent of the time (Rest et al., 1999b).

Similar to college graduates, Heitman and colleagues' (2000) sample of 280 graduate students in a research ethics course achieved a mean score of 43.9 (standard deviation [SD], 13.1). In contrast, a sample of 14 scientists (from a variety of disciplines) who completed DIT while in attendance at a summer institute on the teaching of research ethics achieved a mean score of 53 (SD, 13), comparable to the mean and variance for graduate students. What is important about this data set is that the variability among those interested in teaching research ethics is comparable to the variability observed among students and professionals like physicians and dentists. In other words, one cannot assume the development of postconventional thinking on the basis of one's achievement as a scientist.

Furthermore, a recent analysis of DIT profiles for entering professional students (i.e., the proportion of arguments selected with a personal interest, maintaining norms, and postconventional moral framework) indicates that fully 47 percent of a sample of 222 first-year students were in a "transitional status" of developmental change in their mode of thinking (Bebeau, 2001).
In other words, their DIT profiles indicated that they were not distinguishing less adequate from more adequate moral arguments as well as students who had completed their ethics program were. As a consequence of this recent observation and a recent meta-analysis of the effects of interventions on moral judgment development (Yeap, 1999), Bebeau (2001) recommends that researchers studying the effects of an intervention conduct a profile analysis rather than rely only on the P Index as a measure of change.

Whereas progress in moral judgment is developmental and development proceeds as long as an individual is in an environment that stimulates moral thinking, gains in moral judgment are typically not found to be associated with professional education programs (e.g., veterinary medicine, medicine, dentistry, and accounting programs) unless the program has a specially designed ethics curriculum (Rest and Narváez, 1994). Furthermore, for some students (Bebeau and Thoma, 1994) and some professions (Ponemon and Gabhart, 1994), educational programs actually seem to inhibit growth in moral judgment. For example, Ponemon and Gabhart speculate that the heavy emphasis placed on learning and applying regulatory codes in the education of accountants may inadvertently promote a maintaining norms moral framework that inhibits the development of the advanced moral frameworks needed to reason through new moral issues. Such findings reinforce the importance of the use of outcome measures to assess institutional effectiveness in promoting the development of reasoning ability.

Development of a Prototype Intermediate Concept Measure

Tests like DIT are valuable for assessment of a general reasoning ability that is a critical element of professional ethical development, but they may not be sensitive to the specific concepts taught in a professional ethics course or, indeed, in a research ethics course. Referring to teacher education, Strike points out: "It is no doubt desirable that teachers acquire sophisticated and abstract principles of moral reasoning [as measured by DIT]. . . . But a teacher who has a good grasp of abstract moral principles may nevertheless lack an adequate grasp of specific moral concepts, such as due process" (Strike, 1982, p. 213). The question (for educators) is often whether to teach specifically to the codes or policy manuals or to teach concepts particular to a discipline: informed consent, intellectual property, conflict of interest, and so on. Strike (1982) refers to such profession-specific concepts as "intermediate-level ethical concepts," as they lie in an intermediate zone between the more general principles (e.g., autonomy, justice, and beneficence) described by philosophers and the more prescriptive directives often included in codes of conduct.
To test the possibility of designing a profession-specific test of ethical reasoning that could be used to assess the acquisition of intermediate concepts taught in a curriculum and that could be used to study the relationship between abstract reasoning and competence to reason about new professional problems, Bebeau and Thoma (1999) designed and tested the Dental Ethical Reasoning and Judgment Test (DERJT). Similar to DIT, the test consists of five ethical problems in dentistry to which the respondent provides action choices and justification choices. The action and justification choices for each problem were generated by a group of dental faculty and residents. The scoring key reflects consensus among a national sample of 14 dental ethicists as to better, worse, and neutral choices and justifications but does not prescribe a single best action or justification.

When taking the test, a respondent rates each action or justification and then selects the two best and the two worst action choices and the three best and the two worst justifications. Scores are determined by calculating the proportion of times that a respondent selects action choices and justifications consistent with "expert judgment." High levels of agreement among 14 dental ethicists as to better and worse action choices (88
percent agreement for appropriate and inappropriate actions, respectively, and 95 and 93 percent agreement for appropriate and inappropriate justifications, respectively) demonstrated the validity of the construct. Bebeau and Thoma (1999) reported effect sizes of 0.93 and 0.56 for action and justification choices, respectively, between first-year college students and first-year dental school students, and effect sizes of 0.85 and 0.56, respectively, between first-year dental school students and dental school seniors in the class of 1997. Additionally, in a recent study of 308 graduates who completed DERJT and DIT, Bebeau and Thoma (2000) report that scores on DERJT are related to those on DIT (r = 0.22) but that the two tests are not a redundant source of information about competence in ethical reasoning and judgment. In addition, the results indicated that students with a good grasp of abstract moral schemas (good DIT P Index scores) were better able to solve the novel ethical problems presented on DERJT. As with other measures of ethical development, scores on DERJT were not related to a student's grade point average.

Identity Formation and Role Concept Development

One of the chief objectives of the study described in On Being a Scientist (NAS, 1989, 1995) was to convey the central values of the scientific enterprise. In an earlier era, such values were typically conveyed informally, through mentors and research advisers. Today, educators recognize the need to introduce the responsibilities more formally. Anderson (2001), in her study of doctoral students' conceptions of science and its norms, concludes that students might not be subject to as much group socialization through osmosis as many faculty assume. Nonetheless, the means by which socialization to the normative aspects of academic life are communicated are primarily informal (Anderson, 2001).
In addition to providing support for the need to more deliberately socialize students to the norms of the research enterprise, Anderson's study will likely provide grist for the design or modification of items used to assess role concept development for researchers. Such measures have been developed in some professions to assess identity formation and its relationship to ethical action.

Professional Role Orientation Inventory

The Professional Role Orientation Inventory (PROI) (Bebeau et al., 1993; Thoma et al., 1998) consists of four 10-item Likert scales that assess commitment to privilege professional values over personal values. Two of the scales assess dimensions of professionalism that are theoretically
linked to models of professionalism described in the professional ethics literature (e.g., Emanuel and Emanuel, 1992; May, 1983; Ozar, 1985; Veatch, 1986). The PROI scales, in particular the responsibility and authority scales, have been shown to consistently differentiate beginning and advanced student groups and practitioner groups, who are expected to differ in their role concepts. By plotting the responses of a cohort on a two-dimensional grid (Bebeau et al., 1993), it is possible to observe four distinctly different views of professionalism that, if applied, would favor different decisions about the extent of responsibility to others.

In comparisons of practicing dentists, entering students, and graduates, Minnesota graduates consistently express a significantly greater sense of responsibility to others than do entering students and practicing dentists from the region. This finding has been replicated for five cohorts of graduates (n = 379). Additionally, the mean score for the graduates was not significantly different from that for a group of 48 dentists, who demonstrated a special commitment to professionalism by volunteering to participate in a national seminar to train individuals to be leaders of ethics seminars. A recent comparison of pretest and posttest scores for students in the classes of 1997 to 1999 (Bebeau, 2001) indicates a significant change (p < 0.0001) from the pretest to the posttest scores. Cross-sectional studies of differences between pretest and posttest scores for students in a comparable dental program suggest that instruction in ethics accounts for the change. The most direct evidence of a relationship between role concept and professionalism comes from the study of the performances of 28 members of the practicing community referred for courses in dental ethics because of violations of the Dental Practice Act.
Although the practitioners varied considerably on measures of ethical sensitivity, reasoning, and ethical implementation, 27 of the 28 individuals were unable to clearly articulate role expectations for a professional (Bebeau, 1994). (See Bebeau et al. for a more extensive description of the theoretical grounding for this measure.)

Professional Decisions and Values Test

Rezler and colleagues (1992) designed the Professional Decisions and Values Test for lawyers and physicians to assess action tendencies and the underlying values in situations with ethical problems. Patterned after DIT and the Medical Ethics Inventory, the test consists of 10 case vignettes, to which respondents provide three alternative actions and seven reasons to explain the action chosen. Actions are arranged from the least to the most intrusive, and the reasons represent one of seven values commonly used to resolve an ethical dilemma. The cases were selected to represent three
themes: (1) obligation to the patient versus obligation to society, (2) respect for client autonomy versus professional responsibility, and (3) protection of the patient's interest versus respect for authority. In the presentation of the findings, data for two consecutive classes of entering medical and law students (n = 340) are presented, as are their action choices, and the values are compared. Although the findings support the construct validity of the test, test-retest reliability is stable over time for action choices but not for values. The developers hypothesize that values do not become stable until later in the curriculum; thus, the test may be more useful for the assessment of change over time than for the tracking of changes for individuals. Differences by sex and profession were observed when the measure was used. Whether the lack of stability in the retest reliability study can be attributed to changes that are influenced by the curriculum is a question worthy of further study. Although further validation work needs to be done with this measure, the test is cited because its format shows promise for the design of a measure of role concept.

Ethical Implementation

In terms of the implementation of programs on professional ethics, Braxton and Baird (2001) point to the importance of providing preparation for professional self-regulation, and Fischer and Zigmond (2001) stress the importance of a variety of skills relevant to professional practice. To date, objective measures have not been devised to measure competence in the implementation of effective action plans.
Although there may be some generic abilities, like problem-solving abilities and abilities in interpersonal and written communication, that could be assessed by the use of objective tests, it is hard to imagine designing anything but performance-based assessments of the broad range of skills required for effective, responsible research practice. Instructional programs could consider collecting examples of professional performance for evaluations by faculty and students, similar to the portfolios that Gilmer (1995) has students develop for her courses in research ethics. Also, institutions could draw attention to the importance of integrity in the conduct of science by including questions derived from the definition of integrity in regular faculty evaluations of research competence, including evaluations used to make promotion and tenure decisions.

SUMMARY AND CONCLUSION

A considerable amount of work has been done on the development of measurements of ethical integrity that has relevance for research institutions concerned with the assessment of integrity in the research environment. This appendix has described outcome measures and models for the development of outcome measures that address two specific purposes. The first is to assess the ethical and moral culture and climate of an institution to ensure that the climate, which includes policies and procedures related to the ethical conduct of research, supports the individual researcher's ability to function at the leading edge of professional integrity. Research in organizational behavior indicates that the ethical and moral climate of an institution can either inhibit or promote the responsible conduct of research.

The second purpose is to describe measures and methods developed in other settings of education in professional ethics that could be used directly or that could be adapted for use in the assessment of the effectiveness of courses on the responsible conduct of research or the effectiveness of an institution's efforts to promote integrity in research. The following criteria were used for the selection of measures for the latter category: the measures had to be theoretically grounded in a well-validated psychological theory of morality, were at least indirect measures of behavior, and either had been effectively used or have good potential to link the development of aspects of integrity (e.g., ethical sensitivity, moral reasoning and judgment, and identity formation) to institutional effectiveness.

In the case of methods and measures that an institution might use in a self-assessment of its moral climate, none that are directly applicable to the research setting have been developed.
On the other hand, by modifying the content of the process for assessment of an institution's moral climate and the survey items used to collect information on the perceptions of individuals who work in that climate, it should be possible for an institution to gather information that would enable it to conduct an effective self-study. A reviewer of the section on the assessment of an institution's moral climate will notice that data on the psychometric properties of the surveys developed for climate assessment are not readily available for the examples described here. Given the absence of such data, it would be necessary not only to modify the content of such a survey but also to conduct appropriate validation studies.

In the case of measures for the assessment of outcomes of instruction in the responsible conduct of research, with the exception of DIT (a well-validated test of moral development over the life span that has been used effectively in intervention studies and in institutional outcome studies) the content of measures would need to be adapted. Several models for measurement have been sufficiently tested in the context of a professional ethics education program to warrant their application to the setting of integrity in research. Chapter 5 of this report gives considerable attention to teaching the responsible conduct of research. Far less attention, however, has been given to assessments of learning. One reason is the lack of well-validated outcome measures that can be used to assess the effects of instruction on the responsible conduct of research. Because individual teachers and even individual institutions are unlikely to be able to mount the kind of research and development plan needed to design and validate measures that assess the important outcomes of education in the responsible conduct of research, a national effort is needed. The design of such measures should be grounded in a well-established theory of ethical development and should be sufficiently user friendly to enable their use for a variety of purposes. Such purposes may include the following: (1) determining the range of criteria that define competence in ethical behavior in various disciplines; (2) conducting a needs assessment to identify areas where instructional resources should be placed; (3) identifying individual differences or problems that require intervention or remediation; (4) providing feedback to individuals, departments, and institutions on competence in research ethics; (5) determining the effects of current programs; (6) certifying research competence in ethical behavior; and (7) studying the relationship between competence and ethical behavior.

Given the paucity of suitable methods for the assessment of integrity in the research environment and the skepticism that education in the responsible conduct of research can make a measurable difference in important abilities related to the responsible conduct of research, there appears to be a clear need for work on the development of measurements that would serve the research community. There is also a need to design, modify, or adapt methods and survey measures to evaluate the culture and climate that promotes integrity in research.

REFERENCES

Anderson M. 2001.
What Would Get You in Trouble: Doctoral Students' Conceptions of Science and Its Norms. Proceedings of the ORI Conference on Research on Research Integrity. [Online]. Available: http://www-personal.umich.edu/~nsteneck/rcri/index.html [Accessed March 13, 2002].
Bebeau MJ. 1994. Influencing the moral dimensions of dental practice. In: Moral Development in the Professions: Psychology and Applied Ethics. Hillsdale, NJ: L. Erlbaum Associates. Pp. 121–146.
Bebeau MJ. 2001. Influencing the Moral Dimensions of Professional Practice: Implications for Teaching and Assessing for Research Integrity. Proceedings of the ORI Conference on Research on Research Integrity. [Online]. Available: http://www-personal.umich.edu/~nsteneck/rcri/index.html [Accessed March 13, 2001].
Bebeau MJ, Brabeck MM. 1987. Integrating care and justice issues in professional moral education: A gender perspective. Journal of Moral Education 16:189–203.
Bebeau MJ, Davis EL. 1996. Survey of ethical issues in dental research. Journal of Dental Research 75:845–855.
Bebeau MJ, Rest JR. 1990. The Dental Ethical Sensitivity Test. Minneapolis, MN: Division of Health Ecology, School of Dentistry, University of Minnesota.
Bebeau MJ, Thoma SJ. 1994. The impact of a dental ethics curriculum on moral reasoning. Journal of Dental Education 58:684–692.
Bebeau MJ, Thoma SJ. 1999. "Intermediate" concepts and the connection to moral education. Educational Psychology Review 11:343–360.
Bebeau MJ, Thoma SJ. 2000 (July 8). The Validity and Reliability of an Intermediate Ethical Concepts Measure. Paper presented at the annual meeting of the Association for Moral Education. Glasgow, Scotland.
Bebeau MJ, Rest JR, Yamoor CM. 1985. Measuring dental students' ethical sensitivity. Journal of Dental Education 49:225–235.
Bebeau MJ, Born DO, Ozar DT. 1993. The development of a Professional Role Orientation Inventory. Journal of the American College of Dentists 60(2):27–33.
Bebeau MJ, Pimple KD, Muskavitch KMT, Borden SL, Smith DL. 1995. Moral Reasoning in Scientific Research: Cases for Teaching and Assessment. Bloomington, IN: Indiana University. [Online]. Available: http://www.indiana.edu/~poynter/mr-main.html [Accessed March 15, 2002].
Bebeau MJ, Rest JR, Narváez DF. 1999. Beyond the promise: A perspective for research in moral education. Educational Researcher 28(4):18–26.
Bowen MG, Power CP. 1993. The moral manager: Communicative ethics and the Exxon Valdez disaster. Business Ethics Quarterly 3:97–115.
Brabeck MM. 1998. Racial ethical sensitivity test: REST videotapes. Chestnut Hill, MA: Lynch School, Boston College.
Brabeck MM, Sirin S. 2001. The Racial Ethical Sensitivity Test: Computer Disk Version (REST-CD). Chestnut Hill, MA: Lynch School, Boston College.
Brabeck MM, Rogers LA, Sirin S, Henderson J, Benvenuto M, Weaver M, Ting K. 2000. Increasing ethical sensitivity to racial and gender intolerance in schools: Development of the racial ethical sensitivity test. Ethics & Behavior 10:119–137.
Brabeck MM, Weisgerber K. 1989. Responses to the Challenger tragedy: Subtle and significant gender differences. Sex Roles 19:639–650.
Braxton J, Baird L. 2001.
Preparation for professional self regulation. Science and Engineering Ethics 7:593–614.
Burnett D, Rudolph L, Clifford K, eds. 1998. Academic Integrity Matters. Washington, DC: National Association of Student Personnel Administrators, Inc.
Colby A, Kohlberg L, Speicher B, Hewer A, Candee D, Gibbs J, Power C. 1987. The Measurement of Moral Judgment, Vols. 1 and 2. New York, NY: Cambridge University Press.
Cronbach LJ. 1951. Coefficient alpha and the internal structure of tests. Psychometrika 16:297–334.
Cullen J, Victor B, Stephens C. 1989. An ethical weather report: Assessing the organization's ethical climate. Organizational Dynamics 18:50–62.
Cullen JB, Victor B, Bronson JW. 1993. The ethical climate questionnaire: An assessment of its development and validity. Psychological Reports 73:667–674.
Emanuel E, Emanuel L. 1992. Four models of the physician-patient relationship. Journal of the American Medical Association 267:2221–2226.
Ernest M. 1990. Developing and Testing Cases and Scoring Criteria for Assessing Geriatric Dental Ethical Sensitivity. M.S. thesis. University of Minnesota, Minneapolis.
Fischer BA, Zigmond MJ. 2001. Promoting responsible conduct in research through "survival skills" workshops: Some mentoring is best done in a crowd. Science and Engineering Ethics 7:563–587.
Fleck-Henderson A. 1995. Ethical Sensitivity: A Theoretical and Empirical Study. Doctoral dissertation. The Fielding Institute, Santa Barbara, California.
Gibbs JC, Basinger KS, Fuller D. 1992. Moral Maturity: Measuring the Development of Sociomoral Reflection. Hillsdale, NJ: Erlbaum Associates.
Gilmer PJ. 1995. Teaching science at the university level: What about the ethics? Science and Engineering Ethics 1:173–180.
Heitman E, Salis P, Bulger RE. 2000. Teaching ethics in biomedical sciences: Effects on moral reasoning skills. Paper presented at the ORI Research Conference on Research Integrity, Washington, D.C., November 2000. [Online]. Available: http://ori.dhhs.gov/multimedia/acrobat/papers/heitman.pdf [Accessed March 15, 2002].
Higgins A, Power C, Kohlberg L. 1984. The relationship of moral atmosphere to judgments of responsibility. In: Kurtines WM, Gewirtz JL, eds. Morality, Moral Behavior, and Moral Development. New York, NY: Wiley. Pp. 74–108.
Howe K. 1982. Evaluating philosophy teaching: Assessing student mastery of philosophical objectives in nursing ethics. Teaching Philosophy 5(1):11–22.
Kohlberg L. 1984. The Psychology of Moral Development: The Nature and Validity of Moral Stages. Essays on Moral Development, Vol. 2. San Francisco: Harper & Row.
Korenman SG, Berk R, Wenger NS, Lew V. 1998. Evaluation of the research norms of scientists and administrators responsible for academic research integrity. Journal of the American Medical Association 279:41–47.
Leibowitz S. 1990. Measuring Change in Sensitivity to Ethical Issues in Computer Use. Doctoral dissertation. Boston College, Boston, MA.
Lind R. 1997. Ethical sensitivity in viewer evaluations of a TV news investigative report. Human Communication Research 23:535–561.
Lind G, Wakenhut R. 1985. Testing for moral judgment competence. In: Lind G, Hartmann HA, Wakenhut R, eds. Moral Development and the Social Environment. Chicago, IL: Precedent. Pp. 79–105.
May WE. 1983. The Physician's Covenant: Images of the Healer in Medical Ethics. Philadelphia: Westminster Press.
McAlpine H, Kristjanson L, Poroch D. 1997. Development and testing of the ethical reasoning tool (ERT): An instrument to measure the ethical reasoning of nurses.
Journal of Advanced Nursing 25:1151–1161.
McNeel SP. 1990. Development of a measure of moral sensitivity for college students. In: Teaching Values Across the Curriculum. Project report. Dunbarton, NH: The Christian College Consortium.
Mentkowski M. 2000. Learning That Lasts: Integrating Learning, Development, and Performance in College and Beyond. San Francisco, CA: Jossey-Bass.
Mentkowski M, Loacker G. 1985. Assessing and validating the outcomes of college. In: Ewell PT, ed. Assessing Educational Outcomes. New Directions for Institutional Research, No. 47. San Francisco: Jossey-Bass. Pp. 47–64.
NAS (National Academy of Sciences). 1989. On Being a Scientist. Washington, DC: National Academy Press.
NAS. 1995. On Being a Scientist, 2nd ed. Washington, DC: National Academy Press.
OGE (U.S. Office of Government Ethics). 2000. Executive Branch Employee Ethics Survey 2000. [Online]. Available: http://www.usoge.gov/pages/forms_pubs_otherdocs/fpo_files/surveys_ques/srvyemp_if_00.pdf [Accessed March 15, 2002].
Ozar DT. 1985. Three models of professionalism and professional obligation in dentistry. Journal of the American Dental Association 110:173–177.
Pascarella ET, Terenzini PT. 1991. Moral development. In: How College Affects Students: Findings and Insights from Twenty Years of Research. San Francisco, CA: Jossey-Bass. Pp. 335–368.
Ponemon LA, Gabhart DRL. 1994. Ethical reasoning research in the accounting and auditing professions. In: Rest JR, Narváez D, eds. Moral Development in the Professions: Psychology and Applied Ethics. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc. Pp. 101–119.
Power C. 1980. Evaluating just communities: Toward a method of assessing the moral atmosphere of the school. In: Moser R, ed. Moral Education: A First Generation of Research and Development. New York, NY: Praeger. Pp. 223–265.
Power C, Higgins A, Kohlberg L. 1989. Lawrence Kohlberg's Approach to Moral Education. New York, NY: Columbia University Press.
Rest J. 1983. Morality. In: Mussen PH (series ed.) and Flavell J, Markman E (vol. eds.). Handbook of Child Psychology, Vol. 3, Cognitive Development, 4th ed. New York, NY: Wiley. Pp. 556–629.
Rest J, Narváez D, Bebeau MJ, Thoma SJ. 1999a. Postconventional Moral Thinking: A Neo-Kohlbergian Approach. Hillsdale, NJ: L. Erlbaum Associates.
Rest J, Narváez D, Thoma SJ, Bebeau MJ. 1999b. DIT2: Devising and testing a revised instrument of moral judgment. Journal of Educational Psychology 91(4):644–659.
Rest JR. 1979. Development in Judging Moral Issues. Minneapolis: University of Minnesota Press.
Rest JR, Narváez DF, eds. 1994. Moral Development in the Professions: Psychology and Applied Ethics. Hillsdale, NJ: Erlbaum Associates. Pp. 51–70.
Rest J, Thoma SJ, Narváez D, Bebeau MJ. 1997. Alchemy and beyond: Indexing the Defining Issues Test. Journal of Educational Psychology 89(3):498–507.
Rest JR, Bebeau MJ, Volker J. 1986. An overview of the psychology of morality. In: Rest JR, ed. Moral Development: Advances in Research and Theory. Boston, MA: Praeger Publishers. Pp. 1–39.
Rezler AG, Schwartz RL, Obenshain SS, Lambert P, McGibson J, Bennahum DA. 1992. Assessment of ethical decisions and values. Medical Education 26:7–16.
Sirin S, Brabeck MM, Satiani A, Rogers LA. Submitted for publication. Development of computerized racial ethical sensitivity test.
Stern J, Elliott D. 1997. The Ethics of Scientific Research: A Guidebook for Course Development. Hanover, NH: University Press of New England.
Strike KA. 1982. Educational Policy and the Just Society. Chicago, IL: University of Chicago.
Thoma SJ, Bebeau MJ, Born DO. 1998. Further analysis of the Professional Role Orientation Inventory. Journal of Dental Research 77(Special Issue):120 (abstract 116).
U.S. Army. 2001. Ethical Climate Assessment Survey. Document GTA 22-6-1. [Online]. Available: http://www.leadership.army.mil/leaderphilosophyandvision/ECAS.htm [Accessed June 20, 2001].
Veatch RM. 1986. Models for ethical medicine in a revolutionary age. In: Mappes TA, Zembaty J, eds. Biomedical Ethics, 2nd ed. New York: McGraw-Hill.
Victor B, Cullen JB. 1988. The organizational bases of ethical work climates. Administrative Science Quarterly 33:101–125.
Volker JM. 1984. Counseling Experience, Moral Judgement, Awareness of Consequences, and Moral Sensitivity in Counseling Practice. Doctoral thesis. University of Minnesota, Minneapolis, MN.
Yeap CH. 1999. An Analysis of the Effects of Moral Education Interventions on the Development of Moral Cognition. Doctoral dissertation. University of Minnesota, Minneapolis, MN.