Performance Evaluation and Licensing Assessment
UNDERSTANDING PERFORMANCE EVALUATION AND ASSESSMENT
Mariner Competence and Proficiency
Knowing what constitutes training, competence, and proficiency is important to understanding the nature and role of mariner performance evaluation and assessment. Training is the systematic development of the attitudes, knowledge, and skills required by an individual or a team to perform a given task appropriately. Competence is possessing knowledge or skills adequate to perform occupational activities to the standards established by employers or licensing authorities, as defined by performance criteria. Proficiency is demonstrated ability.
The difference between competence and proficiency is illustrated by the traditional mariner licensing process. Merchant marine officers are tested by a written examination during licensing. This method may demonstrate a level of knowledge, but does not demonstrate sustained ability to perform the task or the job. Simulators may provide a practical method for measuring or testing levels of competence and proficiency and the ability to continue to prioritize tasks. Yet few training courses described for the committee rigorously addressed measuring, evaluating, and assessing individual mariner performance.
Measuring Human Performance
Measuring human performance, whether in a simulator or in real operations, is often difficult. The visible behavior is only part of the performance:
perception, cognition, physiology, and psychology are buried within the individual. Team performance is even harder to measure: a number of individuals interact, and the interactions must accommodate authority levels, roles and responsibilities, verbal and nonverbal communications, and power levels.
Human Factors Aspects of Simulation (NRC, 1985) contains comments and recommendations with respect to performance measurement. The study concluded that the development and application of performance measurement in simulator-based training and research requires that the following elements be considered:
- Operational measures and criteria of overall system effectiveness for representative tasks and operating environments are needed.
- Analysis of the hierarchy of goals and control strategies that operators employ in the performance of real-world tasks is needed.
- Measurement for performance diagnosis needs to be developed.
- Team performance measurement, in which the contribution of each person must be defined and measured, is difficult.
- Automated instrumentation and performance measurement systems are not feasible in all simulations, or at least not for certain tasks.
- There is no single source, or even a coherent body of literature, to which practitioners can turn to obtain useful data on performance measurement methods and simulation practice.
Performance measurement must consider three levels of relevance: concept, understanding, and performance. Concept refers to the knowledge base of the individual, the degree to which the correct analysis and response methods are known. Understanding refers to the degree to which the individual is able to accommodate and work around missing or inaccurate information to correctly analyze and respond. Performance refers to the physical acts that are observable by instructing or evaluating personnel. Performance observation is clouded further when tasks demand multiple and simultaneous responses, are complex, or are to be measured at points of significant stress (which may include extremis or near-extremis exercises).
Evaluation and Assessment Defined
In some fields the terms evaluation and assessment can be used interchangeably. For this study, the terms are given narrower definitions. Evaluation is applied to the formal or informal review of training exercise results: the input is the training program; the output is the evaluation. In this context, evaluation is an element of the instructional design process. The evaluation can be informal or formal, subjective or objective, or both (see "Forms of Evaluation and Assessment" below).
Assessment is used only in the context of the licensing and certification process. Assessment is the testing of competency against specific standard criteria used for certification or licensing. The input is the formal test of competence against a set of stated, standardized criteria; the output is the assessment, either objective (e.g., multiple-choice test) or subjective (assessor completion of a simulation checklist).
This use of more narrowly defined terminology extends to the terms instructors and evaluators in the context of training programs and assessors in the context of licensing and certification.
FORMS OF EVALUATION AND ASSESSMENT
Within the simulator environment, performance evaluations for trainees may be informal or formal, subjective or objective, or both. Performance assessments for licensing candidates conducted in a simulator environment are always formal, though they may also be objective or subjective or both.
By far the most common type of evaluation is informal. These evaluations, most of which are implicit, are routinely conducted as an integral part of simulator-based training courses. They are typically conducted on an ad hoc basis and are usually not written. The most common form of informal evaluation is the undocumented debriefing of an exercise by an instructor or instructors. These routine, ad hoc evaluations are used to adjust exercise content and timing and to guide trainees toward achieving planned learning objectives.
Instructors also evaluate the results of training to help improve course content and methodology. They evaluate each student's professional background, experience, attitude, and aptitude to select the most appropriate learning methods and measurements. The instructor may also evaluate the results of each exercise to provide expert critiques of performance activity and/or facilitate trainee exercise debriefings and conduct peer evaluations of the results.
Trainees also conduct evaluations. They continuously evaluate the degree to which a course is moving them toward meeting their personal or professional development objectives. Similarly, the sponsors of trainees make implicit evaluations of a course's value to their organizational objectives. The results of the course performance may or may not be formally recorded and retained. Generally, formal records are not retained, although there are exceptions, such as when grades need to be assigned to meet baccalaureate requirements.
Private, Informal Evaluations
Sometimes, albeit infrequently, training sponsors or pilot associations have requested that simulator facilities conduct a private evaluation of a specific
individual who is scheduled to participate in training. Generally, such requests have been borne of necessity. A company may have received indications of a performance problem and have few, if any, other practical options for determining whether there is a problem that merits corrective action, such as additional specialized training.
No data are available on the practice of private evaluations. As a rule, however, simulator facilities have been reluctant to provide private evaluations of individual performance. This reluctance is caused by concerns about possible adverse effects on the credibility of the simulator facility operator's training programs and possible liability. To the extent that private evaluations have been done, they have often taken the form of individual feedback, almost always without the preparation and retention of formal documentation.
Formal Evaluation and Assessment
Because there has been a general reluctance among operating companies, unions, and operators of marine simulator facilities to formally evaluate the knowledge or performance of licensed mariners, formal evaluations are seldom employed in marine simulation. The committee did, however, find instances where formal performance evaluations were conducted on a simulator. These cases included cadet evaluation using bridge watchkeeping courses at maritime education institutions, an offshore towing deck officer training program, an active watchstander course, and a leadership course (discontinued) sponsored by a major shipping company.
In the first case, cadets attending the U.S. Merchant Marine Academy are required to take a course in watchstanding that uses the Computer Aided Operations Research Facility ship-bridge simulator as the principal training aid. Cadets are required to attain minimum performance standards of watchkeeping, including communications, navigation, change of watch, and bridge team coordination practices. These practices are generic and applicable to all vessels, rather than to operating practices aboard a specific ship or within a specific company. All cadets are required to complete this course to graduate from the Academy. A detailed case study of the course is provided in Appendix F.
The shipping company leadership and team-building course required that each participant meet certain final performance criteria for continuing employment. Those participants who did not achieve satisfactory performance levels were permitted to participate in additional course work and practice to reach a predetermined minimal level of performance. These performance minimums were observed while the student was participating in shiphandling simulation.
In the use of formal simulator assessment for licensing, there are task-specific subjective assessments on mandatory radar observer courses and some other radar courses, such as the use of automatic radar plotting aids. The Master's Level Proficiency Course recently approved by the U.S. Coast Guard (USCG)
(see Appendix F) and offered at the STAR Center in Dania, Florida, contains both a written (objective) and an assessor-scored simulation assessment (subjective) test.
To be effective, formal evaluations or assessments must have standardized and structured monitoring and must include a critique of individual performance in a range of exercises appropriate to the instructional or licensing objectives or criteria. Formal evaluations or assessments must be consistent in method, timing, and responsibility from class to class or test to test, so that results can be compared and contrasted with a high degree of reliability.
Objective Evaluations or Assessments
An objective evaluation or assessment is not subject to evaluator or assessor bias or observational limitations. To be evaluated objectively, a performance must be expressible in "yes" or "no" terms. Such evaluations could use a checklist or a simulator-embedded assessment. In an objective evaluation, the evaluator notes whether a particular practice took place; the form may also permit the evaluator to indicate the quality of the practice. Use of the objective method generally requires that the student accept the evaluator as an equal.
Checklists and Task Lists. Checklists and task lists can be useful for measuring performance objectively. Use of such forms, however, requires that:
- the specific and detailed elements of the performance under consideration are well-known,
- these elements have been fully articulated, and
- they are accepted by all relevant groups.
As discussed in Chapter 1, there is research in the maritime industry that includes task analyses. These studies have been made widely available for some time, but their results have seen limited use and are now somewhat dated. The International Maritime Organization (IMO) also has promulgated various booklets, IMO Practical Test Standards, that provide checklists for criterion-measured evaluation. The few courses that have included formal evaluations have used checklists or task lists, in some cases combined with subjective (qualitative) evaluations of each task performed.
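The checklist approach described above can be sketched as a simple data structure. The item wording, the optional quality rating, and the all-items pass rule below are illustrative assumptions, not drawn from any actual IMO or course checklist.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChecklistItem:
    """One observable element of performance (hypothetical wording)."""
    task: str
    performed: bool                # objective yes/no observation
    quality: Optional[int] = None  # optional 1-5 rating, if the form allows it

def objective_result(items: list[ChecklistItem]) -> bool:
    """Pass only if every required practice was observed to take place."""
    return all(item.performed for item in items)

# Fabricated watch-relief checklist for illustration
watch_relief = [
    ChecklistItem("Reviewed passage plan before relieving the watch", True, 4),
    ChecklistItem("Verified position by two independent means", True, 5),
    ChecklistItem("Acknowledged standing orders in the log", False),
]

print(objective_result(watch_relief))  # False: one required practice not observed
```

The "yes"/"no" field carries the objective result; the optional quality rating is where subjective judgment re-enters, as the surrounding text notes.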
Simulator Embedded. The current level of simulator sophistication permits the simulator program itself to evaluate the student's performance. Based on ship type, ship loading, hydrodynamic and aerodynamic characteristics, environmental conditions, and instructor input, the simulator can evaluate the degree to which the student met the performance parameters established for the run. The evaluation can be portrayed graphically (and in color) showing own ship's track, rudder and engine commands, other vessel tracks, and the impact of environmental conditions. The
student may be given a copy of the evaluation, and a copy may be stored in the computer memory for future evaluations of other students.
The simulator's sophistication cannot hide the essential form of the evaluation: the evaluator, the computer, or some combination of the two determines what the correct performance is and what each successive student's judgments should be. Since there are many acceptable ways to perform navigational and shiphandling tasks, acceptance of such evaluation by students who are master mariners may be limited.
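A simulator-embedded evaluation of this kind can be illustrated with a minimal sketch: comparing a recorded own-ship track against a tolerance established for the run. The track data, the straight centerline, and the cross-track limit are invented for illustration; a real simulator would evaluate many more parameters (rudder and engine orders, traffic, environment).

```python
# Hypothetical run parameter: maximum cross-track deviation, in metres,
# from the planned track (taken here as the channel centerline, y = 0)
MAX_CROSS_TRACK_M = 50.0

def max_deviation(track: list[tuple[float, float]]) -> float:
    """Largest cross-track distance (metres) from the centerline."""
    return max(abs(y) for _, y in track)

def within_parameters(track: list[tuple[float, float]]) -> bool:
    """Did own ship stay inside the tolerance set for the run?"""
    return max_deviation(track) <= MAX_CROSS_TRACK_M

# Recorded (along-track, cross-track) positions, in metres
recorded = [(0, 5.0), (500, -12.0), (1000, 31.0), (1500, 18.5)]
print(max_deviation(recorded), within_parameters(recorded))  # 31.0 True
```

The limitation the text identifies is visible even here: someone must still decide that 50 metres, and this track, define "correct" performance.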
Performance Playback. Simulator playback capabilities can be important. The ability to make audio, video, and plotter recordings of several measures of performance and behavior is a valuable tool in performance evaluation and assessment. Printouts may be made of situation displays, bird's-eye views, and status displays at different stages of the exercise on different scales. These data can be used during training debriefings for informal evaluations and during licensing assessment as one objective measure of the candidate's performance.
Subjective Evaluation and Assessment
Subjective evaluations and assessments are open to interpretation or bias by any or all involved—the evaluator, assessor, or student. While these methods might take the form of checklists or task lists, the evaluation or assessment also includes the observer's qualitative judgment of the efficacy of the student's performance. In the training environment, subjective evaluation requires that the evaluator be accepted as having superior knowledge of the subject matter.
The typical ship-bridge simulator course uses debriefing as the subjective evaluation form of choice. The instructor, student peers, and students themselves may comment on portions of the run that were well done or were not so well done and ways in which performance could be enhanced. These informal (written or unwritten) evaluations carry substantial weight with students and instructors. Personal practices can be compared, results measured, and alternatives explored in conversation and on the simulator.
Use of subjective evaluations can be effective and, in the absence of scientific performance-based criteria, is currently the primary means for ascertaining whether an individual can effectively apply knowledge in conducting actual operations. For example, a subjective evaluation may be used in determining whether an individual can collect, correlate, and interpret considerable information from multiple sources, make appropriate decisions based on this information and his or her nautical knowledge, and perform multiple tasks in the correct time sequence under the routine pressures of actual operating conditions. Indeed, the functions and tasking just described are exactly what is asked of the officer of the watch and, more important, of third mates from the moment they begin to stand their first underway watch.
In the use of subjective assessment for mariner licensing, the qualifications and credibility of the assessor are especially important. As discussed later in this chapter, the licensing authority responsible for the testing, as well as the candidate being tested, must be assured that the assessor has superior knowledge and can be impartial in conducting the assessment. Assessor qualification is an area where the similarities between use of simulation in the marine and commercial air carrier industries are pronounced. In both industries the qualifications and perceived credibility of instructors, evaluators, and assessors are paramount to the individual being evaluated or assessed.
TRAINING AND EVALUATION WITH SIMULATORS
Although there are notable exceptions, evaluation is not systematically applied nor is the methodology for conducting performance evaluation well developed. Most performance evaluation methodologies that have been attempted rely on adherence to prescribed procedures (e.g., operating certain equipment or using correct radio procedures) or subjective evaluations by experts. These evaluations may or may not be based on detailed task analyses and formal evaluation objectives and criteria. In the use of simulation for evaluation, it is important to validate the evaluation procedures used to ensure that they are evaluating applicable competencies (i.e., competencies needed at sea).
Current Evaluation Methods
Traditionally, the primary methodology for evaluation in a marine training program has been observation and feedback from the instructor (or instructors) to individuals and teams. In these cases, the instructor usually has the responsibility for both the input (e.g., lectures, discussions, demonstrations) and the output (evaluation). Often these two distinct roles are held by the same person.
An alternative method, sometimes used in vessel or bridge team and bridge resource management courses, has been to assign to other course participants the role of observer and evaluator. This peer evaluation methodology relieves the instructor of the dual role and permits the evaluation of instructional efforts to rest on the strength of the peer evaluation.
There can be two serious drawbacks to peer evaluation. First, peers may be reluctant to evaluate the performance of their peers. They may feel that such feedback is not in the tradition of the industry and that the ideal of unencumbered, individual (master) decision making is being compromised. The peers may also feel that they are insufficiently schooled in the practices under consideration to provide useful feedback. (Of course, providing feedback to others may be included in the instructional design, since such a practice forces observers to learn from performers.)
The second drawback is that the instructor may not allow peer feedback. In
the role of evaluator, the instructor may take the lead, criticize or disagree with peers, not acknowledge alternative perspectives, or in other ways use his or her authority to limit peer performance evaluation.
There are a few instances in which individuals other than instructors have been involved in evaluation and feedback. Some companies have used senior mariners who are not part of the instructional cadre as observers or evaluators. Other companies have used senior shoreside or nonoperating personnel in this role. In these cases, the observers were usually recognized as having specialized skills, knowledge, or experience that could be appropriately applied to the evaluation process.
Evaluation Criteria and Performance Standards
When using simulators for training, evaluation criteria should be carefully defined. Selected performance standards should reflect at-sea competency requirements. Generally these standards will represent a baseline level of required ability, not a standard of excellence. It may be more difficult to evaluate teamwork than individual performance in an objective, measurable way, especially because vital skills, such as judgment, are not easily evaluated.
Because of the need to ensure that everyone meets or exceeds baseline standards, normative-referenced testing and evaluation appear to have limited value in marine professional development. These evaluations are best suited for cadet training or training situations involving several junior third officers in the same class.
Most training evaluation in the maritime industry appears more suited for criterion-referenced or domain-referenced testing. These methodologies allow the performer to demonstrate both the strengths of individual capabilities and the deficiencies, which may be corrected through practice, coaching, or additional experience. For criterion-referenced evaluation to be fully effective, however, it is important to address evaluation in the context of the total instructional design process. As noted earlier, concerns in applying instructional design include the limited usefulness and age of the detailed task analyses available within the industry, the absence of performance standards, variability in the observing and evaluating processes, and reluctance among some in the industry to fund and employ the required changes.
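The distinction between normative- and criterion-referenced evaluation can be made concrete with a small sketch; the scores and the 70-point cutoff are invented. A criterion-referenced result depends only on a fixed standard, while a normative result depends on how the rest of the cohort performed.

```python
def criterion_referenced(score: float, cutoff: float) -> bool:
    """Pass/fail against a fixed performance standard, cohort-independent."""
    return score >= cutoff

def normative_referenced(score: float, cohort: list[float]) -> float:
    """Fraction of the cohort scoring at or below this performance."""
    return sum(s <= score for s in cohort) / len(cohort)

cohort = [62.0, 70.0, 75.0, 81.0, 90.0]

# The same 75-point performance meets the (hypothetical) 70-point standard...
print(criterion_referenced(75.0, 70.0))   # True
# ...but its normative standing depends entirely on the class
print(normative_referenced(75.0, cohort))  # 0.6
```

This is why the text favors criterion-referenced methods for licensing contexts: a mariner must clear the baseline regardless of how strong or weak the rest of the class happens to be.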
Concerns about Evaluation with Simulators
Within a training program, the inclusion of a formal evaluation by an instructor could adversely influence the interpersonal dynamics between the instructor and the students, or among the students in a team-oriented course. The trust and rapport between instructional staff and trainees are vital and can be damaged if the students are overly concerned about passing a test.
There are also several reasons why some operating companies, unions, and others may be reluctant to measure or evaluate individual performance using a simulator. First, the mariner who attends a training course at a simulator facility has already met the USCG requirements for licensing. Since the license is already valid—and by extension, the mariner is assumed to be fully knowledgeable and qualified—simulator-based training can be viewed by the trainee as interesting and of some marginal value, but not required. It is possible that many mariners would forgo the opportunity to train on a simulator if they were required to pass a written or performance test not required for their license.
Second, a substantial number of U.S. simulator facilities are operated by unions or union-related organizations. Training is often a benefit of union membership. In such cases, it might be considered counterproductive to some training programs to formally identify performance shortcomings or problems, although evaluations are performed at some union-operated facilities at clients' requests.
Third, in contrast to the commercial air carrier and nuclear power industries, the nature of marine operations and differences in vessel configurations and maneuvering behavior result in considerable variability in the strategies used for successful operations. Thus, the approaches, practices, and techniques taught in a simulation course might not necessarily be those used aboard any particular ship. Simulator instructors may therefore be reluctant to say how an individual might perform aboard a given vessel.
Fourth, few, if any, simulators and simulations are currently validated across platforms. This potential inconsistency among platforms and simulations raises concern that the evaluation may not be an accurate reflection of the individual's true abilities and skills (see Chapter 7).
There is also concern that formal training performance evaluation records, including specific descriptions of course conduct and documentation of individual performance, could become evidence in accident investigations or disciplinary proceedings. The concern is that these records could be open to misinterpretation and that individuals might not wish to participate in training if records of
their performance were retained. Given the nature of simulator-based training, and the fact that errors are routinely allowed to occur as a training tool, the concern over possible misinterpretation of training results by individuals not qualified to evaluate simulation performance should be considered when deciding whether formal records should be retained.
Employer Use of Simulation for Training and Evaluation
Hiring practices in the maritime industry, although not standard, are usually based on acceptance of a marine license as proof of basic competence. A newly employed deck officer is normally considered competent for initial employment and may be given significant responsibility the first day on the job based solely on possession of the required license. This immediate expectation of competence may exist even when the officer is serving aboard an unfamiliar vessel, in unfamiliar ports and waters, with an unfamiliar crew.
U.S. shipping and towing companies, however, usually do not promote deck officers to positions of greater responsibility based on a marine license alone. A period of sea service, often longer than that required by the international Standards of Training, Certification, and Watchkeeping (STCW) guidelines or USCG regulations, is normally required so a deck officer can acquire the experience and skills needed for promotion and to give the employer and ship's senior officers time to observe and evaluate the deck officer's abilities.
Simulation could potentially be used as a tool for initial evaluation and indoctrination into a particular company's operating practices in routine and emergency situations, prior to actual engagement. This would eliminate the present concern in situations where an officer is "hired blind."
Use of simulation may also enable employers to shorten the period of on-the-job training required for promotion. Simulator training in company procedures and ship-specific operating practices, followed by an objective, performance-based evaluation of the skills acquired, appears to fill a void not previously addressed by traditional teaching and evaluating methods.
Some employers have already initiated standard programs that use simulation for training and evaluation. The Panama Canal Commission program, described in Box 5-1, includes formal pilot training using both onboard and simulator-based evaluation by senior pilots. Boxes 5-2 and 5-3 include comments on use of simulators for testing pilots and a summary of a simulator-based check-ride, respectively.
The Panama Canal pilot development program is described in Appendix F. The program is considered unique and presents opportunities for measuring the effectiveness of formal pilot training. Potentially, the program could be studied to provide a resource for evaluating traditional versus simulator-based training.
BOX 5-1 Use of Simulators for Performance Evaluation: The Panama Canal Commission
The Panama Canal Commission has a program for formal pilot evaluation based on a series of periodic shipboard check-rides by senior pilots. The Commission is presently testing the use of simulator-based evaluations to supplement the shipboard check-rides. The test program includes a series of "dual check-rides" wherein a pilot is checked one day aboard ship and tested a second day using the simulator. The tests are conducted by two different pilots, and the results of each check-ride are kept separate to ensure a blind test. The tests' results are being compared to determine whether there is a correlation between shipboard and simulator evaluations. Potentially, the program could be a resource for evaluating traditional versus simulator-based testing for licensing and other purposes.
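The comparison the Commission is making can be sketched as a simple correlation between paired check-ride scores. The scores below are fabricated for illustration; a real study would need many more score pairs, an agreed scoring scale, and a test of statistical significance.

```python
from math import sqrt

def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample correlation between paired shipboard and simulator scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

# One fabricated pair of blind check-ride scores per pilot
shipboard = [80.0, 85.0, 78.0, 92.0, 88.0]
simulator = [82.0, 87.0, 75.0, 90.0, 85.0]

r = pearson_r(shipboard, simulator)
print(round(r, 2))  # a high r would suggest the simulator tracks shipboard results
```

Keeping the two check-rides blind, as the Commission does, is what allows such a correlation to be read as evidence rather than as one assessor anchoring the other.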
LICENSING PERFORMANCE ASSESSMENT WITH SIMULATORS
Currently, simulator-based assessment in the United States is used directly in only two licensing assessments. The first is the radar observer certification mandated through ratification of the STCW guidelines, and the second is the recently approved master's level course (see Appendix F). This course was the result of an unsolicited proposal from the simulator operating company and is based on the successful completion of a training program.
It was suggested to the committee during presentations that additional simulator-based programs for marine licensing assessment be introduced. The suggestion was based on a belief that performance in such exercises would demonstrate not only the knowledge assessed in the written exam but also the application of that knowledge.
The Licensing 2000 and Beyond report (Anderson et al., 1993), discussed earlier, suggests that the use of designated examiners "for practical examination of individuals seeking lower level licenses where a demonstration of ability, in addition to or in lieu of written examinations, holds appeal. Similarly, a designated examiner used to sign off practical factors application of other license or certificates may prove desirable." Among the report's recommendations was "Place significant increased emphasis on approved courses, and other, more formalized methods of training and de-emphasize 'sea time,' un-verifiable for quality or quantity, as the principal guarantor of competency."
The committee, however, disagrees. It believes that before simulation can be effectively applied to the licensing process, there are a number of critical issues in both training performance evaluation and licensing assessment that should be
BOX 5-2 Comments on Testing Pilots Using Simulators
I believe simulators can indeed be used for testing (based on our trial of simulator testing to date), but only for certain tasks, not every piloting task.
We have recreated the same transit conditions on the simulator as in the canal. Our simulator experience showed that:
Captain S. Orlando Allard
Chief, Maritime Training Unit
Panama Canal Commission
addressed. As noted in Licensing 2000, "The use of simulators for testing purposes is controversial.… Further, wide-spread use of simulation for testing of more definitive subjective knowledge has yet to be fully demonstrated."
ISSUES IN SIMULATION EVALUATION OR ASSESSMENT
The committee has identified a number of issues and constraints that it viewed as impairing the broad, near-term application of simulation in marine licensing assessment. Before a large-scale program is undertaken, these issues should be addressed.
The Need for a Systematic Approach
The recent decision of the USCG to add a simulator-based training evaluation to the master's licensing process was not systematic. Submission of an unsolicited proposal to conduct the course and its acceptance by the USCG suggest the need for a well-defined plan. To be adequately prepared for the quality
BOX 5-3 Typical Summary of a Simulator-Based Check-Ride
The Panama Canal Maritime Training Simulator is a very effective tool to supplement the training of apprentices and limited pilots. A simulator, however, cannot replace the "hands-on" training the pilots in training (PITs) and pilot understudies (PUPs) receive while riding with another pilot on the canal. A simulator is useful for teaching shiphandling skills but cannot accurately reflect the true behavior of a given vessel in a given area of the Panama Canal. Therefore, the simulator should be used primarily as a teaching aid.
With the simulation, PUPs, PITs, pilots, and shiphandlers are able to:
The 10,000 dwt (deadweight ton) ship model does not handle in an identical manner to the standard 10,000 dwt vessels that transit the canal in the following ways:
The artificial atmosphere in the simulator creates greater stress for the following reasons:
Jeffrey B. Robbins
Pilot Training Officer
Panama Canal Commission
control and oversight responsibilities of an expanded, simulator-based licensing program, the USCG must have a framework. Its core must be guided by formal simulator and simulation validation standards, applicable training course standards, and certification of instructors and assessors.
The USCG should also consider including in the framework provisions for follow-up of the currently approved master's course (and others as they are approved) to collect and analyze data, such as comparing success and failure rates in the new course to those of traditional testing methods.
From the information and conclusions in Licensing 2000 and Beyond, summarized in Chapter 2, it appears that the USCG's marine licensing infrastructure does not have the structure or staff to fully apply advanced testing technologies as an element of marine licensing or to oversee the possible delegation of additional testing responsibilities to third or fourth parties. Adopting a systematic approach would help ensure that the appropriate quality control and oversight infrastructures are in place and that all implementation issues discussed below are addressed.
The structure of the framework could make use of the instructional design concepts outlined in Chapter 3. Elements of instructional design that might be integrated include:
- characterizing populations for which marine licensing is required and specifying competency requirements based on specific task and subtask analyses;
- developing marine licensing goals and objectives;
- developing standard performance criteria to measure whether licensing goals and objectives are met;
- determining the knowledge, skills, and abilities (or ranges for each) required to meet standard performance criteria;
- determining requirements for practical experience needed to develop knowledge, skills, and abilities;
- developing examining and assessing methodologies, including matching assessment media to assessment objectives;
- identifying resource requirements and testing media and validating and correlating them with marine licensing objectives, including a detailed inventory of simulators available by type and an estimate of percent time potentially available for simulator-based licensing assessment;
- matching specific assessment techniques to licensing requirements;
- establishing assessor qualification, selection, training, and certification requirements necessary to ensure the quality of the marine licensing process relative to established objectives;
- reviewing and approving proposed programs and courses for satisfying marine licensing requirements, including course content and materials and qualifications and certification of assessors; and
- establishing a monitoring and program evaluation system for the marine licensing program itself to provide a basis for continuous improvement.
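As an illustration only, the elements above could be captured in a simple data model linking each license level to its competency requirements and to the assessment media validated for measuring them. Every name and entry below is hypothetical, not an actual USCG requirement.

```python
from dataclasses import dataclass, field

@dataclass
class Competency:
    """One competency requirement derived from a task/subtask analysis."""
    task: str                  # e.g., "conning in restricted waters"
    performance_criteria: str  # standard against which performance is judged
    assessment_media: list     # media validated for measuring this competency

@dataclass
class LicenseLevel:
    """A license grade and the competencies required to hold it."""
    grade: str
    competencies: list = field(default_factory=list)

# Hypothetical entry for illustration.
master = LicenseLevel(grade="master, unlimited tonnage")
master.competencies.append(
    Competency(
        task="harbor approach and docking",
        performance_criteria="completes approach within planned track limits",
        assessment_media=["full-mission ship-bridge simulator"],
    )
)

# A review step might check that every competency has at least one
# validated assessment medium before a course is approved.
assert all(c.assessment_media for c in master.competencies)
print(f"{master.grade}: {len(master.competencies)} competency requirement(s)")
```

Organizing the framework this way would make gaps visible: a competency with no validated assessment medium, or a medium with no performance criteria, would surface during course review rather than after approval.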
Developing a plan based on the instructional design process requires consideration of the issues outlined below, as well as the following two problems (discussed in Chapter 3), which are basic to the application of instructional design to the marine industry:
- An important element in applying instructional design to marine licensing is specification of competency requirements based on task and subtask analyses. As discussed earlier, although the maritime industry has conducted several task analyses over recent decades, the results of these studies have not been widely accepted as accurate depictions of the skills, knowledge, and abilities required at various license levels (third mate, second mate, chief mate, or master). The studies are now dated and do not specifically address behavioral aspects of individual job performance or the specific steps required. Many are general and unfocused regarding the specific requirements of a particular fleet or class of vessel.
- The issue of "ranges of acceptable professional performance" must be addressed. In contrast to related industries (aviation, nuclear power, etc.), there are no absolute performance specifications in the maritime industry. For example, the individual, professional judgment of a watch officer, pilot, or master determines when a rudder order will be initiated and at what magnitude. Another equally competent officer might choose a different course of action, and there are no standards for comparing the two as long as the results are equivalent and the vessel does not subsequently become involved in an incident or accident. Properly applying the instructional design process will require developing a methodology to address this problem.
The Development of Standards
The Need for Standard Evaluation and Assessment Scenarios
Some people who met with the committee, and who are responsible for conducting simulator-based training programs, suggested that the professional judgment or subjective evaluation of a trained evaluator can be as effective in evaluating performance against stated criteria as objective (criterion- or domain-referenced) measures. Their experience is that nearly all simulators, including manned models, uncover and display individual and team deficiencies regarding most aspects of navigation, watchkeeping, shiphandling,
communications, and coordination. Failure to prepare appropriate plans, anticipate system failures, maintain situational awareness, and manage stress (i.e., many of the problems identified as contributing to marine casualties) will almost invariably show up in normal, everyday watchkeeping and shiphandling situations.
At present there are no standard simulator scenarios available for training performance evaluation or licensing assessment. Individual training establishments have developed their own scenarios, often in conjunction with individual shipping companies. For the Master's Level Proficiency Course (Appendix F), the STAR Center staff developed 10 generic scenarios for each of 4 exercises used in the simulator portion of the program. These scenarios, as well as the models on which they are based, were developed entirely in-house to standards determined at the STAR Center in consultation with the USCG.
For effective mariner performance evaluation or assessment on a simulator, industrywide standard evaluation and assessment scenarios are needed. They should be based on cross-platform and cross-student research. All parties of interest should be included in developing the standards for these scenarios (see Chapter 7 for a discussion of a mechanism for developing standards). They could be based on hypothetical (generic) information, real-world information, or a hybrid of both. Each approach has advantages and disadvantages with respect to use in training or marine licensing (see discussion of simulator types in Chapter 4).
The Need for Consistent Results from Simulators
In conducting any form of training performance evaluation or licensing assessment, it is important to distinguish between individual performance under real conditions and variations that are induced by the training environment or testing situation. The basis for making such a comparison is, for practical purposes, limited to expert opinion at present and for the foreseeable future.
In structuring any simulator-based licensing assessment program, it is very important to carefully define what levels of simulator validity are required or acceptable for different levels of licenses. Not all licenses require the high face or apparent validity possible with a full-mission ship-bridge simulator.
Before approving a simulator for formal training performance evaluations or licensing assessments, it is important, as part of the validation process, to have credible experts subjectively assess whether or to what degree a simulation consistently results in behavior that would be expected under identical or similar real-world conditions. Training-induced variations in individual behavior and performance would not necessarily disqualify the simulation as an evaluation medium for a training program, but should be considered in applying evaluation methodologies. If a simulation is used for licensing assessment, it is crucial that variations in performance be recognized and accounted for during assessment. (Chapter 7 includes a more detailed discussion of simulator validity and standards-setting.)
Certification of the Evaluator or Assessor
If demands for formal simulator-based evaluation and assessment increase, many currently informal practices would need to be modified to a more systematic approach to ensure adequacy and consistency. Consideration should be given to formally separating the instructor and assessor roles, at least with respect to marine licensing practices. Such separation would help ensure the integrity of any training provided to meet licensing requirements.
For licensing assessment to be most effective, it must be conducted by impartial assessor teams who function separately from the simulator operation and who conduct the simulation exercises independently. (This mode of assessment is the opposite of what the USCG has done in the STAR Center's master's course.) Currently there is considerable competition among license-preparation schools regarding their ability to prepare mariners to pass license examinations. If licensing assessments were accomplished solely by individuals affiliated with the simulator facility, considerable care would be needed to avoid conflicts of interest. Separation of these functions would avoid the possibility of influencing the candidate's performance aboard the simulator.
The education, qualifications, and experience requirements for a simulator-based license assessor may be very different from those of a training program performance evaluator. Careful consideration should be given to defining the assessor's professional skills and competencies and how they can be measured, as well as judging his or her observational and assessment skills and competencies.
The assessor who is subjectively assessing the behavior and performance of a candidate in a license-granting situation must be capable of isolating, observing, and measuring that performance effectively and impartially. It may be necessary to develop specialized training or perhaps apprenticeships for assessors to ensure that they possess the skills required to make effective, impartial assessments.
In developing the licensing assessment framework, the USCG needs to ensure that it has, or can develop, the capability to qualify assessors. Currently, there is no formal certification of license assessors outside the USCG. The agency does have a system for instructor certification. As part of the course-approval process, facilities are required to list instructors authorized to teach an approved course. If training (with or without formal evaluation of performance) were required as part of marine licensing, it is likely that company and union management would become more involved in observation and assessment.
Development and certification of qualified, impartial assessors will require time. It is important that the USCG consider a phasing-in process to allow time for this process to take place. Development of an adequate number of qualified assessors should be factored into the agency's licensing framework and phased approach to the introduction of simulator-based licensing.
Separating the Simulator and Student Performance
Related to the issues of simulation and simulator validity and to assessor qualifications is the concern of separation of simulator performance from student performance. In both training performance evaluation and marine licensing, it may be difficult to separate the performance of the simulator from that of the student or candidate. Full-mission simulators have a range of presentation capabilities (accuracy and fidelity), performance response (hydrodynamics and aerodynamics), physical layout (bridge hardware integration and design), and performance measurement (embedded). The older the system, the more likely that its embedded recording and reporting capabilities will be limited. The more recent installations offer wider choices of scale, frequency of plot, richness of detail, color graphics, and other features.
There have been no cross-platform studies as to the efficacy of specific platforms for training evaluation or licensing performance assessment. Furthermore, different platforms are limited in the ranges of performance they can simulate. These limitations include, for example, number of controllable other vessels, degree of fog or reduced visibility, hydrodynamic forces (e.g., channel and bank, squat, passing ship), and size of vessel team that may be accommodated.
In a training environment, a specific student may learn faster or more completely with one particular simulator-instructor combination than with another. It may be that all simulators can meet the specific demands for performance evaluation and assessment if the criterion or domain to be measured is broad enough to factor platform constraints into the process. Some simulators may be more appropriate for certain types of training or assessment; no single simulator, however, is best at meeting all of these requirements.
Currently, student or license candidate performance is, at least in part, a function of simulator capability. Until cross-platform and cross-student (or candidate) studies have been performed, all simulator-based-training performance evaluation and licensing assessment must be recognized as being based on a particular simulator or simulator-evaluator/assessor combination.
Availability of Simulator Resources
The decision to require a specific level of validity for a specific level of license and to require use of standard simulation scenarios for specific license levels will have a major impact on commercially operated simulator facilities. The demand for testing could quickly exceed ship-bridge and other simulator resources. In developing a licensing framework that includes simulator-based training and assessment, the USCG should consider a phasing-in process to allow sufficient time for the marketplace to provide the resources needed to meet the potential demand.
Evaluations conducted in conjunction with training are a cost of doing business that can be accommodated in course structure. Evaluations and assessments for other purposes, if not conducted during training, could result in added costs. The required funding would depend on the testing platform and the manner in which the assessment or evaluation was conducted.
In Licensing 2000 and Beyond (Anderson et al., 1993), it is suggested that "in some areas it should be possible to shift expenditures [i.e., costs to the license or document applicant or his or her employer already incurred directly or indirectly in obtaining a license or document] from existing indirect costs into more constructively applied direct costs." This report suggests, for example, that these costs might be shifted "from present-day courses... to the cost of formal competency-directed training in an approved course.... Successful completion of the course could eliminate the U.S. Coast Guard exam."
Any USCG-mandated training or licensing assessment needs to include an analysis of the possible sources of funding to ensure that mariners have the ability to pay for the training, especially if the mandated training would affect license renewal and continued employment.
The recently approved master's course combines training and testing in a two-week period, and successful candidates receive their unlimited master's license. For this program, the American Maritime Officers Union has stated that all members seeking their master's license will be required to take this course, and the member's company will pay for the program.
In addition to the technical issues discussed above, the following issues should be addressed in the development of a framework for the USCG licensing program.
Implications of Combining Training with Formal Licensing Assessment
In addition to the concerns discussed above about combining the instructor/evaluator function with that of the assessor in marine licensing, consideration must be given to the development of adequate security measures at the operating and authorizing levels to ensure fairness, accuracy, and reliability.
Familiarization with the Simulator
No matter how high the fidelity and accuracy of a ship-bridge simulator, it is not a real ship. The individual whose competency is being evaluated or assessed needs to have some level of familiarity with the testing platform—whether it is a
ship, ship-bridge simulator, manned model, or other form of simulator or testing device—so that the influence (positive or negative) of that platform on his or her performance is minimized. The degree and level of familiarization that should be conducted depends on whether the familiarization is used as training in advance of a performance evaluation or is in advance of an assessment required for marine licensing.
Familiarizing an individual with the simulator in conjunction with training is appropriate insofar as that which is being measured is performance outcome resulting from training. Using training to familiarize an individual with a simulator where the goal is competency determination for licensing is more problematic. There is a need for research to quantitatively determine whether extensive familiarization with a simulator artificially inflates individual performance. The effect of combining simulator training and simulator-based assessment within a two-week period is an aspect of the recently approved USCG Master's Level Proficiency Course that should be carefully reviewed.
An alternative to familiarization with a ship-bridge simulator through training is use of special indoctrination simulations designed for this purpose. This approach, however, would add cost without providing any additional return on the investment of time and resources, as might be achieved by using training as a vehicle for familiarization.
Role-Playing in Licensing Assessment
Conducting an effective performance evaluation or licensing assessment requires that all normal bridge positions are filled and are "played" in the same manner in which they would be aboard ship. Who should play these roles for evaluation and assessment is problematic and not easily resolved.
If colleagues play bridge team and pilot roles, especially if the colleagues are also to be tested, their performance in support of the individual being examined could mask weaknesses. Furthermore, colleagues may not represent the actual level of expertise that would be associated with each bridge team position in real life. Thus, the performance of colleagues is unlikely to exactly replicate what would occur at sea. (This consideration affects all role-players.) Personality conflicts might have the opposite effect by increasing the level of difficulty or interfering with the effective performance of the individual being examined.
Use of a simulator facility's employees for role playing is an alternative, but could also present problems. On one hand, consistency in evaluation or assessment could be created by using the same individuals as role-players, provided that their performance did not become so good as to implicitly enhance the candidate's performance. In competency determinations for licensing, however, the use of simulator facility employees, although perhaps economical, might result in the appearance of conflict of interest, because of the facility's interest in continued use of its resources for testing. In the USCG-approved master's course
at the STAR Center, simulator facility employees were used as role-players while the candidate was being tested (See Appendix F).
Roles could be played by disinterested parties, in which case costs would likely increase, and there would be issues of quality control over the expertise of the role players. Another alternative would be for roles to be played by representatives of marine licensing authorities. Here again, there would be an overall increase in costs. The qualifications of the role-players might also be of concern.
Summary of Findings
Understanding mariner training, competence, and proficiency, and the factors involved in measuring human performance, whether in real-world or simulator-based learning situations, is important for developing evaluation and assessment techniques. For the purposes of this report, the term evaluation is defined as the measurement of the output of a training program. Evaluations may be formal or informal, objective or subjective, or both. Assessment is defined as the output of a licensing process wherein the performance of the candidate is formally measured against a defined set of standard criteria. Assessments may also be objective or subjective. Evaluators work in training or performance evaluation programs, and assessors work in license-granting programs.
During the course of its work, the committee found that much of the evaluation done in connection with simulator-based training is informal. There are several reasons for this, including a reluctance to maintain formal training performance records because of the nature of simulator-based training, which can include the deliberate introduction of errors. A second reason is that many trainees already hold licenses and might be reluctant to attend training if they believed they were to be tested.
To date, simulation has been used directly in licensing in only two instances: radar observer certification and the recent USCG-approved Master's Level Proficiency Course. The USCG's acceptance of this latter course was in response to an unsolicited proposal. Before the agency undertakes more extensive use of simulation in marine licensing, a program framework that includes consideration of the following issues should be developed:
- validation of the scenarios used in performance evaluation and licensing assessment;
- determination of the appropriate level of simulation for each license level;
- qualification and certification of the evaluators and assessors;
- availability of simulators; and
- source of funding for the cost of simulator use, especially in licensing assessment.
Two of these issues, evaluator and assessor qualifications and availability of simulators, could involve significant time. In the development of its licensing program framework, the USCG will need to include time to develop (or oversee development of) training courses and to train (or oversee training of) evaluators and assessors. The USCG will also need to factor in sufficient time to not only allow new simulator facilities to come on-line, but also to ensure that the simulators and simulations are properly validated.
One approach to developing the framework needed for the effective, widespread introduction of simulator-based licensing assessment would be to apply elements of the instructional design process discussed in the above section "The Need for a Systematic Approach." Systematic application of this process would address many of the issues raised by the committee.
In the course of its investigation of the uses of simulation for training performance evaluation and licensing assessment, the committee identified a number of areas where existing research and analysis did not provide sufficient information to extend its analysis.
The "Research Needs" section of Chapter 3 discusses the need to expand and update specific task and subtask analyses, including data on dimensions related to behavioral elements and the specific steps needed to execute each subtask. In addition to their use in developing training courses, these data are important in developing the licensing assessment program. In using simulators as part of the licensing program, the skills and abilities to be measured must be defined.
There are also few data regarding specific requirements of a particular fleet or class of vessel. Such information would be useful in applying instructional design to the simulator-based licensing assessment process.
Another concern of the committee is that in the marine industry there is little or no research that would assist in developing performance-based criteria to be used as measurements for determining whether simulator-based licensing objectives are being met.
Conducting licensing assessment using simulators at several different facilities would necessitate the development of industrywide standards for the assessment scenarios used. These standard scenarios should be based on cross-platform and cross-candidate research to ensure consistent, reproducible measures of performance.
There is also a need for cross-platform studies to determine the efficacy of specific platforms for training evaluation or licensing performance assessment. Individual platforms may be limited in the range of performance they can simulate, and they may be more applicable to certain levels of licenses. These limitations could include such factors as number of other controllable vessels, degree of fog or reduced visibility, hydrodynamic forces (e.g., channel and bank, squat,
passing ship), and size of vessel team that may be accommodated. To judge which platform to use for which licensing assessment, it is necessary to understand the advantages and limitations of each type of platform, as well as the necessary levels of validity and fidelity for the different levels of license.
The committee could not find any research that would support or discredit combining simulator training and simulator-based assessment in a single course. The committee does, however, have some concerns about the potential problems (listed briefly above). These potential problems should be addressed, and research should be conducted to determine whether there are reasons to either combine or separate the two functions.
Anderson, D.B., T.L. Rice, R.G. Ross, J.D. Pendergraft, C.D. Kakuska, D.F. Meyers, S.J. Szczepaniak, and P.A. Stutman. 1993. Licensing 2000 and Beyond. Washington, D.C.: Office of Marine Safety, Security, and Environmental Protection, U.S. Coast Guard.
NRC (National Research Council). 1985. Human Factors Aspects of Simulation. E.R. Jones, R.T. Hennessy, and S. Deutsch, eds. Working Group on Simulation, Committee on Human Factors, Commission on Behavioral and Social Sciences and Education. Washington, D.C.: National Academy Press.