The Reliability of Instructor Evaluations of Crew Performance: Good News and Not So Good News


Two instructors rated the performance of helicopter crews (N = 45) who flew a simulated mission in a full-motion simulator. The instructors received dimension training and behavior-observation training before making their ratings. Before rating, the instructors completed an observation form as they watched videotapes of the crews and then completed a form that helped link behaviors to dimensions. For their ratings, instructors recorded or evaluated crew behaviors using 3 types of items: (a) specific crew behaviors in response to scenario events (e.g., whether the crew kept out of icing conditions), (b) evaluations of crew responses to scenario events (e.g., overall handling of the icing problem), and (c) crew resource management (CRM) dimensions for the entire scenario (e.g., evaluations of decision making). Results showed that (a) both interjudge agreement and internal consistency were high for evaluations of crew responses to scenario events, (b) interjudge agreement was low but internal consistency was high for CRM items and scales, and (c) interjudge agreement was high but internal consistency was low for specific observable behaviors. The results for the evaluations of crew responses to scenario events were very encouraging and showed reliability over time and over crews. Suggestions for improving the reliability of the other 2 item types are provided.
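
The two reliability notions contrasted above can be illustrated with a minimal sketch. It assumes interjudge agreement is summarized as a Pearson correlation between the two instructors' scores and internal consistency as Cronbach's alpha; the abstract does not state which statistics the study used. The function names and all ratings below are hypothetical and are not data from the study.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two raters' scores for the same crews."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of item-score lists: one inner list per item,
    each holding one score per crew (all inner lists the same length).
    """
    k = len(items)
    # Total scale score per crew: sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Hypothetical ratings of five crews (illustration only).
rater_a = [3, 4, 2, 5, 4]
rater_b = [3, 5, 2, 4, 4]
agreement = pearson_r(rater_a, rater_b)

# Hypothetical scores from one rater on three items of the same scale.
scale_items = [
    [3, 4, 2, 5, 4],
    [3, 4, 3, 5, 4],
    [2, 4, 2, 4, 5],
]
consistency = cronbach_alpha(scale_items)

print(f"interjudge agreement (r): {agreement:.2f}")
print(f"internal consistency (alpha): {consistency:.2f}")
```

Under these assumptions the two indices are computed on different slices of the data (across raters versus across items), so one can be high while the other is low, as the results for the CRM items and the specific observable behaviors show.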

Citation / Publisher Attribution

International Journal of Aviation Psychology, v. 12, issue 3, p. 241-261