Katrin Lintorf, Nele McElvany, Camilla Rjosk, Sascha Schroeder, Jürgen Baumert, Wolfgang Schnotz, Holger Horz and Mark Ullrich

Dependability of diagnostic teacher judgements - Reliability of different judgement measures in the assessment of task difficulty


While the accuracy of teacher judgements has been investigated extensively, little research has addressed the reliability of the accuracy components used to measure it. The present study investigates the parallel-test reliability and the internal consistency of accuracy components for the assessment of task difficulty. The data stem from the project BiTe ("Development and evaluation of competence models for integrative processing of texts and pictures", University of Koblenz-Landau/Max Planck Institute for Human Development, Berlin/Technical University of Dortmund) and comprise a sample of N = 1031 students and a sample of N = 96 biology, geography and language teachers. The accuracy components proposed by Schrader and Helmke (1987) were computed separately for two task examples, each consisting of six items. For all three components, the results show that the teachers' judgements were not equally accurate across the two examples. In addition, confirmatory factor analyses were performed on the difference scores that make up the level component. These analyses provide no evidence that judgement accuracy is unidimensional, so the internal consistency could not be interpreted. Implications of a potential unreliability of the accuracy components applied to teacher judgements are discussed, and desiderata for further research are formulated.