Comparison of Automatic and Human Evaluation of L2 Texts in Czech

Korean Association of Slavic Linguistics (한국슬라브어학회)
Journal of Slavic Linguistics (슬라브어 연구)
Vol. 24, No. 1
April 2019

pp. 93–101 (9 pages)
DOI: 10.30530/JSL.2019.04.24.1.93

In this paper, we introduce an experimental probe comparing how texts written by non-native speakers of Czech are evaluated by the software application EVALD and by teachers of Czech as a foreign language. The hypothesis was that teachers, even when they are given structured evaluation instructions and go through a standardization process, are unable to reach satisfactory results or to agree on the same evaluation, which undermines the whole text evaluation process. This is a problem especially for objective assessment in certification exams such as the Exam for Permanent Residence. A group of 44 teachers of Czech as a foreign language who had undergone special training, together with the computer program, evaluated two texts with respect to the relevant features of levels A1–C1 as defined by the Common European Framework of Reference for Languages. The task comprised an evaluation of the overall level of each text and evaluations of specific aspects of the texts: punctuation, morphology, lexis, syntax and coherence. We compare the teachers' evaluations with one another and with the program's evaluation. In the overall evaluation, only 41% of the teachers agreed on the same level for text A and 50% for text B (agreement in orthography, morphology, syntax, lexis and coherence is described and interpreted in the paper). The lowest inter-rater agreement was in orthography: 38% for text A and 40% for text B. In morphology, 59% of the teachers agreed on the same level for text A and 61% for text B. We further compared human and automatic evaluation: 41% of the teachers agreed with the program on text A and 50% on text B. We also compared these results for the individual language and text levels. Our results clearly show that human evaluation is rather inconsistent and that automatic evaluation is advisable where consistency and high agreement are desired.
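The abstract reports agreement as the percentage of raters who chose the same level, and as the percentage of teachers whose level matched the program's. The sketch below shows one way such figures could be computed. It assumes, and this is not confirmed by the abstract, that inter-rater agreement means the share of raters who chose the most frequent (modal) CEFR level. The helper names `modal_agreement` and `agreement_with_program` are hypothetical, and the ratings are invented for illustration. Note that, with 44 raters, 18 matching ratings give 18/44 ≈ 41%.

```python
from collections import Counter


def modal_agreement(ratings):
    """Return the most frequent level and the share of raters who chose it.

    Assumption: "agreement" = share of raters choosing the modal level.
    """
    if not ratings:
        raise ValueError("no ratings given")
    level, count = Counter(ratings).most_common(1)[0]
    return level, count / len(ratings)


def agreement_with_program(ratings, program_level):
    """Return the share of raters whose level equals the program's level."""
    if not ratings:
        raise ValueError("no ratings given")
    return sum(r == program_level for r in ratings) / len(ratings)


# Hypothetical ratings invented for illustration (NOT the study's data):
# 44 raters, 18 of whom chose B1.
teachers = ["B1"] * 18 + ["A2"] * 14 + ["B2"] * 12

level, share = modal_agreement(teachers)
print(level, round(share * 100))  # B1 41
print(round(agreement_with_program(teachers, "B1") * 100))  # 41
```

Under this definition, the two measures coincide whenever the program assigns the same level that most teachers chose. That would be consistent with the matching figures (41% and 50%) in the abstract, though the paper itself would need to confirm it.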

1. Introduction

2. Description of the Experiment

3. Results and Evaluation

4. Conclusion

References
